Replication and field experiments

Here is a little summary of what we talked about at stats beerz this week!

Multivariate vs Univariate analyses

The first question was about the relative merits of, and needs for, multivariate vs univariate statistics. The answer was… well, it depends on the question(s)! In short, multivariate statistics are useful when the Y variables are closely interconnected and interacting, and we are interested in those relationships (e.g. “How did the diet of dogfish change over time?”, where a high concentration of squid means that the dogfish ate fewer shrimp). Univariate statistics come in handy when the Ys are independent (in the broad sense of the word) and not interconnected, or when we are interested in one specific Y variable (e.g. “How did the concentration of squid vary in the dogfish diet over time?”). Typically, multivariate statistics are more descriptive and univariate statistics more predictive, but that is not always the case. And of course, both sets of methods can be used together to highlight different aspects of the questions and datasets studied.

Replication and field experiments

I brought up a classic example of the questions I get from consultants (full disclaimer: I wasn’t paid for it! ;-). I hoped it would be a good illustration of what we are often asked, and that it would spark a discussion on replication and pseudoreplication that would benefit us all. Below are some of the conclusions I drew from it:

Problem:

Data were collected on small mammal communities from 2010 to 2014. A large chunk of land was split into polygons that were either affected by different types of disturbance (fire, logging) or in a “natural” state of succession. A 100 m x 100 m grid of 49 traps (7 rows of 7) was set up in each polygon and sampled in the spring and fall for 12 consecutive nights. The original goal of the study was to see how different the small mammal communities were in the disturbed polygons compared to the natural ones, and how long it would take for the disturbed communities to resemble the natural ones. The problem: there are no replicates! The consultants want to expand the study to include replicates, but in a way that still makes it possible to use the data gathered from 2010 to 2014.

Some of the theory behind replication and pseudoreplication:

Replication of treatments is mandatory if significance tests are to be performed. If no significance tests are intended, replication is less of an issue. It is easy to confuse replication of treatments (several experimental units per treatment level) with replication within experimental units (several samples per unit). Replication of treatments is crucial to assess whether the differences seen among treatments are due to the treatments rather than to differences among units. Replication within experimental units is important to increase the sensitivity and precision of the experiment. For example, if the treatments are applied to polygons, then you need more than one polygon per treatment to assess whether the treatments had effects, and more than one sample per polygon to assess the within-polygon variability. However, increasing the number of samples within polygons does not add any degrees of freedom to the test for differences among treatments.
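To make the degrees-of-freedom point concrete, here is a minimal sketch of the bookkeeping for a balanced nested design (samples within polygons within treatments). The function name and the example numbers (3 treatments, 12 samples per polygon) are assumptions chosen for illustration, not figures from the study.

```python
def nested_design_df(n_treatments, polygons_per_treatment, samples_per_polygon):
    """Degrees of freedom for a balanced nested design.

    Samples are nested within polygons, and polygons within treatments.
    """
    n_polygons = n_treatments * polygons_per_treatment
    return {
        "treatment": n_treatments - 1,
        # Error term for the treatment test: polygons within treatments.
        "polygons_within_treatment": n_treatments * (polygons_per_treatment - 1),
        # Subsampling only feeds the within-polygon term.
        "samples_within_polygon": n_polygons * (samples_per_polygon - 1),
    }


# Hypothetical designs: (treatments, polygons per treatment, samples per polygon)
for design in [(3, 1, 12), (3, 1, 36), (3, 3, 12)]:
    print(design, nested_design_df(*design))
```

With one polygon per treatment, the error term for the treatment test has zero degrees of freedom no matter how many samples are taken per polygon. Only adding polygons changes that.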

Pseudoreplication in manipulative experiments (sensu Hurlbert) occurs when samples are replicated but treatments are not, or when replicates are not statistically independent. Either case violates the independence required by most inferential statistics. Pseudoreplication in mensurative experiments is a little trickier than in manipulative experiments; Hurlbert defines it as “a consequence of the actual physical space sampled being smaller than the inference space implicit in the inferential statistics tested”. If you do not have replication at the polygon level, you may still use inferential statistics, but you won’t be able to generalize your results beyond the polygons you’re looking at. In other words, you’ll be able to say “the wildlife communities are different between these two polygons”, but not “the wildlife community in treatment 1 is different from the wildlife community in treatment 2”, which may or may not matter depending on your study and objectives. Put differently, you are testing for a location difference, not for a treatment difference.

That said, replication is often impossible or undesirable when large-scale systems are studied. In those cases (involving unreplicated but subsampled treatments), we can still report means with 95% confidence intervals (or standard deviations), and discuss assumed effects of treatments without using any direct tests of significance. That way, we avoid pseudoreplication. There are celebrated examples in the literature of very large-scale experiments (watersheds, rivers, etc.) that could not replicate treatments but still made great contributions to science, for example the ELA experiments by Schindler et al. Those could be used as examples of how to validly interpret results from non-replicated experiments.
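As a rough sketch of the "means with 95% CI" approach for unreplicated but subsampled polygons, the snippet below summarizes each polygon separately. The capture counts are made up for illustration, the polygon labels are hypothetical, and the small table of Student's t critical values is included only so the example runs with the standard library.

```python
from math import sqrt
from statistics import mean, stdev

# Two-sided 95% critical values of Student's t, keyed by degrees of freedom.
T_CRIT_95 = {
    1: 12.706, 2: 4.303, 3: 3.182, 4: 2.776, 5: 2.571, 6: 2.447,
    7: 2.365, 8: 2.306, 9: 2.262, 10: 2.228, 11: 2.201,
}


def mean_ci95(values):
    """Return (mean, lower, upper) for a t-based 95% confidence interval."""
    n = len(values)
    df = n - 1
    if df not in T_CRIT_95:
        raise ValueError("this sketch only tabulates t for 2 to 12 observations")
    m = mean(values)
    half_width = T_CRIT_95[df] * stdev(values) / sqrt(n)
    return m, m - half_width, m + half_width


# Hypothetical nightly capture counts over 12 nights, one grid per polygon.
captures = {
    "burned polygon": [4, 6, 5, 7, 3, 5, 6, 4, 5, 6, 4, 5],
    "natural polygon": [9, 11, 8, 10, 12, 9, 10, 11, 9, 10, 8, 11],
}

for polygon, counts in captures.items():
    m, lower, upper = mean_ci95(counts)
    print(f"{polygon}: mean = {m:.2f}, 95% CI = [{lower:.2f}, {upper:.2f}]")
```

These intervals describe each polygon, not each treatment, and consecutive nights on the same grid are unlikely to be fully independent, so they are best read as descriptive summaries rather than as a test.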

As far as the specific case presented:

I would recommend at least increasing the replication within polygons if it is impossible to increase the replication of treatments (i.e. the number of polygons per treatment); increasing the number of samples per polygon will inform us about the within-polygon variability. We need more information to assess how that could be done: e.g., what is the home range of those small mammals? Is the assumption that the 100 m x 100 m grid (and the number of traps used) covers the home range of one population? How was it decided to use 100 m x 100 m, and 49 traps? Did someone look at the variance within grids (among traps), and over the 12 nights? Three solutions could be suggested to increase the number of samples per polygon: 1) replicate the 100 m x 100 m grid (impossible for budget and logistical reasons), 2) reduce the number of traps per grid to 25 and double the number of grids (unsure how that would change the variance, and how comparable the results going forward would be to those in the past), or 3) reduce the length of the sampling period to 4 nights and triple the number of grids per polygon (we need to look at the data to see how captures varied over the 12 nights, and whether that solution is viable). The third option looks the most promising, provided the old data show that the number of captures plateaued after 4 nights; we need the data to confirm its feasibility and validity.
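A quick sketch of the arithmetic behind options 2 and 3, plus one possible way to check the plateau in the 2010 to 2014 data. Trap-nights (traps x grids x nights) is used here as an assumed proxy for sampling effort per polygon per season, and the nightly counts of newly captured individuals are invented for illustration.

```python
from itertools import accumulate


def trap_nights(traps_per_grid, grids_per_polygon, nights):
    """Sampling effort per polygon and season, in trap-nights."""
    return traps_per_grid * grids_per_polygon * nights


designs = {
    "current (49 traps, 1 grid, 12 nights)": trap_nights(49, 1, 12),
    "option 2 (25 traps, 2 grids, 12 nights)": trap_nights(25, 2, 12),
    "option 3 (49 traps, 3 grids, 4 nights)": trap_nights(49, 3, 4),
}
for label, effort in designs.items():
    print(f"{label}: {effort} trap-nights")


def fraction_captured_by(new_captures_per_night, night):
    """Share of all first captures made by the given night (1-based)."""
    cumulative = list(accumulate(new_captures_per_night))
    total = cumulative[-1]
    if total == 0:
        raise ValueError("no captures recorded")
    return cumulative[night - 1] / total


# Hypothetical counts of individuals caught for the first time on each night.
new_individuals = [9, 7, 5, 3, 1, 1, 0, 1, 0, 0, 0, 0]
share = fraction_captured_by(new_individuals, night=4)
print(f"Share of individuals first caught by night 4: {share:.0%}")
```

Under this proxy, option 3 keeps the effort at 588 trap-nights, the same as the current design, while option 2 comes out slightly higher at 600. Whether a share of first captures by night 4 is "high enough" is a judgment call that would have to be made on the real data, per polygon and per season.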