# Analysing mesocosm experiments using Generalized Linear Models

### Introduction

The first post of this blog was about analysing mesocosm data using Principal Response Curves (PRC). At the of the second part of this post I mention alternatives to PRC. I picked up this topic for my Master’s Thesis, which resulted in a paper that was published two weeks ago.

Based on 11 mesocosm data sets (thanks to the co-authors who kindly shared their data) I compared 3 different methods to analyse such data. In this post I show how to analyse mesocosm data using Generalized Linear Models for multivariate data ($GLM_{mv}$).

Compared to PRC these have a few advantages:

- no need to choose a transformation of the community reponse, GLMs provide a natural way to model count data.
- better representation of the responding taxa.
- there is some indication that the models have higher power (Warton et al. (2012), Szöcs et al. (2015)*).

Nevertheless, there are also some drawbacks:

- $GLM_{mv}$ are computationally more expensive
- The graphical presentation of results (but see below).

In this post I will demonstrate how to analyse the chlorpyrifos data set also used in the first post.

### Setup and data set

First we load some packages that I use in this post.

The abundance data we will use come with the `vegan`

package. For demonstration purposes and to save computation time I use only eight species (of the 178). Moreover, the data in this data set was `log(10 * x + 1)`

transformed and I backtransform to the raw counts.

The explaining variables (time, treatment and replicate number) are not shipped with `vegan`

but can be easily generated:

As always before a data analysis we first take a look at the raw data to get an impression what’s going on.

We see that *caenhora* and *cloedipt* show a strong treatment related negative response with a recovery after some weeks.
This is also the case for *chaoobsc*, but weaker. *olchaeta*, *binitent* and *gammpule* might profit from treatment.
*libellae* and *agdasphr* are low in abundance and we cannot much about their response.

### Fit and check models

To fit GLMs to these eight species we use the `mvabund`

package (Wang et al. (2012)).
We first need to create a mvabund object from our abundance data

Then we can try to fit a Poisson GLM with treatment, time and their interaction as predictors:

But does this model fit to the data? We can use residual plots to check the model:

The residual plot shows a pronounced fan-shape, which indicates that this model does not fit very well to the data! Let’s try a negative binomial model (which allows accounting for overdispersion):

This fits much better, we will stick with this model…

### Test hypotheses

To test hypotheses we must take the repeated measure design into account.
As in PRC, we do this by using restricted permutations (maybe there will be multivariate mixed effect models available in the future).
One permutation design would be to permute whole time series of replicated ditches (`plots`

), keeping the order of samplings (`Within(type = 'none')`

).
This is easily done via the `permute`

package.

The object `permuations`

is a permutation matrix which we will use later in hypothesis testing.

#### Treatment x Time Interaction

One hypothesis we could test is ‘The treatment effect is constant over time’, which is the interaction between treatment and time. To test if we have a statistically significant interaction, we first fit a model without the interaction term

Then we compare the two model using a Likelihood-Ratio Test. This is done with the `anova()`

method

The output is twofold - first we get a multivariate test:

Which indicates that there is an interaction between treatment and time.

And second we retrieve a univariate test for every species. Here are just the p-values displayed:

This indicates that *caenhora*, *cloedipt* and *chaoobsc* show treatment effect varying with time.

#### General treatment effect

We could also compare the model with interaction to a model containing only time as explaining variable. This gives us a test for the total treatment effect (treatment + treatment x time):

#### Period of treatment effect

Now that we know that the treatment effect is varying with time, we could further explore this interaction by testing the treatment effect at every sampling date. This is similar to PRC.

For every week I fit a model with treatment as explaining variable and the run a Likelihood-Ratio test. Moreover, I also return the summary of the models (will be used later for plotting). Note, that I used only 100 permutations here to safe computation time.

From the resulting list we can extract informations that we are interested in. E.g. we could extract the p-values at every sampling week:

This indicates a treatment effect on community structure from the day of application till week 12.

Note, this test is different to the method described in the PRC post.
There I used the log-transformed treatment as **continuous** predictor (regression design). In contrast, I use treatment as **categorical** predictor (anova design) in this example, which explains the differences.

### Plotting treatment effect

Let’s further investigate what’s going on graphically. How dose the treatment effect on the community over time look like? Which species are when responsible for this effect?

To plot the effect of treatment I extract the species deviances (a sum-of-squares equivalent). Species with a high deviance show a strong treatment effect. The community effect is then simply the sum of species effects/deviances.

And here is plot of treatment effects over time:

We see that the treatment effect increases after chlorpyrifos application (Lo and behold!) and then lowers off (but not to the initial level).
Moreover, wee see that the contributions to the effect also change with time:
Shortly after application till week 8 *caenhora*, *cloedipt* and *chaoobsc* are responsible for the treatment effect. However, after week 8 the contribution of *gammapule* increases. Other taxa play only a minor role.

Of course, we could extract this information also numerically, similar to the explained variance in PRC.

### A PRC like plot

We could also draw a PRC like plot (using the summary object from above). The procedure is always similar: extract the necessary information (coefficients in this case), prepare it, plot it.

This plot basically show the same patterns as the PRC.

Getting a species plot is a bit harder.
The species information displayed in standard PRC plot condenses a lot of information.
Basically, the plot shows the general species responses.
Therefore, I the general treatment test (`mod_treat_aov`

from above) for this plot.

I first extract and calculate contribution of each species to the community response from this test:

Then I just plot the species names according to their contribution.

The only difference to the PRC is that the direction of response is not displayed.

However, the plot showing the species contribution over time (see “Plotting treatment effect”) is much more informative

### Outlook

I hope this tutorial showed that applying GLMs for multivariate data to mesocosm data is not harder than performing a PRC in R. However, currently there is no alternative to R and no graphical user interface (gui). PRC can be also run in the commercial CANOCO software package with a nice gui. This may hinder the adoption of these models to ecotoxicology and easy to use software is needed.

### References

- Warton, D.I., Wright, S.T., Wang, Y., 2012. Distance-based multivariate analyses confound location and dispersion effects. Methods in Ecology and Evolution 3, 89–101.
- Szöcs, E., Van den Brink, P.J., Lagadic, L., Caquet, T., Roucaute, M., Auber, A., Bayona, Y., Liess, M., Ebke, P., Ippolito, A., ter Braak, C.J.F., Brock, T.C.M., Schäfer, R.B., 2015. Analysing chemical-induced changes in macroinvertebrate communities in aquatic mesocosm experiments: a comparison of methods. Ecotoxicology. doi:10.1007/s10646-015-1421-0
- Wang, Y., Naumann, U., Wright, S.T., Warton, D.I., 2012. mvabund- an R package for model-based analysis of multivariate abundance data. Methods in Ecology and Evolution 3, 471–474.

(*) Note, the main criticism from the reviewers was that I cannot deduce power differences from the analysis of these 11 data sets. E.g. Figure 1 in Szöcs (2015) shows lower p-values for GLM, which indicates a higher power. This annoyed me a little, so I did a (univariate) simulation study. It is currently under review and I’ll post soon about it.