Eduard Szöcs

Data in Environmental Science and Eco(toxico-)logy

# Species Sensitivity Distributions (SSD) with R

## Introduction

Species Sensitivity Distributions (SSD) are a central tool for ecological risk assessment (ERA). Species show different sensitivities to chemicals and the variation between species can be described by a statistical distribution. A concentration at which x% of species are affected can be derived from SSDs (= $HC_x$ ). Usually, an HC5 is derived (with 95% confidence interval) and used in ERA.

## Data

In this example, I will generate an SSD for the insecticide Chlorpyrifos (CAS 2921-88-2).

SSDs are generated using data of toxicity experiments (like EC_50 / LC50 values). Such kind of data is available e.g. from US EPA ECOTOX database, ECHA or ETOX.

I prepared some data from the US EPA ECOTOX database for this post. I will skip the data cleaning and data quality checks here - but note, this data has not been checked thoroughly and prepared only for demonstration purposes. However, data cleaning and checking is a very important step for every data analysis.

You can read the data into R with these three lines

## A first look at the data

SSDs are typically displayed as a plot showing the fraction of affected species on the y-axis and the concentration on the x-axis.

To calculate the fraction affected we order the species by their toxicity values and then calculate the fraction:

Then we can take a first look at the data:

## Fitting a distribution to the data

To fit a distribution to the data we can use the fitdistr() function from the MASS package or the more flexible fitdist() from the fitdistrplus package (there are also others). I will use the MASS package here to fit a lognormal distribution to this data.

The mean (meanlog) and standard deviation (sdlog) of the lognormal distribution were estimated from the data. We could fit and compare (e.g. by AIC) different distributions, but I stick with the lognormal here.

## Derive HC5

From the estimated parameters of the fitted distribution we can easily extract the HC5.

## parametric Bootstrap

To be more conservative the lower limit of the confidence interval (CI) around the HC5 is sometimes used. The lower limit of the CI can be estimated from the data using parametric bootstrap.

The idea is:

• generate random values from the fitted distribution
• fit to these random values the distribution
• estimate the HC5 from this new distribution
• repeat many times to assess the variability of HC5 values

Alternatively, also non-parametric bootstrap could be used (resample from the data, not from the fitted distribution).

In R we write a function (myboot()) that does steps 1-3 for us:

We repeat this function 1000 times and get the quantiles of the bootstrapped HC5 values:

So for this data and the lognormal distribution the HC5 would be 0.096 with a CI of [0.046; 0.214].

## A fancy plot

Finally, I generate a fancy SSD plot with predictions (red), with bootstrapped values (blue), CI (dashed) and the raw data (dots):

Note, that the bootstrap function is slightly different: this is because for this plot I wanted to have for each bootstrap a different (blue) line.

## fitdistrplus

** Update on 9. October 2015 **

Although writing a bootstrap function is straightforward, you need to write your own function and understand what’s going on. The fitdistrplus package makes the above steps even more easy:

• 1 - fit the lognormal distribution

This gives identical results as above

• 2 - extract HC5
• 3 - bootstrap CI

This performs a parametric bootstrap, for a non-parametric boostrap set bootmethod = 'nonparam'.

• 4- plotting is again a bit cumbersome

Written on October 9, 2015