Prof. Newman kindly provided me with the data for this example. You can get it from the GitHub repo (TOXICTY.csv).
The data consists of 5 columns:
TTD : Time to death
TANK : Tank
PPT : NaCl Concentration
WETWT : wet weight
STDLGTH : Standard length
Columns 4 and 5 have 70 NAs (no data available due to measurement error), but we won’t use these columns in this example. The observations with TTD = 97 are ‘survivors’, since the experiment ran for only 96 hours.
First we need to create a column FLAG for the status of the animal (dead/alive):
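A minimal sketch of this step. The small data frame below is made up and only stands in for TOXICTY.csv; the rule itself (a TTD above 96 hours marks a survivor) follows from the description above.

```r
# Hypothetical mini data frame standing in for TOXICTY.csv (values are made up)
df <- data.frame(
  TTD  = c(8, 24, 96, 97, 97),
  TANK = c(1, 1, 2, 2, 2),
  PPT  = c(15.8, 15.8, 11.6, 11.6, 11.6)
)

# Animals still alive at the end of the 96 h experiment are recorded with TTD = 97
df$FLAG <- ifelse(df$TTD > 96, 1, 2)
print(df)
```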
So 1 denotes alive and 2 dead.
Then we can plot the data. Each line is a tank and colors denote the NaCl concentrations.
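Such a plot could be produced roughly as follows. This is only a sketch: the data frame `toy` is made up and stands in for the real data, and I assume that every tank has its own TANK id and belongs to exactly one concentration.

```r
library(survival)

# Made-up stand-in for the real data: 2 concentrations x 2 tanks x 6 animals
toy <- data.frame(
  TTD  = c(12, 24, 36, 48, 72, 97,  16, 24, 40, 60, 97, 97,
            8, 12, 16, 24, 36, 48,  24, 36, 48, 72, 96, 97),
  TANK = rep(1:4, each = 6),
  PPT  = rep(c(11.6, 13.2), each = 12)
)
toy$FLAG <- ifelse(toy$TTD > 96, 1, 2)   # 1 = alive, 2 = dead

# One Kaplan-Meier curve per tank; FLAG == 2 marks the event (death)
fit <- survfit(Surv(TTD, FLAG == 2) ~ TANK, data = toy)

# Look up the concentration of each tank (strata are ordered by TANK)
tanks <- sort(unique(toy$TANK))
conc  <- factor(toy$PPT[match(tanks, toy$TANK)])
cols  <- as.integer(conc)                # one colour per concentration

plot(fit, col = cols, xlab = "Time (h)", ylab = "Proportion surviving")
legend("bottomleft", legend = paste(levels(conc), "g/L"),
       col = seq_along(levels(conc)), lty = 1)
```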
We see a clear relationship between concentration and the survival curves. In this example we are interested in differences between the duplicates. We see that the two curves for the 11.6 g/L concentration are quite similar, while there is more divergence between tanks in the 13.2 g/L treatment.
We can test for differences between the tanks using the survdiff function from the survival package. With the rho argument we can specify the type of test: rho = 0 is a log-rank test and rho = 1 is equivalent to the Peto & Peto modification of the Gehan-Wilcoxon test.
First the log-rank test for each concentration:
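For a single concentration this could look like the sketch below. The data frame `toy` is made up and stands in for the real data, so the numbers it produces are not the ones from the book.

```r
library(survival)

# Made-up stand-in for the real data: 2 concentrations x 2 tanks x 6 animals
toy <- data.frame(
  TTD  = c(12, 24, 36, 48, 72, 97,  16, 24, 40, 60, 97, 97,
            8, 12, 16, 24, 36, 48,  24, 36, 48, 72, 96, 97),
  TANK = rep(1:4, each = 6),
  PPT  = rep(c(11.6, 13.2), each = 12)
)
toy$FLAG <- ifelse(toy$TTD > 96, 1, 2)   # 1 = alive, 2 = dead

# Log-rank test (rho = 0) between the two tanks of the 11.6 g/L treatment
lr <- survdiff(Surv(TTD, FLAG == 2) ~ TANK,
               data = subset(toy, PPT == 11.6), rho = 0)
print(lr)

# p-value from the chi-square statistic (df = number of groups - 1)
p <- 1 - pchisq(lr$chisq, df = length(lr$n) - 1)
print(p)
```

Writing the event as `FLAG == 2` makes the status coding explicit. `Surv()` also accepts a 1/2 coding directly, but its interpretation can become ambiguous in a subset where only one of the two values occurs.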
We could also run this in a for loop (here the Wilcoxon test):
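A sketch of such a loop, again on a made-up data frame `toy` that stands in for the real data. The result table `res` and its column names are my own choice.

```r
library(survival)

# Made-up stand-in for the real data: 2 concentrations x 2 tanks x 6 animals
toy <- data.frame(
  TTD  = c(12, 24, 36, 48, 72, 97,  16, 24, 40, 60, 97, 97,
            8, 12, 16, 24, 36, 48,  24, 36, 48, 72, 96, 97),
  TANK = rep(1:4, each = 6),
  PPT  = rep(c(11.6, 13.2), each = 12)
)
toy$FLAG <- ifelse(toy$TTD > 96, 1, 2)   # 1 = alive, 2 = dead

concs <- sort(unique(toy$PPT))
res   <- data.frame(PPT = concs, chisq = NA_real_, p = NA_real_)

for (i in seq_along(concs)) {
  d <- toy[toy$PPT == concs[i], ]
  # rho = 1: Peto & Peto modification of the Gehan-Wilcoxon test
  w <- survdiff(Surv(TTD, FLAG == 2) ~ TANK, data = d, rho = 1)
  res$chisq[i] <- w$chisq
  res$p[i]     <- 1 - pchisq(w$chisq, df = length(w$n) - 1)
}
print(res)
```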
Basically we get the same results as in the book:
None of the log-rank tests is statistically significant (at the 0.05 level).
The Wilcoxon test for the 13.2 g/L treatment gives p < 0.05.
This is also in agreement with the plot.
The $\chi^2$ values differ slightly from those in the book but show the same trend; I suspect the book used slightly different data.
With this dataset we can do much more. We already saw that there might be a relationship between survival time and concentration, but more on this later (example 4.10).
Code and data are available in my GitHub repo under the file name ‘p176’.