Eduard Szöcs

Data in Environmental Science and Eco(toxico-)logy

Web scraping pesticides data with R

Update: The function allanwood() has been integrated into the webchem package and the function removed from the esmisc package!

This week I had to find the CAS numbers for a bunch of pesticides. I also needed the major group of each pesticide (e.g. herbicide, fungicide, …), and some of the names were in German.

ETOX is quite useful for finding CAS numbers, even for German names, because its database also includes synonyms.

Another useful source is the Compendium of Pesticide Common Names by Allan Wood.

Since I had more than 500 compounds in my list, doing this manually was not feasible. So I wrote two small functions (etox_to_cas() and allanwood()) to search and retrieve information from these two websites.

Both are available from my esmisc package on GitHub. They are small functions that use the RCurl and XML packages for scraping. They have not been tested much and may not be very robust.
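To give an idea of what this kind of scraping looks like, here is a minimal sketch using the XML package. It is not the code from esmisc: the HTML fragment is a made-up stand-in for a downloaded page (the real ETOX and Allan Wood pages are structured differently), and the XPath expression is only valid for this invented layout.

```r
library(XML)

# Hypothetical HTML fragment standing in for a downloaded result page.
# This layout is an assumption for illustration only.
html <- '<html><body><table>
  <tr><th>Name</th><td>DDT</td></tr>
  <tr><th>CAS</th><td>50-29-3</td></tr>
</table></body></html>'

# Parse the HTML from a string instead of a URL or file
doc <- htmlParse(html, asText = TRUE)

# XPath: take the <td> that follows the <th> labelled "CAS"
cas <- xpathSApply(doc, "//th[text()='CAS']/following-sibling::td", xmlValue)
cas
```

In a real scraper the string would come from a download step (for example with RCurl), and the XPath would have to be adapted to the actual page.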

Query CAS from ETOX

library(esmisc)
etox_to_cas('2,4-D')
## [1] "88-85-7"
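Because scraped values can be malformed, it may help to run a quick sanity check on the returned strings. The sketch below is a hypothetical helper (is_cas() is not part of esmisc) that tests the CAS format and its check digit. It only tells you that a string is a well-formed CAS number, not that it belongs to the compound you searched for.

```r
# Check whether a single string is a well-formed CAS Registry Number.
# is_cas() is a hypothetical helper, not a function from esmisc.
is_cas <- function(x) {
  # Format: 2 to 7 digits, 2 digits, 1 check digit, separated by hyphens
  if (is.na(x) || !grepl("^[0-9]{2,7}-[0-9]{2}-[0-9]$", x)) {
    return(FALSE)
  }
  digits <- as.integer(strsplit(gsub("-", "", x), "")[[1]])
  n <- length(digits)
  check <- digits[n]
  # Weight the remaining digits 1, 2, 3, ... starting from the right
  body <- rev(digits[-n])
  sum(body * seq_along(body)) %% 10 == check
}

is_cas("50-29-3")  # TRUE
is_cas("50-29-4")  # FALSE, wrong check digit
is_cas(NA)         # FALSE
```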

If you have a bunch of compounds, you can use sapply() to call etox_to_cas() on each of them:

sapply(c('2,4-D', 'DDT', 'Diclopfop'), etox_to_cas)
##     2,4-D       DDT Diclopfop 
## "88-85-7" "50-29-3"        NA
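For a long list it can be useful to make sure that a single failed request does not abort the whole run, and to pause between requests. The following is a rough sketch of such a wrapper. safe_lookup() and fake_lookup() are hypothetical names introduced here, and fake_lookup() only stands in for etox_to_cas() so that the example runs offline.

```r
# Apply a lookup function to many compounds; errors become NA.
# `lookup` is any function taking one compound name, `delay` is in seconds.
safe_lookup <- function(compounds, lookup, delay = 1) {
  vapply(compounds, function(nm) {
    Sys.sleep(delay)  # pause between requests to go easy on the server
    tryCatch(as.character(lookup(nm))[1],
             error = function(e) NA_character_)
  }, character(1))
}

# Hypothetical stand-in for etox_to_cas(), so the sketch runs offline
fake_lookup <- function(nm) {
  known <- c(DDT = "50-29-3")
  if (!nm %in% names(known)) stop("no match for ", nm)
  known[[nm]]
}

res <- safe_lookup(c("DDT", "Diclopfop"), fake_lookup, delay = 0)
res
```

With the real function, the call would presumably look like safe_lookup(my_compounds, etox_to_cas), where my_compounds is your own character vector of names.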

Query CAS and pesticide group from Allan Wood

allanwood('Fluazinam')
sapply(c('Fluazinam', 'Diclofop', 'DDT'), allanwood)
Written on November 8, 2014