Web scraping chemical data with R
Update: These functions have been integrated into the webchem package and removed from the esmisc package.
I wrote some functions to query these web services (ChemSpider, PubChem and cactus, the NCI/CADD Chemical Identifier Resolver) from R. You can find them in my esmisc package.
These functions are very crude and need further development (if you want to improve them, fork the package!). Nevertheless, here is a short summary:
Convert CAS to SMILES
Suppose we have some CAS numbers and want to convert them to SMILES:
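Before sending CAS numbers to any web service, it is worth checking that they are well formed. CAS Registry Numbers carry a check digit: multiply the other digits, counted from the right, by 1, 2, 3, ... and take the sum modulo 10. Below is a minimal sketch in base R. `cas_valid()` is a hypothetical helper, not a function from esmisc or webchem (webchem ships its own `is.cas()` for this purpose).

```r
# Hypothetical helper (not part of esmisc or webchem): checks the format and
# the check digit of CAS Registry Numbers before they are sent to a web service.
cas_valid <- function(cas) {
  vapply(cas, function(x) {
    # Expected format: 2-7 digits, hyphen, 2 digits, hyphen, 1 check digit
    if (is.na(x) || !grepl("^[0-9]{2,7}-[0-9]{2}-[0-9]$", x)) return(FALSE)
    digits <- as.integer(strsplit(gsub("-", "", x), "")[[1]])
    n <- length(digits)
    check <- digits[n]
    body <- rev(digits[-n])  # rightmost non-check digit first
    # Weight the digits 1, 2, 3, ... from the right; the sum mod 10 must equal the check digit
    sum(body * seq_along(body)) %% 10 == check
  }, logical(1), USE.NAMES = FALSE)
}

# Water, ethanol, and water with a wrong check digit
cas_valid(c("7732-18-5", "64-17-5", "7732-18-4"))
```

This returns `TRUE TRUE FALSE`, so only the numbers that pass could then be handed on for conversion to SMILES.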
Note that ChemSpider requires a security token; to obtain one, register at ChemSpider.
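Rather than pasting the token into every script, one option is to keep it in an environment variable. The sketch below assumes a variable called `CHEMSPIDER_TOKEN` (the name is my choice, not something ChemSpider prescribes), and `get_cs_token()` is a hypothetical helper.

```r
# Hypothetical helper: read the ChemSpider security token from an environment
# variable (the name CHEMSPIDER_TOKEN is an assumption) instead of hard-coding it.
get_cs_token <- function(var = "CHEMSPIDER_TOKEN") {
  token <- Sys.getenv(var, unset = "")
  if (!nzchar(token)) {
    # Fail early with a clear message rather than sending an empty token
    stop("No ChemSpider token found; set ", var, " in ~/.Renviron", call. = FALSE)
  }
  token
}
```

To use it, add a line such as `CHEMSPIDER_TOKEN=<your token>` to `~/.Renviron` and restart R; the token then never appears in code you share.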
Retrieve other data from CAS
All these web resources provide additional data. Here is an example retrieving the molecular weights:
ChemSpider and PubChem return the same values; however, the results from cactus are slightly different.
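Whether such small discrepancies matter depends on the application. A minimal way to check is to compare each source against the median of all sources and flag relative deviations above a tolerance. `compare_sources()` and its default tolerance of `1e-4` are assumptions for illustration, and the numbers in the example call are placeholders, not values returned by the services.

```r
# Hypothetical helper: relative deviation of each source from the median of all
# sources, plus a flag telling whether every source lies within `tol`.
compare_sources <- function(values, tol = 1e-4) {
  stopifnot(is.numeric(values), length(values) >= 2)
  ref <- median(values, na.rm = TRUE)
  rel_dev <- abs(values - ref) / abs(ref)  # keeps the source names
  list(rel_dev = rel_dev, agree = all(rel_dev <= tol, na.rm = TRUE))
}

# Placeholder numbers, not real query results
compare_sources(c(chemspider = 100, pubchem = 100, cactus = 100.02))
```

Here `agree` is `FALSE` because the third value deviates by about 2e-4 from the median; loosening `tol` would accept it.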
Retrieve partitioning coefficients
Partition coefficients are another useful property. LOGKOW is a databank that contains experimental data, retrieved from the literature, on over 20,000 organic compounds.
get_kow() extracts the ‘Recommended values’ for a given CAS:
This function is very crude. For example, it returns only the first hit if multiple hits are found in the database; a better way would be to ask for user input, as we did in the taxize package.
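The improvement suggested above could look roughly like this: when several hits come back, let the user pick one in an interactive session and fall back to the first hit otherwise. `choose_hit()` is a hypothetical helper that only handles the choice; fetching and parsing the LOGKOW pages is not shown. The `interactive()` guard is needed because `utils::menu()` stops with an error in non-interactive sessions.

```r
# Hypothetical helper (not part of webchem): choose one of several database hits.
# Interactive sessions get a numbered menu; non-interactive runs (e.g. Rscript)
# fall back to the first hit, which mirrors the current get_kow() behaviour.
choose_hit <- function(hits, ask = interactive()) {
  if (length(hits) == 0) return(NA_character_)
  if (length(hits) == 1 || !ask) return(hits[[1]])
  pick <- utils::menu(hits, title = "Multiple hits found. Which one do you want?")
  # menu() returns 0 if the user cancels
  if (pick == 0) NA_character_ else hits[[pick]]
}
```

Passing `ask = FALSE` keeps batch scripts reproducible, while interactive use gets the taxize-style prompt.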
I currently have no time to develop these functions extensively. I would be happy if someone picked up this work. It is fairly easy: just fork the repo and start.
In the future, this could become an rOpenSci package, as it falls within their scope.