User:ChemConnector

From WikiProjectMed

Antony Williams

I am Antony John Williams, the "ChemConnector".

I am an NMR spectroscopist by training with an interest in small-molecule structure elucidation, NMR processing algorithms and Computer Assisted Structure Elucidation (CASE).

I was in the commercial world of cheminformatics for over 10 years as the Chief Science Officer of a chemistry software company. During that time I managed a number of products including structure drawing, structure databasing, systematic nomenclature tools, analytical data processing tools, NMR prediction tools, CASE tools, workflow and sample management tools and so on.

Check out my Scholia page here: https://tools.wmflabs.org/scholia/author/Q4777220

ChemSpider

I was the host of a hobby project, ChemSpider, and I used to go by the name ChemSpiderman on Wikipedia. ChemSpider was a great success in the community and was acquired by the Royal Society of Chemistry, where I now work as the VP of Strategic Development. I still operate as ChemSpiderman with regard to the project, but in my non-RSC life I remain an active scientist and have assumed the role of a ChemConnector. I host a blog, the ChemConnector Blog, and you can find me on Twitter as @ChemConnector.

A passion for Quality

My passion is the delivery of chemistry-related information to as many people as possible, with a focus on the QUALITY of that information. I spend a lot of time looking at data, and while I can be a big-picture guy when designing systems, I enjoy the details too. I remain engaged in the Wikipedia Chemistry Curation project.

The Wikipedia Curation project

I continue to look at all the chemical structures on Wikipedia to check for consistency between structures and names and between those and external systems. I have already blogged about my efforts in this area around Taxol.

Returning to Wikipedia after a hiatus - Chemistry Curation AGAIN

After a long hiatus from Wikipedia, other than some minor contributions, I am returning to the work I previously did with people like DMacks, Walkerma and Beetstra to curate chemistry and chemical compounds. I now work with the CompTox Chemicals Dashboard, and I believe that our data quality is much higher, both because of the greater level of curation (full-time curators working daily to improve the data quality) and because the focus of the database is not simply on assembling the largest collection (875k as of April 2019, versus 71 million for ChemSpider and 91 million for PubChem).

To date I have been able to identify 15,010 unique inorganic and organic chemical compounds on Wikipedia (excluding polymers at present). My intention is to provide as many systematic names, SMILES, InChI strings and InChIKeys as possible for all chemicals, so that they can be compared with what is presently in the DrugBoxes and ChemBoxes. I can tell that there are already incorrect SMILES and InChIs for a number of chemicals, and I would like to clean this up if possible AND finally provide a single source of Wikipedia chemicals in SDF format that can be downloaded, modeled, profiled, etc. For example, this is a SUBSET, WIKILIST: Additives in cigarettes, but I am building a series of these, including WIKILIST: Extremely hazardous substances.

I am looking to see whether anyone in the Wikipedia Chemicals world would be willing to help me harvest names and CASRNs to associate with Wikipedia chemical articles and their ChemBox/DrugBox entries, so that I can continue the work of building the definitive file.
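
To make the request concrete, here is a minimal sketch of what such harvesting might look like, assuming the wikitext of an article is already available as a string. The infobox parameter names listed in FIELD_ALIASES and the helper names extract_fields and casrn_is_valid are assumptions made for illustration; the real ChemBox and DrugBox templates may spell their parameters differently. The check-digit rule for CAS Registry Numbers is the standard one: weight the digits from right to left by 1, 2, 3, ... and take the sum modulo 10.

```python
import re

# Assumed infobox parameter names (hypothetical mapping); the actual
# ChemBox/DrugBox templates may use other spellings.
FIELD_ALIASES = {
    "name": ("Name", "drug_name"),
    "casrn": ("CASNo", "CAS_number"),
    "smiles": ("SMILES", "smiles"),
    "inchikey": ("StdInChIKey",),
}


def extract_fields(wikitext):
    """Return the first non-empty value found for each assumed parameter."""
    found = {}
    for key, aliases in FIELD_ALIASES.items():
        for alias in aliases:
            # Matches a line such as "| CASNo = 64-17-5".
            pattern = r"^\s*\|\s*" + re.escape(alias) + r"\s*=[ \t]*(.*?)\s*$"
            match = re.search(pattern, wikitext, flags=re.MULTILINE)
            if match and match.group(1):
                found[key] = match.group(1)
                break
    return found


def casrn_is_valid(casrn):
    """Check the format and check digit of a CAS Registry Number."""
    match = re.fullmatch(r"(\d{2,7})-(\d{2})-(\d)", casrn)
    if not match:
        return False
    digits = match.group(1) + match.group(2)
    # Weight digits right to left by 1, 2, 3, ... and compare mod 10.
    total = sum(int(d) * i for i, d in enumerate(reversed(digits), start=1))
    return total % 10 == int(match.group(3))


if __name__ == "__main__":
    sample = """{{Chembox
| Name = Ethanol
| CASNo = 64-17-5
| SMILES = CCO
| StdInChIKey = LFQSCWFLJHTTHZ-UHFFFAOYSA-N
}}"""
    record = extract_fields(sample)
    record["casrn_ok"] = casrn_is_valid(record.get("casrn", ""))
    print(record)
```

A record harvested this way could then be compared field by field against the curated values, flagging articles where the CASRN fails its check digit or where the InChIKey differs.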

I am wondering: are you reading (scraping) the enwiki articles for data? Did you consider working with Wikidata? (Wikidata has huge differences wrt enwiki, but is much more structured and allows mass edits). -DePiep (talk) 13:10, 22 April 2019 (UTC)