Better retrieve SMILES codes from Wikidata instead of Wikipedia #22

Open
kelson42 opened this issue Mar 28, 2015 · 5 comments

Comments

@kelson42

Wikidata is a structured database, and it is therefore easier to parse than Wikipedia infoboxes.

Have a look, for example, at the list of entities that include a SMILES code:
https://www.wikidata.org/wiki/Special:WhatLinksHere/Property:P233

@lpatiny
Member

lpatiny commented Mar 30, 2015

Thank you for your suggestion, which looks very interesting.
It seems that in order to be able to use this tool we will need some help... and we have many questions:

  • Could you help us create the URL that would directly retrieve JSON containing the article name, the SMILES and the Wikipedia reference?
  • How often is the list updated? Is it real time, based on Wikipedia article modifications?
  • Does this property also take https://en.wikipedia.org/wiki/Template:Infobox_drug into account? I'm not sure based on the definition.
  • What should we do when there are many SMILES (SMILES1, SMILES2, ...) and also comments about the SMILES? In some articles we have the SMILES for the racemate, the SMILES for R and the SMILES for S.

@kelson42
Author

I'll do my best; I'm not a Wikidata expert...

How to retrieve the list of molecules with a SMILES code on Wikidata?
https://www.wikidata.org/w/api.php?action=query&prop=linkshere&titles=Property:P233&lhshow=!redirect&lhnamespace=0&lhlimit=500&format=json
Documentation link:
http://en.wikipedia.org/w/api.php?action=help&modules=query
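
A minimal Python sketch of paging through that list, in case it helps. It reuses the query parameters from the URL above; the helper names and the `User-Agent` string are placeholders invented for this example, and it assumes the current default continuation style of the MediaWiki API (a top-level `continue` object carrying `lhcontinue`), which may differ on older API versions.

```python
import json
import urllib.parse
import urllib.request

API = "https://www.wikidata.org/w/api.php"
# Placeholder User-Agent for this example; replace with your own contact info.
USER_AGENT = "smiles-example/0.1 (contact: you@example.org)"


def linkshere_url(continue_token=None, limit=500):
    """Build the 'what links to Property:P233' query URL."""
    params = {
        "action": "query",
        "prop": "linkshere",
        "titles": "Property:P233",
        "lhshow": "!redirect",
        "lhnamespace": "0",
        "lhlimit": str(limit),
        "format": "json",
    }
    if continue_token is not None:
        params["lhcontinue"] = continue_token
    return API + "?" + urllib.parse.urlencode(params)


def extract_titles(response):
    """Collect the entity titles (e.g. 'Q132428') from one response page."""
    titles = []
    for page in response.get("query", {}).get("pages", {}).values():
        for link in page.get("linkshere", []):
            titles.append(link["title"])
    return titles


def next_token(response):
    """Return the continuation token, or None when the list is exhausted."""
    return response.get("continue", {}).get("lhcontinue")


def iter_entity_ids():
    """Yield every linking entity title, following continuation tokens."""
    token = None
    while True:
        request = urllib.request.Request(
            linkshere_url(token), headers={"User-Agent": USER_AGENT}
        )
        with urllib.request.urlopen(request) as resp:
            data = json.load(resp)
        yield from extract_titles(data)
        token = next_token(data)
        if token is None:
            break
```

Keeping the URL building and the response parsing separate from the network call makes it possible to check both against a saved response without hitting the API.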

How to retrieve the details of a molecule?
https://www.wikidata.org/w/api.php?action=wbgetentities&ids=Q132428&format=jsonfm
Documentation link:
https://www.mediawiki.org/wiki/Wikibase/API#wbgetentities
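
As a rough sketch, the article name, SMILES and Wikipedia reference could be pulled out of a `wbgetentities` response like this. Note that `format=jsonfm` returns an HTML page meant for reading in a browser, so the sketch asks for `format=json` instead. The function names, the `User-Agent` string and the choice of `en` / `enwiki` are assumptions made for the example; the code relies on the standard Wikibase entity layout (`claims`, `mainsnak`, `datavalue`, `sitelinks`).

```python
import json
import urllib.parse
import urllib.request

API = "https://www.wikidata.org/w/api.php"
# Placeholder User-Agent for this example; replace with your own contact info.
USER_AGENT = "smiles-example/0.1 (contact: you@example.org)"


def entity_url(entity_id):
    """Build a wbgetentities URL that returns machine-readable JSON."""
    params = {"action": "wbgetentities", "ids": entity_id, "format": "json"}
    return API + "?" + urllib.parse.urlencode(params)


def summarize_entity(response, entity_id, lang="en", site="enwiki"):
    """Extract label, Wikipedia article title and all P233 (SMILES) values."""
    entity = response["entities"][entity_id]
    smiles = []
    for claim in entity.get("claims", {}).get("P233", []):
        snak = claim.get("mainsnak", {})
        # Skip 'novalue' / 'somevalue' snaks, which carry no datavalue.
        if snak.get("snaktype") == "value":
            smiles.append(snak["datavalue"]["value"])
    label = entity.get("labels", {}).get(lang, {}).get("value")
    article = entity.get("sitelinks", {}).get(site, {}).get("title")
    return {"id": entity_id, "label": label, "article": article, "smiles": smiles}


def fetch_entity(entity_id):
    """Download one entity and summarize it (performs a network request)."""
    request = urllib.request.Request(
        entity_url(entity_id), headers={"User-Agent": USER_AGENT}
    )
    with urllib.request.urlopen(request) as resp:
        return summarize_entity(json.load(resp), entity_id)
```

Calling `fetch_entity("Q132428")` would return a small dictionary for the entity linked above. The `smiles` field is a list because an entity can in principle carry more than one P233 statement.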

How often is the list updated?
It's real time.

Does this property also look in Wikipedia?
We call this Wikidata deployment phase 2. I'm not sure, but in my opinion automatic integration between the English Wikipedia and Wikidata is still a discussion topic. On the French Wikipedia it works, although it is not yet widely used: https://fr.wikipedia.org/wiki/Wikip%C3%A9dia:Wikidata#Infoboxes_.28phase_2.29

What to do when there are many SMILES?
I'm not sure I fully understand that question. Maybe this is due to a misunderstanding of what Wikidata is. Wikidata has only one page per molecule and, assuming there is only one SMILES per molecule, there is also only one SMILES per molecule page on Wikidata. You should then no longer have to care about the infobox in Wikipedia.

@lpatiny
Member

lpatiny commented Mar 31, 2015

Thanks for all the answers. We will try to set up an internship to improve the Wikidata entries, retrieve not only SMILES but also other interesting parameters, and improve our search tool.
Concerning the problem of multiple SMILES, the description in
https://en.wikipedia.org/wiki/Template:Chembox (Long parameter list) says:

Indexed parameters take indexes 'blank'–1–5 (six options). They should have straight input, such as a correct CAS Registry Number. Each parameter can have a comment: |CASNo_Comment=.

Eight base parameters are indexed this way, all identifiers:
CASNo, ChEBI, ChEMBL, ChemSpiderID, DrugBank, InChI, KEGG, PubChem, SMILES, UNII

So in fact we may have up to 6 SMILES, and each of them may carry a comment that is sometimes used to describe the SMILES, for example as the racemate or as an enantiomerically pure product.
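
To make the indexed case concrete, here is a small Python sketch that collects the 'blank' and 1–5 SMILES parameters with their comments from Chembox wikitext. It assumes that each parameter sits on its own line and that comments follow the `SMILES_Comment` / `SMILES1_Comment` naming by analogy with `|CASNo_Comment=`; both are assumptions, and the function name is made up for the example.

```python
import re

# Matches '| SMILES = ...', '| SMILES3 = ...', '| SMILES3_Comment = ...'.
# The comment-parameter naming is an assumption modelled on CASNo_Comment.
PARAM = re.compile(r"^\|\s*SMILES([1-5]?)(_Comment)?\s*=(.*)$")


def chembox_smiles(wikitext):
    """Return the indexed SMILES entries found in Chembox wikitext.

    Assumes one parameter per line; templates written on a single
    line would need a real wikitext parser instead.
    """
    entries = {}
    for line in wikitext.splitlines():
        match = PARAM.match(line.strip())
        if not match:
            continue
        index, is_comment, value = match.group(1), match.group(2), match.group(3)
        entry = entries.setdefault(index, {"smiles": None, "comment": None})
        # Split happens at the first '=', so '=' inside a SMILES is preserved.
        entry["comment" if is_comment else "smiles"] = value.strip() or None
    # Keep only indexes that actually carry a SMILES, in index order.
    return [
        {"index": index or "blank", **entry}
        for index, entry in sorted(entries.items())
        if entry["smiles"]
    ]
```

Each returned entry keeps the index and the optional comment next to the SMILES, so a racemate and its separate enantiomers stay distinguishable downstream.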

@kelson42
Author

If several SMILES are possible for a molecule, then this should be in Wikidata too. In all scenarios involving Wikidata you should not have to parse the Chembox anymore. We can talk about this point next time we meet.

@kelson42
Author

I have asked about the support of multiple SMILES codes here:
https://www.wikidata.org/wiki/Property_talk:P233

I have asked about the lack of elementary physical/chemical properties for chemical components here:
https://www.wikidata.org/wiki/Talk:Q79529
