A translator for GEO, Gene Expression Omnibus #3299
return "dataset";
// if (url.includes("acc.cgi?acc")) {
// return "dataset";
// }
// return false;
We need to actually check if the page matches here (and ideally support search pages, if this site has them).
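One way to do that check is to parse the URL instead of returning "dataset" unconditionally. The following is a minimal sketch, not the translator's actual code: `detectGeoPage` is a hypothetical helper that `detectWeb` could call, and it assumes record pages are served from `/geo/query/acc.cgi` with an `acc` parameter (as in the URL quoted in this PR) and that only series accessions starting with `GSE` should be detected. Search page support is not covered.

```javascript
// Hypothetical helper: decide whether a URL looks like a GEO record page.
// Assumption: record pages are served from /geo/query/acc.cgi and carry
// an "acc" query parameter such as GSE251923.
function detectGeoPage(url) {
  let parsed;
  try {
    parsed = new URL(url);
  }
  catch (e) {
    return false; // not a parseable absolute URL
  }
  if (!parsed.pathname.endsWith("/geo/query/acc.cgi")) {
    return false;
  }
  const acc = parsed.searchParams.get("acc");
  // Assumption: only series (GSE...) accessions are treated as datasets here.
  return acc && /^GSE\d+$/.test(acc) ? "dataset" : false;
}
```

In the translator, `detectWeb(doc, url)` could then simply return `detectGeoPage(url)`.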
newItem.title = text(doc, '#ui-ncbiexternallink-1 > table > tbody > tr > td > table:nth-child(6) > tbody > tr:nth-child(3) > td:nth-child(2) > table > tbody > tr > td > table > tbody > tr > td > table:nth-child(6) > tbody > tr > td > table:nth-child(1) > tbody > tr:nth-child(3) > td:nth-child(2)');
newItem.abstractNote = text(doc, '#ui-ncbiexternallink-1 > table > tbody > tr > td > table:nth-child(6) > tbody > tr:nth-child(3) > td:nth-child(2) > table > tbody > tr > td > table > tbody > tr > td > table:nth-child(6) > tbody > tr > td > table:nth-child(1) > tbody > tr:nth-child(6) > td:nth-child(2)');
newItem.url = url;
// url is of format: https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE251923
// subset the last part of the url to get the accession number
const acc = url.split("acc=")[1];
newItem.identifier = acc;
// status is of format: Public on Dec 27, 2023
// subset it to get the date
const status_str = text(doc, '#ui-ncbiexternallink-1 > table > tbody > tr > td > table:nth-child(6) > tbody > tr:nth-child(3) > td:nth-child(2) > table > tbody > tr > td > table > tbody > tr > td > table:nth-child(6) > tbody > tr > td > table:nth-child(1) > tbody > tr:nth-child(2) > td:nth-child(2)');
newItem.date = status_str.split("on")[1].trim();
// authors is of format: Chen J, Song Y, Huang J, Wan X, Li Y
// push into newItem.creators
const author_str = text(doc, '#ui-ncbiexternallink-1 > table > tbody > tr > td > table:nth-child(6) > tbody > tr:nth-child(3) > td:nth-child(2) > table > tbody > tr > td > table > tbody > tr > td > table:nth-child(6) > tbody > tr > td > table:nth-child(1) > tbody > tr:nth-child(10) > td:nth-child(2)');
These aren't reasonable - I'm guessing output from Chrome/Firefox devtools? We need selectors that will remain stable between pages. It's possible that we'll have to walk through cells in the table and look at the labels ("Status", "Title") to figure out which field is which.
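The label-walking idea could be sketched roughly as follows. This is an illustration under assumptions, not a tested translator: `fieldsByLabel` is a hypothetical helper, and it assumes each field on the record page is a table row with exactly two cells, the label ("Status", "Title") in the first and the value in the second. The real page markup would need to be checked before relying on this.

```javascript
// Hypothetical helper: build a label -> value map from two-cell table rows.
// Assumption: each field is a <tr> whose first cell holds a label such as
// "Status" or "Title" and whose second cell holds the value.
function fieldsByLabel(doc) {
  const fields = new Map();
  for (const row of doc.querySelectorAll("tr")) {
    // row.cells lists only the row's own cells, not cells of nested tables.
    const cells = Array.from(row.cells);
    if (cells.length !== 2) continue;
    const label = cells[0].textContent.trim();
    const value = cells[1].textContent.trim();
    // Keep the first match so a later duplicate label cannot overwrite it.
    if (label && !fields.has(label)) {
      fields.set(label, value);
    }
  }
  return fields;
}
```

With such a map, the long positional selectors could be replaced by lookups like `fieldsByLabel(doc).get("Title")`, which do not depend on where a row sits in the table.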
newItem.url = url;
// url is of format: https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE251923
// subset the last part of the url to get the accession number
const acc = url.split("acc=")[1];
Breaks if there's more in the query string or an anchor on the URL.
- const acc = url.split("acc=")[1];
+ const acc = new URL(url).searchParams.get("acc");
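To make the failure mode concrete, here is a small comparison of the two approaches. The extra `targ` parameter and the `#top` anchor in the example URL are assumptions added for illustration; only the `acc=GSE251923` part comes from the PR.

```javascript
// Illustrative URL: the "targ" parameter and "#top" anchor are assumptions
// added to show what happens when the URL carries more than "acc".
const exampleUrl = "https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE251923&targ=self#top";

// String splitting keeps everything after "acc=", including the extras.
const accBySplit = exampleUrl.split("acc=")[1];

// URL parsing returns only the value of the "acc" parameter.
const accByParams = new URL(exampleUrl).searchParams.get("acc");

console.log(accBySplit); // GSE251923&targ=self#top
console.log(accByParams); // GSE251923
```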
// status is of format: Public on Dec 27, 2023
// subset it to get the date
const status_str = text(doc, '#ui-ncbiexternallink-1 > table > tbody > tr > td > table:nth-child(6) > tbody > tr:nth-child(3) > td:nth-child(2) > table > tbody > tr > td > table > tbody > tr > td > table:nth-child(6) > tbody > tr > td > table:nth-child(1) > tbody > tr:nth-child(2) > td:nth-child(2)');
newItem.date = status_str.split("on")[1].trim();
I think ZU.strToISO(status_str) will extract the date fine, no need to split.
(Also rename to use camelCase, not snake_case.)
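In the translator itself this would presumably become `newItem.date = ZU.strToISO(statusStr);`. Since `ZU` is only available inside Zotero, the sketch below uses a plain-JavaScript stand-in to show the kind of result expected for a status string. `statusToISO` is a hypothetical helper written for this illustration; it is not `ZU.strToISO` and does not claim to match its behavior beyond this one date format.

```javascript
// Stand-in for illustration only: this is NOT ZU.strToISO, just a plain-JS
// sketch of the result expected for a status like "Public on Dec 27, 2023".
const MONTHS = ["jan", "feb", "mar", "apr", "may", "jun",
  "jul", "aug", "sep", "oct", "nov", "dec"];

function statusToISO(statusStr) {
  // Look for "<month name> <day>, <year>" anywhere in the string.
  const match = /\b([A-Za-z]{3})[a-z]*\.?\s+(\d{1,2}),\s*(\d{4})/.exec(statusStr);
  if (!match) return "";
  const month = MONTHS.indexOf(match[1].toLowerCase()) + 1;
  if (month === 0) return ""; // three letters that are not a month name
  const mm = String(month).padStart(2, "0");
  const dd = match[2].padStart(2, "0");
  return `${match[3]}-${mm}-${dd}`;
}

console.log(statusToISO("Public on Dec 27, 2023")); // 2023-12-27
```

Note the camelCase parameter name `statusStr`, in line with the naming request above.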
@@ -0,0 +1,141 @@
{
"translatorID": "5a325508-cb60-42c3-8b0f-d4e3c6441059",
"label": "GEO",
Let's call this NCBI GEO for clarity.
Thanks for your detailed review. I will refine the code when I have time.