suggest to include NCBI taxon ids and more #4

Open
jhpoelen opened this issue Jan 15, 2021 · 2 comments


@jhpoelen

jhpoelen commented Jan 15, 2021

Hi!

I was just made aware of your exciting mammal-virus association aggregate dataset sourced from existing datasets (see globalbioticinteractions/globalbioticinteractions#585 ).

As I was reviewing your impressive work, the following thoughts/ideas came to mind:

  1. In https://github.com/viralemergence/clover/blob/main/output/Clover_v1.0_NBCIreconciled_20201218.csv , many of the virus and host taxa have been resolved against the NCBI taxonomy. However, the NCBI taxon ids for host and virus are not included. Did you consider adding these resolved taxon identifiers (e.g., NCBI:txid9606 for Homo sapiens) in separate columns like virusNameId and hostNameId? I think this would not only help downstream workflows, but would also be consistent with NCBI's citation guidelines.
  2. You cite the datasets you reused in the fields Database and DatabaseVersion; however, no full citation or DOI is provided. Did you consider adding a DatabaseDOI and/or DatabaseCitation field to help others retrace the provenance of your host-virus association claims?
  3. re: filename https://github.com/viralemergence/clover/blob/main/output/Clover_v1.0_NBCIreconciled_20201218.csv - you've included version information inside the filename (e.g., v1.0 and 20201218) even though you use git for version control. If you leave out this information, others might have an easier time re-using your data in the future (e.g., no need to update R scripts when you release a new version). Also, I wonder whether _NBCIreconciled_ was meant to be _NCBIreconciled_ (notice NBCI -> NCBI).
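
To illustrate suggestion 1, here is a minimal Python sketch of how resolved identifiers could be written into hostNameId and virusNameId columns. It is only an illustration under stated assumptions, not the project's actual workflow: the input column names `Host` and `Virus`, the lookup dictionaries, and the helper functions are all hypothetical, and the only taxon id used is the Homo sapiens example mentioned above.

```python
import csv
import io

# Hypothetical lookup tables (assumption): in practice these would be built
# from the NCBI taxonomy. Only the Homo sapiens id from the issue is included.
HOST_IDS = {"homo sapiens": 9606}
VIRUS_IDS = {}  # intentionally empty in this sketch


def format_taxon_id(taxid):
    """Format a numeric taxon id as 'NCBI:txid<N>', or '' when unresolved."""
    return f"NCBI:txid{taxid}" if taxid is not None else ""


def add_taxon_id_columns(rows, host_col="Host", virus_col="Virus"):
    """Return copies of the rows with hostNameId and virusNameId added.

    The column names 'Host' and 'Virus' are assumptions for this sketch.
    """
    annotated = []
    for row in rows:
        new_row = dict(row)
        host_key = row[host_col].strip().lower()
        virus_key = row[virus_col].strip().lower()
        new_row["hostNameId"] = format_taxon_id(HOST_IDS.get(host_key))
        new_row["virusNameId"] = format_taxon_id(VIRUS_IDS.get(virus_key))
        annotated.append(new_row)
    return annotated


# Tiny in-memory sample standing in for the real CSV file.
sample = "Host,Virus\nHomo sapiens,Example virus\n"
rows = list(csv.DictReader(io.StringIO(sample)))
annotated = add_taxon_id_columns(rows)
```

Names without a match keep an empty identifier rather than a guessed one, so unresolved taxa stay visible to downstream users.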

Hope this helps and curious to hear your comments / thoughts.

@cjcarlson
Member

Thanks so much @jhpoelen - some of these are already on our to-do list and others will be clearer with the incoming preprint! I'll leave this open until the rest are addressed.

@jhpoelen
Author

@cjcarlson Thanks for responding and good luck with getting your publication out there!

PS. If you'd like to have your dataset indexed directly by GloBI, please let me know and I can prepare a pull-request with some index configuration (e.g., schema mapping, citation info). Note that Shaw et al. (liampshaw/Pathogen-host-range#3) and Urban et al. (PHI-base/data#2) accepted such pull requests in the past to keep their indexed data up-to-date.
