DO.db is too outdated #57

Open
GuangchuangYu opened this issue Nov 17, 2021 · 2 comments

Comments

@GuangchuangYu
Member

Compare the new data, https://academic.oup.com/nar/advance-article/doi/10.1093/nar/gkab1063/6424774, against DO.db: it really has not been updated for years.

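> library(AnnotationDbi)  # provides keys()
> # IDs of all terms in the current HumanDO.obo (the line after each "[Term]" header)
> x <- readLines("HumanDO.obo")
> id <- x[which(x == "[Term]") + 1]
> id <- sub("id:\\s+", "", id)
> # IDs of the terms shipped in DO.db
> id2 <- keys(DO.db::DOTERM)
> # HumanDO.obo terms that are (TRUE) or are not (FALSE) in DO.db
> table(id %in% id2)

FALSE  TRUE 
 6793  6562 
> # DO.db terms that are still (TRUE) or no longer (FALSE) in HumanDO.obo
> table(id2 %in% id)

FALSE  TRUE 
    8  6562 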
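The comparison above takes the line after each "[Term]" header as the id and counts every term, including any that the file marks as obsolete with `is_obsolete: true`. A minimal sketch of a slightly more defensive extraction follows. The helper name `obo_term_ids()` and the example ids are hypothetical, and it assumes each stanza starts with a bracketed header line and carries its id on an `id:` line.

```r
# Hypothetical helper: extract term ids from the lines of an OBO file.
# Assumes every stanza starts with a bracketed header such as "[Term]"
# and that a term's id sits on an "id: ..." line inside its stanza.
obo_term_ids <- function(lines, drop_obsolete = FALSE) {
  lines <- trimws(lines)
  # Number the stanzas: each "[...]" header line starts a new one.
  is_header <- grepl("^\\[.*\\]$", lines)
  stanza <- cumsum(is_header)
  # Keep only lines that belong to "[Term]" stanzas.
  in_term <- stanza %in% stanza[lines == "[Term]"]
  is_id <- in_term & grepl("^id:\\s*", lines)
  ids <- sub("^id:\\s*", "", lines[is_id])
  if (drop_obsolete) {
    # Stanzas flagged as obsolete in the file.
    obsolete <- stanza[in_term & grepl("^is_obsolete:\\s*true", lines)]
    ids <- ids[!(stanza[is_id] %in% obsolete)]
  }
  ids
}

# Tiny made-up example; these ids are placeholders, not real DO terms.
example_obo <- c(
  "format-version: 1.2",
  "",
  "[Term]",
  "id: EX:0000001",
  "name: example disease A",
  "",
  "[Term]",
  "id: EX:0000002",
  "name: example disease B",
  "is_obsolete: true",
  "",
  "[Typedef]",
  "id: is_a"
)
all_ids <- obo_term_ids(example_obo)
current_ids <- obo_term_ids(example_obo, drop_obsolete = TRUE)
```

With the real file, `obo_term_ids(readLines("HumanDO.obo"), drop_obsolete = TRUE)` could stand in for `id` in the comparison, so that obsolete terms are not counted as missing from DO.db.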

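Alternatively, we should build our own HDO.db package and keep it up to date ourselves by tracking HumanDO.obo.

Of course, there is no rush on this, because DO on its own is not of much use. We should stay focused on mining human data, which requires annotations such as human gene -> disease associations.

The link in this paper, https://pubmed.ncbi.nlm.nih.gov/23197658/, is no longer available, so there is no corresponding human gene annotation. We should look for a new source of annotations.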

@allenbaron

allenbaron commented Nov 17, 2021

I work for the Human Disease Ontology and agree that an update is needed. DO.db hasn't been updated for quite some time and thousands of diseases have been added.

If you are looking for up-to-date gene-disease annotations, one possibility is the data from the Alliance of Genome Resources. The 6 model organism databases that are members of the Alliance have curators that are constantly adding new disease-gene annotations using the Human Disease Ontology and those annotations are kept up-to-date with the latest release of the ontology. This data is available in the "All disease associations" file on their downloads page.

If you decide you want to pursue creating a new R data package to replace DO.db, please reach out to them directly to find out what license their data is released under and whether they would allow you to bundle the data as an R package. They might be willing to do it themselves to make the data more accessible.

An alternative (again depending on the license) could be to provide DOSE users with functions to download and read in data from the Alliance. Example functions to accomplish this can be found in a package I'm developing (DO.utils, download_alliance_tsv() and read_alliance()).
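As a rough illustration of the "read in" half of that idea, here is a minimal base R sketch. It does not use DO.utils. The helper name `read_annotation_tsv()`, the column names, and the file contents are all hypothetical, and it assumes the downloaded file is tab-separated with metadata lines starting with "#", which should be checked against the actual Alliance file.

```r
# Hypothetical helper: read a tab-separated annotation file, skipping
# metadata lines that start with "#". Whether the Alliance file uses this
# layout is an assumption to verify against the actual download.
# Note: comment.char also cuts off any "#" that appears inside a field.
read_annotation_tsv <- function(path) {
  read.delim(path, comment.char = "#", quote = "",
             stringsAsFactors = FALSE)
}

# Made-up file standing in for a real download; the column names and
# values are placeholders, not the Alliance's actual schema.
tmp <- tempfile(fileext = ".tsv")
writeLines(c(
  "# example metadata line",
  "gene_id\tdoid",
  "GENE:1\tEX:0000001",
  "GENE:1\tEX:0000002",
  "GENE:2\tEX:0000001"
), tmp)

annot <- read_annotation_tsv(tmp)
# Gene -> disease mapping as a named list of term ids.
gene2disease <- split(annot$doid, annot$gene_id)
```

A named list like `gene2disease` is one simple shape for the gene -> disease associations discussed earlier in the thread; a two-column data frame would serve equally well.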

Feel free to contact me if you have questions or need further assistance. We may be able to suggest other sources.

@GuangchuangYu
Member Author

Thanks, @allenbaron. This is very helpful. We will look into it and work out a solution.
