diseasy

Ideas

Compare text vs. semantic similarity
Bake off various language distance metrics
Various methods of measuring distance too
Compare to random

Can you use text comparison methods to find similarities between human diseases and zebrafish phenotypes?

Or do you need a very custom mapping via ontologies? The original design was to do something like this. It is a good idea to first run a bunch of comparisons and then determine whether you need to develop something new.

Download a bunch of Python-based text comparison libraries
Start some simple bake-offs
Figure out how to do the random model
Which are our gold standards?
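A simple bake-off could start with something like the sketch below. It is only an illustration: it uses two standard-library metrics (token Jaccard overlap and `difflib.SequenceMatcher`) as stand-ins for the libraries to be evaluated, the function names (`jaccard`, `sequence_ratio`, `bake_off`) are hypothetical, and the descriptions are made up rather than taken from any real disease or zebrafish dataset.

```python
import difflib
import re


def tokens(text):
    """Lowercase a description and split it into a set of word tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a, b):
    """Token-set overlap: shared tokens / all tokens, 0.0 if both are empty."""
    ta, tb = tokens(a), tokens(b)
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0


def sequence_ratio(a, b):
    """Character-level similarity from difflib, between 0.0 and 1.0."""
    return difflib.SequenceMatcher(None, a.lower(), b.lower()).ratio()


# Hypothetical registry; a real bake-off would add the installed libraries here.
METRICS = {"jaccard": jaccard, "sequence_ratio": sequence_ratio}


def bake_off(queries, targets):
    """For each metric and query, return (best score, best-scoring target)."""
    results = {}
    for name, metric in METRICS.items():
        results[name] = {
            q: max((metric(q, t), t) for t in targets) for q in queries
        }
    return results


# Made-up descriptions, for illustration only.
diseases = ["abnormal heart morphology with reduced cardiac function"]
phenotypes = [
    "heart morphology abnormal, cardiac function decreased",
    "fin regeneration delayed",
]

for metric_name, best in bake_off(diseases, phenotypes).items():
    for query, (score, target) in best.items():
        print(f"{metric_name}: {score:.2f}  {query!r} -> {target!r}")
```

Keeping every metric behind the same `(a, b) -> float` signature means embedding-based methods can be dropped into `METRICS` later without changing the comparison loop.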

Q: What does failure look like? A: The random model is indistinguishable from real diseases

Q: What does success look like? A: Gold standards are found
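One way these two criteria might be made concrete is sketched below, assuming a shuffled-pairing random model and a rank-based gold-standard check. Neither is confirmed as the project's actual approach. `word_overlap` is a stand-in metric, the helper names are hypothetical, and the disease/phenotype strings are made up.

```python
import random
import re


def word_overlap(a, b):
    """Stand-in metric: shared words divided by total distinct words."""
    wa = set(re.findall(r"[a-z0-9]+", a.lower()))
    wb = set(re.findall(r"[a-z0-9]+", b.lower()))
    return len(wa & wb) / len(wa | wb) if wa | wb else 0.0


def shuffled_baseline(score, diseases, phenotypes, trials=200, seed=0):
    """Mean score of randomly re-paired descriptions, one value per trial."""
    rng = random.Random(seed)
    means = []
    for _ in range(trials):
        shuffled = phenotypes[:]
        rng.shuffle(shuffled)
        pairs = list(zip(diseases, shuffled))
        means.append(sum(score(d, p) for d, p in pairs) / len(pairs))
    return means


def empirical_p(real_mean, random_means):
    """Fraction of random trials scoring at least as well as the real pairing."""
    hits = sum(1 for m in random_means if m >= real_mean)
    return (hits + 1) / (len(random_means) + 1)


def gold_rank(score, disease, gold_phenotype, phenotypes):
    """1-based rank of the gold-standard phenotype among all candidates."""
    ranked = sorted(phenotypes, key=lambda p: score(disease, p), reverse=True)
    return ranked.index(gold_phenotype) + 1


# Made-up gold standard: diseases[i] is assumed to match phenotypes[i].
diseases = ["small eye", "heart defect", "short fin"]
phenotypes = ["eye size decreased", "heart malformed", "fin length decreased"]

pairs = list(zip(diseases, phenotypes))
real_mean = sum(word_overlap(d, p) for d, p in pairs) / len(pairs)
random_means = shuffled_baseline(word_overlap, diseases, phenotypes)
p_value = empirical_p(real_mean, random_means)

print(f"real mean score: {real_mean:.3f}")
print(f"empirical p-value vs. shuffled pairings: {p_value:.3f}")
for d, p in pairs:
    print(f"{d!r}: gold standard ranked {gold_rank(word_overlap, d, p, phenotypes)}")
```

Under this sketch, failure shows up as a large empirical p-value (real pairings score no better than shuffled ones), and success shows up as gold-standard phenotypes ranking at or near the top.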

Clustering

Compare human diseases vs. human diseases
Compare zf phenotypes vs. zf phenotypes
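As a rough illustration of within-set clustering, the sketch below links any two descriptions whose similarity meets a cutoff and reports the resulting connected components (single-linkage at a fixed threshold). The clustering method, the `word_overlap` stand-in metric, the threshold, and the example terms are all assumptions made for illustration, not the project's chosen approach.

```python
import itertools
import re


def word_overlap(a, b):
    """Stand-in metric: shared words divided by total distinct words."""
    wa = set(re.findall(r"[a-z0-9]+", a.lower()))
    wb = set(re.findall(r"[a-z0-9]+", b.lower()))
    return len(wa & wb) / len(wa | wb) if wa | wb else 0.0


def cluster(items, score, threshold):
    """Group items into connected components of pairs scoring >= threshold."""
    parent = list(range(len(items)))

    def find(i):
        # Follow parent links to the root, compressing the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in itertools.combinations(range(len(items)), 2):
        if score(items[i], items[j]) >= threshold:
            parent[find(i)] = find(j)

    groups = {}
    for i, item in enumerate(items):
        groups.setdefault(find(i), []).append(item)
    return list(groups.values())


# Made-up zebrafish-style phenotype strings, for illustration only.
terms = ["heart edema", "heart edema severe", "fin short", "fin short curved"]

for group in cluster(terms, word_overlap, threshold=0.5):
    print(group)
```

The same function could be applied to human disease descriptions, since it only needs a list of strings and a pairwise score.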

textcompare1.py

Install conda, then:

pip3 install nltk scikit-learn transformers torch fasttext

curl https://dl.fbaipublicfiles.com/fasttext/vectors-crawl/cc.en.300.bin.gz --output cc.en.300.bin.gz
gunzip cc.en.300.bin.gz

textcompare2.py

pip3 install -U sentence-transformers

works

textcompare3.py

pip3 install tensorflow tensorflow_hub

works


KorfLab/diseasy
