ShweataNHegde edited this page Nov 7, 2020 · 18 revisions

What is openVirus?

OpenVirus is a project that aims to develop knowledge resources and tools to help tackle the COVID-19 outbreak.

The world faces, and will continue to face, viral epidemics that arise suddenly. Scientific and medical knowledge is a critical resource for battling epidemics. Despite over 100 billion USD being spent on medical research worldwide, much of that knowledge is behind publisher paywalls and is only available to rich universities. Moreover, it is usually badly published and dispersed, without coherent knowledge tools. This particularly disadvantages the Global South.

This project aims to use modern tools, especially Wikidata (and Wikipedia), R, Java and text mining, together with semantic tools, to create an integrated resource of all currently published information on viruses and their epidemics. It relies on collaboration and gifts of labour and knowledge.

How do I get started?

The main documentation is on the Wiki (see the sidebar). https://github.com/petermr/openVirus/wiki/GETTING-STARTED lists some of the most important topics.

How Can I Help?

Take a look at the project README and the "How Can I Help" section of the FAQ.

Feel free to raise issues or ask questions on the project issue tracker.

Our Progress So Far:

1. Mini-Projects

We have 8 mini-projects. The details about each one of these can be found below:

| Owner and Collaborator of the Mini-projects | Mini-Project | Dictionary |
|---|---|---|
| Ambreen H, Pooja Pareek, Ayush | miniproject: viral epidemics and country (What countries do viral epidemics occur in?) | Country Dictionary |
| Priya, Dheeraj Kumar | miniproject: viral epidemics and disease (What diseases co-occur with epidemics? Not necessarily causation) | Disease Dictionary |
| Pruthiv Rajan, Urja Biswas, Israel | miniproject: viral epidemics and drugs (What drugs are used during epidemics?) | Drug Dictionary |
| Vaishali Arora, Simranleen Singh, Shweata Hegde | miniproject: viral epidemics and funders (Which funders support research on viral epidemics?) | Funders Dictionary |
| Charles Li, Anugrah | miniproject: viral epidemics and non pharmaceutical interventions (What non-pharma interventions are used during epidemics?) | NPI Dictionary |
| Kareena Singh, Jitu Ram Bhargav | miniproject: viral epidemics and viruses (What are the main viruses causing epidemics?) | Virus Dictionary |
| Sana Saifi | miniproject: viral epidemics and zoonoses (What is the role of zoonosis, i.e. animal hosts?) | Zoonosis Dictionary |
| Vanisha Arora, Om Prakash | miniproject: testing and tracing in viral epidemics (Who reports Test and Trace strategies?) | Testing and Tracing Dictionary |
  • All the dictionaries are made available on our Dictionary GitHub page.

2. Jupyter Notebooks

Much of our work is in Jupyter Notebooks. Follow the links below to find out more:

  • https://github.com/petermr/openVirus/tree/master/jupyter
  • https://github.com/petermr/openVirus/blob/master/Wikicite%20Presentations%20of%20presenters/Wikimedia_Hamadani_1.ipynb
  • https://github.com/petermr/openVirus/blob/master/Wikicite%20Presentations%20of%20presenters/Wikimedia_Hamadani_2.ipynb

Future Directions

1. The 8 projects are roughly split into:

  • Country, Disease, Drug, Organization. These are generic to almost any biomedical project and are clearly based directly on Wikidata. We should continue to clean them up, maintain them and offer them to the world.
  • Human Virus, TestTrace, NPI, Zoonosis. These are specific to viruses and epidemics and are less well developed. They probably need more cleaning.
    Action: Create a (meta-)dictionary project with a specification and testing/validation, so that we know the dictionaries are fit for purpose.

We also need:
(i) a micro-test corpus for testing code (locate within AMI, e.g. ZIKA10)
(ii) a tutorial/test corpus (e.g. 200 entries)
(iii) larger more specific corpora for "research"
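
As an illustration of what dictionary testing/validation might look like, here is a minimal sketch in Python. It assumes each dictionary entry has a term and a Wikidata ID; the field names `term` and `wikidata_id`, the `validate_entries` helper and the sample entries (including their placeholder IDs) are hypothetical and are not the project's actual dictionary format.

```python
import re

# A Wikidata item ID is "Q" followed by a positive integer.
WIKIDATA_ID = re.compile(r"^Q[1-9][0-9]*$")


def validate_entries(entries):
    """Return a list of (index, problem) tuples; an empty list means no problems were found."""
    problems = []
    seen_terms = set()
    for i, entry in enumerate(entries):
        term = (entry.get("term") or "").strip()
        wikidata_id = entry.get("wikidata_id") or ""
        if not term:
            problems.append((i, "missing term"))
        elif term.lower() in seen_terms:
            problems.append((i, f"duplicate term: {term}"))
        else:
            seen_terms.add(term.lower())
        if not WIKIDATA_ID.match(wikidata_id):
            problems.append((i, f"bad Wikidata ID: {wikidata_id!r}"))
    return problems


# Hypothetical entries; the IDs are placeholders, not real Wikidata items.
sample = [
    {"term": "Example virus", "wikidata_id": "Q12345"},
    {"term": "example virus", "wikidata_id": "Q12345"},  # duplicate, differs only in case
    {"term": "", "wikidata_id": "not-an-id"},            # missing term, malformed ID
]

for index, problem in validate_entries(sample):
    print(index, problem)
```

A check like this could run against the micro-test corpus dictionaries before each release, so that a broken entry is caught before anyone searches with it.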

2. Notebooks and Exploration:

We are basing the next phase on Python libraries within Notebooks. We believe that everyone can run simple Python calls to numpy, pandas, nltk, matplotlib and later scikit-learn (and maybe other tools such as word2vec and keras). ami-picocli can be run from Notebooks (but needs installing). This allows us to extract sections from CTree documents and do powerful exploratory work, which everyone can practise. Exploration shows us what we might be able to do. However, science requires us to validate and test the results, and all community code must be reviewed, tested and validated.
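
The kind of simple exploration described above can be sketched in a single Notebook cell. This example uses only the Python standard library so that it runs without installing anything; the `sections` dictionary is a hypothetical stand-in for text extracted from a CTree, and the stopword list is a toy one. In practice pandas or nltk could replace the hand-written tokenizing and counting.

```python
import re
from collections import Counter

# Hypothetical stand-in for sections extracted from one CTree document.
sections = {
    "introduction": "Viral epidemics arise suddenly. Epidemics demand open knowledge.",
    "methods": "We mined open papers on viral epidemics.",
}

# Toy stopword list, for illustration only.
STOPWORDS = {"we", "on", "the", "a", "of"}


def word_counts(text):
    """Lower-case the text, keep alphabetic tokens, and count those that are not stopwords."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return Counter(t for t in tokens if t not in STOPWORDS)


totals = Counter()
for name, text in sections.items():
    counts = word_counts(text)
    totals.update(counts)  # accumulate counts across sections
    print(name, counts.most_common(3))

print("overall:", totals.most_common(3))
```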

3. Software development/refactoring

We now have enough software-experienced people to start developing new software, initially through Python libraries. The software includes:

a. rewriting getpapers in Python to be more maintainable and extend to new sources
b. ami-words so we can do word frequencies from CProjects
c. ami-search for better search and more tools such as regex, abbreviations, etc.

(b. and c. will possibly be multilingual).
All software must be testable (Python unit tests) and developed test-first (test-driven development, TDD).
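
To show what test-driven development could look like for a search tool, here is a small sketch using Python's built-in `unittest`. The function `find_terms` is hypothetical and is not the ami-search API; it only illustrates writing the expected behaviour as tests alongside the code.

```python
import re
import unittest


def find_terms(text, terms):
    """Return {term: count} for each term found as a whole word or phrase, ignoring case."""
    hits = {}
    for term in terms:
        pattern = re.compile(r"\b" + re.escape(term) + r"\b", re.IGNORECASE)
        count = len(pattern.findall(text))
        if count:
            hits[term] = count
    return hits


class FindTermsTest(unittest.TestCase):
    def test_counts_whole_words_case_insensitively(self):
        text = "Quarantine was imposed. Later the quarantine was lifted."
        self.assertEqual(find_terms(text, ["quarantine"]), {"quarantine": 2})

    def test_ignores_partial_matches(self):
        self.assertEqual(find_terms("quarantined travellers", ["quarantine"]), {})

    def test_handles_multiword_terms(self):
        text = "contact tracing and contact tracing apps"
        self.assertEqual(find_terms(text, ["contact tracing"]), {"contact tracing": 2})


if __name__ == "__main__":
    # argv and exit=False let this run inside a Notebook cell as well as from a script.
    unittest.main(argv=["ignored"], exit=False)
```

In a test-first workflow the three test methods would be written before `find_terms`, fail at first, and then pass once the function is implemented.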

We expect there to be 4-5 software miniprojects:

  • dictionaries/validation/maintenance
  • getpapers
  • ami-words
  • ami-search
  • Containerization using Docker

Every project must have:

  • an Issue
  • at least 2 people, one of whom should represent the users
  • review/cross-validation by non-project members

A new repository for development work, openvirusdev, has been created on GitHub to help us manage our work better.
