openVirus

aggregation of scholarly publications and extracted knowledge on viruses and epidemics.

UPDATE! (20210813)

We now have Python versions of getpapers and ami (pygetpapers and pyami). Many workflows are being developed in which we annotate the literature at sentence level rather than at document or section level (see docanalysis). Most of the work now happens in:

docanalysis (https://github.com/petermr/docanalysis)
CEVOpen (https://github.com/petermr/CEVOpen/)
pygetpapers (https://github.com/petermr/pygetpapers/)
pyami (https://github.com/petermr/pyami)
dictionary (https://github.com/petermr/dictionary)

We still use parts of the technology and workflow developed at openVirus.

NOTE

This site is to develop knowledge resources and tools to help tackle the COVID19 outbreak. It is NOT a guide to public COVID information. The actual content created from the site is drawn from reliable sources (journals, guidelines) but has NOT been filtered or reviewed.

tech background

All are welcome to participate. We assume basic familiarity with running programs (command line, R, text editing) and, initially, won't be able to hand-hold. However, we know from experience that people can learn very fast, so feel free to dive in and try the tech.

discipline background

This site was initially created by scientists in the bioscience/chemical area, but without discipline knowledge of epidemiology, health care, virology, societal aspects, etc.

background

The world faces (and will continue to face) viral epidemics which arise suddenly and where scientific/medical knowledge is a critical resource. Despite over 100 billion USD spent on medical research worldwide, much knowledge is behind publisher paywalls and only available to rich universities. Moreover, it is usually badly published and dispersed, without coherent knowledge tools. This particularly disadvantages the Global South.

This project aims to use modern tools, especially Wikidata (and Wikipedia), R, Java and textmining, together with semantic tools, to create a modern integrated resource of all current published information on viruses and their epidemics. It relies on collaboration and gifts of labour and knowledge.

goals

to collect all freely visible scientific/medical publications on COVID19 and viral epidemics, and transform them into a uniform format.
to use Natural Language Processing (NLP) and textmining so machines can extract meaning from the articles.
to build dictionaries of terms related to viruses and viral epidemics for (a) search (b) classification (c) understanding.
to collect knowledge and publish it in WikiJournal of Medicine (a peer-reviewed OA journal with an emphasis on review).
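
As an illustration of the dictionary and textmining goals above, here is a minimal sketch of sentence-level term matching in Python. It is not the project's actual pipeline (see docanalysis and pyami for that): the term list, the naive sentence splitter and the function names are all hypothetical and chosen only for this example.

```python
import re

# Hypothetical mini-dictionary for illustration; real dictionaries are far larger.
TERMS = {"coronavirus", "sars-cov-2", "incubation period"}


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def annotate(text, terms=TERMS):
    """Return (sentence, sorted matched terms) for each sentence with a match."""
    hits = []
    for sentence in split_sentences(text):
        lowered = sentence.lower()
        # Whole-word, case-insensitive match for each dictionary term.
        found = sorted(
            t for t in terms
            if re.search(r"\b" + re.escape(t) + r"\b", lowered)
        )
        if found:
            hits.append((sentence, found))
    return hits


if __name__ == "__main__":
    sample = (
        "The coronavirus SARS-CoV-2 emerged in 2019. "
        "Masks were studied. "
        "The incubation period is about five days."
    )
    for sentence, found in annotate(sample):
        print(found, "<-", sentence)
```

Matching whole words keeps the sketch simple but misses plurals and spelling variants; a real workflow would need a proper tokenizer and richer dictionaries.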

diary/blog

I have started a diary which I hope to complete each day. No consistent structure. Will praise and challenge.

how we will work

This is a digital knowledge-based project (i.e. no laboratory or clinical work). It is open to all who are prepared to contribute components of the system.

Some examples of the skills and knowledge required within the project:

Wikimedia (esp. Wikipedia, Wikidata, Wiki technology, WikiJournal)
Scholarly publications including preprints
Scraping web pages and building metadata
SPARQL/RDF , XML, JSON
Textmining , supervised and unsupervised
Virology
Epidemiology
Computation
Societal aspects of disease (e.g. public health policy).
Language translation (with a scientific emphasis)
Git and Github
Open collaborative projects

Our initial framework is based on simple dictionaries and ontologies (e.g. RDF, XML) and on public sources of scientific articles, especially preprints and sources that improve country-specific inclusivity (e.g. Latin America: Redalyc, SciELO). Current software is mainly Java, R, Node and Python, but as the data are exposed as text files a variety of tools can be used.
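
Because the dictionaries are plain text, they can be read with standard tools. The sketch below loads a small XML dictionary using Python's standard library. The element and attribute names (`dictionary`, `entry`, `term`, `name`) are assumptions made for this example; check the files in the dictionary repository for the schema actually in use.

```python
import xml.etree.ElementTree as ET

# Hypothetical dictionary content; the schema shown here is an assumption.
SAMPLE = """<dictionary title="virus_demo">
  <entry term="coronavirus" name="Coronavirus"/>
  <entry term="zika virus" name="Zika virus"/>
</dictionary>"""


def load_terms(xml_text):
    """Map each lower-cased term to its display name, skipping entries without a term."""
    root = ET.fromstring(xml_text)
    return {
        entry.get("term").lower(): entry.get("name")
        for entry in root.iter("entry")
        if entry.get("term")
    }


if __name__ == "__main__":
    for term, name in sorted(load_terms(SAMPLE).items()):
        print(term, "->", name)
```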

scope

Initially we will use papers retrieved by the query "coronavirus". Typical results are:

Europe PubMedCentral (EPMC): 6563 papers
biorxiv preprints: ~400
medrxiv preprints: ~300
SciELO ...
Redalyc ...
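
A query like the one above can also be tried directly against the public Europe PMC REST search service. The sketch below is not how pygetpapers is implemented; it only builds a search URL and, optionally, reads the reported hit count. The endpoint, the `query`, `format` and `pageSize` parameters and the `hitCount` field are given as we understand the service, so check the Europe PMC documentation before relying on them. Counts change over time and will differ from the figures listed here.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Europe PMC REST search endpoint (verify against the current documentation).
EPMC_SEARCH = "https://www.ebi.ac.uk/europepmc/webservices/rest/search"


def build_search_url(query, page_size=25):
    """Build a search URL that asks for JSON results."""
    params = {"query": query, "format": "json", "pageSize": page_size}
    return EPMC_SEARCH + "?" + urlencode(params)


def count_hits(query):
    """Network call: return the number of hits reported for the query."""
    with urlopen(build_search_url(query, page_size=1)) as response:
        return json.load(response)["hitCount"]


if __name__ == "__main__":
    print(build_search_url("coronavirus"))
    # Uncomment to query the live service:
    # print(count_hits("coronavirus"))
```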

languages and countries

COVID19 is a global emergency and it's critical that knowledge is global, not centered on the North Atlantic regions. We want to see other languages and other nations involved. As a start we are developing a scraper for Latin American OpenAccess publications, initially the Redalyc server.

tasks

We will list tasks on github.com/petermr/openVirus/issues. These are things we have to do including components, integration, bugs, tutorials, etc. There may soon be a large number of "Open" Issues - this should be seen as positive - some issues are ongoing and don't get closed.

Open Notebook publication

We are using the Open Notebook philosophy of Jean-Claude Bradley, which is also implicit in Wikimedia content and in many Free/Open Software projects. Everything is posted publicly as soon as it is created. That means that every iteration is visible and will almost certainly contain bugs/errors. Each subsequent commit fixes some of these. We know from past experience that this is the quickest way to create high-quality content, and it also gives a feeling of communal ownership.

Installation

Installation and Docker instructions can be found at INSTALLING.md.

Our Progress and Future Directions

We have made considerable progress since the start of openVirus. Please refer to our Home wiki page for more information. In short, we are switching to Jupyter Notebooks, as Python lets us work efficiently.

In addition, we have two new repositories: one specifically for dictionaries and one for our development work.

Dictionary: https://github.com/petermr/dictionary
openVirus Development: https://github.com/petermr/openvirusdev

NOTE for Hackathon

See EUvsVirus.org for the weekend EUvsVirus hackathon (24-26 April) and our project contentmine-scientific-knowledge-for-all. You may also wish to register on their site (https://euvsvirus.org, channel t_contentmine_ffr2axj4x).
