What is openVirus? (OpenPublishingTalk)

What is openVirus?

An 8-minute presentation for https://openpublishingfest.org/calendar.html#event-178 on 2020-05-27

Origins and motivation

openVirus was sparked by the realisation, around 2020-02, that there was no simple way for citizens to find scientific information on the COVID-19 epidemic. A group of activists released about 5000 papers from SciHub [1]. Possibly in response, a number of closed-access publishers released a few thousand articles into the "CORD-19" database. This was restricted to viruses or COVID (it has since developed into a larger user community). The content was largely JSON.

We were working on OpenClimateKnowledge (OCK), for citizens to extract knowledge from the distributed scientific literature. When COVID-19 hit, we decided to use the same technology to tackle viral epidemics.

We felt that releasing a very narrow section of the scientific literature, chosen by commercial publishers, was a minimal response. With simple searches we found that 60-90% of the literature was still closed for topics such as aerosols, masks, ventilators, social distancing, legal issues and many others. Citizens are confined to information on:

topics selected by publishers
sources of content restricted by current systems

openVirus was created as a citizen volunteer community to create tools and sources for citizens to ask their own questions of their own sources.

Principles

to welcome Open (free to use, re-use and re-distribute)
to create a single point of entry for searching the Open Literature
to provide a toolset that citizens could download, modify and use
to create a Wikidata-based query, using simple dictionaries that citizens can create and modify
to create an atmosphere where a community can grow
to emphasize global reach, such as multilinguality and Global South publications
to use the most appropriate Open solutions: collaborate, not compete

Strategy

The work is largely carried out by users on their own machines.

Many resources are server-centric and offer limited chance of systematic download.

build scrapers or API query tools for Openly readable sources
query or scrape the sources with user questions
download raw content (PDF, HTML, images), from 10 to 10,000 articles
clean and semantify
annotate with dictionaries
expose, analyze, display
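
As an illustration of the "annotate with dictionaries" step, here is a minimal Python sketch. It is not the ami implementation: the dictionary is a plain Python mapping, the identifiers are placeholders rather than real Wikidata IDs, and the function name `annotate` is made up for this example. It only counts whole-word, case-insensitive matches of each term (with an optional plural "s").

```python
import re
from collections import Counter

# Hypothetical dictionary: term -> identifier.
# The identifiers are placeholders, NOT real Wikidata IDs.
DICTIONARY = {
    "mask": "Q-PLACEHOLDER-1",
    "ventilator": "Q-PLACEHOLDER-2",
    "aerosol": "Q-PLACEHOLDER-3",
}


def annotate(text, dictionary):
    """Count whole-word, case-insensitive hits for each dictionary term.

    A trailing "s" is allowed so that simple plurals also match.
    Terms with no hits are left out of the result.
    """
    counts = Counter()
    for term in dictionary:
        pattern = re.compile(r"\b" + re.escape(term) + r"s?\b", re.IGNORECASE)
        hits = len(pattern.findall(text))
        if hits:
            counts[term] = hits
    return counts


if __name__ == "__main__":
    sample = "Masks and a ventilator; the mask filtered aerosols."
    for term, hits in annotate(sample, DICTIONARY).items():
        print(term, DICTIONARY[term], hits)
```

A citizen could edit the mapping to add their own terms; a real dictionary would carry more than a term and an identifier.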

Sources

EuropePMC
bioRxiv and medRxiv
DOAJ
EThOS
Redalyc (MX)
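
To give a flavour of an API query tool for one of these sources, the sketch below builds a search request for the EuropePMC REST search service in Python. It is an illustration only, not how getpapers works internally. The endpoint, the `query`, `format` and `pageSize` parameters, and the `OPEN_ACCESS:y` filter reflect our understanding of the public EuropePMC search syntax and should be checked against its documentation; the function names are made up for this example.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed endpoint of the EuropePMC REST search service.
EUROPEPMC_SEARCH = "https://www.ebi.ac.uk/europepmc/webservices/rest/search"


def build_search_url(query, page_size=25, open_access_only=True):
    """Return a search URL for the given user question.

    When open_access_only is true, the query is restricted to
    Open Access articles with the OPEN_ACCESS:y filter.
    """
    if open_access_only:
        query = f"({query}) AND OPEN_ACCESS:y"
    params = {"query": query, "format": "json", "pageSize": page_size}
    return EUROPEPMC_SEARCH + "?" + urlencode(params)


def search(query, page_size=25):
    """Run the search over the network and return the list of result records."""
    with urlopen(build_search_url(query, page_size)) as response:
        data = json.load(response)
    # The JSON layout (resultList -> result) is an assumption to verify.
    return data.get("resultList", {}).get("result", [])


if __name__ == "__main__":
    print(build_search_url("masks AND aerosols", page_size=10))
```

Only `search` touches the network; `build_search_url` can be tried offline to see exactly what would be sent.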

Toolkit

Any tool can be included as long as it can communicate through files on local storage in our CProject format. This is not an exclusive list.

framework: ami + CProject data
scrapers: getpapers, Ferret, curl, scrapy
cleaners: PDFBox, Tidy/Jsoup, Grobid, etc.
transformers: xml2html, ami ocr, KNIME
dictionaries: ami dictionary
indexing and annotation: Solr, ami
analysis and display: R, KNIME

The central philosophy is a defined, semantic, universal data structure, CProject. The tools can be varied or swapped.
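
Because tools communicate only through files on local storage, any tool in any language can take part by reading and writing the project directory. The Python sketch below shows the idea. It assumes a layout with one subdirectory per article, each holding files such as fulltext.xml or fulltext.pdf; that layout, the file names and the function name `summarize_project` are assumptions for illustration, not a specification of CProject.

```python
from pathlib import Path


def summarize_project(project_dir):
    """Map each article subdirectory to the sorted names of the files in it.

    Assumes (for illustration) that the project directory contains one
    subdirectory per article. Files at the top level are ignored.
    """
    project = Path(project_dir)
    summary = {}
    for child in sorted(project.iterdir()):
        if child.is_dir():
            summary[child.name] = sorted(
                entry.name for entry in child.iterdir() if entry.is_file()
            )
    return summary


if __name__ == "__main__":
    import sys

    # Usage: python summarize.py path/to/project
    for article, files in summarize_project(sys.argv[1]).items():
        print(article, ", ".join(files) or "(empty)")
```

A cleaner, transformer or annotator would follow the same pattern: read one file from each article directory and write its output alongside it, so that the next tool can pick it up.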

Contributors

Remko Popma,
Lezan Hawizy, Tim Voronov,
Andy Jackson,
Clyde Davies,
Thomas Shafee,
Priya JK, Kareena Singh,
Simon Worthington, (check omissions)

End product

toolkit
dictionaries
tutorials
citizen openVirus downloadable or boxed

====

[1] Bender, Maddie (3 February 2020). "'It's a Moral Imperative:' Archivists Made a Directory of 5,000 Coronavirus Studies to Bypass Paywalls". Vice. https://www.vice.com/en_us/article/z3b3v5/archivists-are-bypassing-paywalls-to-share-studies-about-coronaviruses

[CORD-19] https://www.kaggle.com/allen-institute-for-ai/CORD-19-research-challenge
