
miniproject: viral epidemics and non pharmaceutical interventions

Zeyang Charles Li edited this page Sep 24, 2020 · 24 revisions

Do non-pharmaceutical interventions (NPIs) help control viral epidemics?

Owner:

Zeyang Charles Li

Background

NPIs are actions, apart from getting vaccinated and taking medicine, that people and communities can take to help slow the spread of illnesses like pandemic influenza (flu). NPIs are also known as community mitigation strategies [1]. Common NPIs include face masks, social distancing, and quarantine.

Objective

This miniproject aims to determine, from the literature, whether the reported NPIs have an effect on controlling viral epidemics.

Methodology

  • Manually classify (binary) the papers in the communal corpus Epidemic50noCov and record the results in a spreadsheet (STARTED)

  • Create a dictionary specific to this miniproject, starting from Wikidata/Wikipedia terms, and build it with amidict (STARTED)

        https://en.wikipedia.org/wiki/Non-pharmaceutical_interventions
    
  • Re-run the query with the project-specific dictionary and retrieve a new corpus of 950 papers with amisearch getpapers

  • Section papers to extract the paragraphs most relevant to NPIs (NOT STARTED)

Software

  • amidict will be used for creating dictionaries

    ami installation failed due to objective errors, so SPARQL was used instead.

    Current step: merging multiple SPARQL queries

  • getpapers for retrieving papers in new corpora

    Paper retrieval with getpapers initially failed because of a proxy server.

    getpapers ran after the proxy settings were changed, and a reduced corpus (k=580) was created.

    Current step: attempting to reduce the corpus size / work locally.

  • KNIME for data flow

  • R for data analyses and visualisation

Updates

01/08/20

getpapers ran after a proxy change and VPN reset. An initial corpus (k=580) was created.

Obtained results from multiple SPARQL queries. Attempted to merge them all with UNION, then attempted to remove redundant terms with DISTINCT.

BLOCK: errors while merging the 'instance of' and 'main subject' queries.
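The merge attempted above can be sketched as a single query with two UNION branches and one SELECT DISTINCT over both. This is only an illustration, not the queries used in this project: the function name build_union_query and the placeholder item ID are hypothetical, and the only Wikidata details assumed are the standard properties P31 (instance of) and P921 (main subject).

```shell
# Hypothetical placeholder: replace with the Wikidata item ID of interest.
TARGET_QID="Q000000"

# Print a SPARQL query that merges an 'instance of' (P31) branch and a
# 'main subject' (P921) branch. DISTINCT removes rows returned by both.
build_union_query() {
  cat <<EOF
SELECT DISTINCT ?item ?itemLabel WHERE {
  { ?item wdt:P31 wd:$1 . }
  UNION
  { ?item wdt:P921 wd:$1 . }
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en" . }
}
EOF
}

build_union_query "$TARGET_QID"

# To send the query to the Wikidata Query Service (needs network access):
# curl -G 'https://query.wikidata.org/sparql' \
#   -H 'Accept: text/csv' \
#   --data-urlencode "query=$(build_union_query "$TARGET_QID")"
```

Both branches must bind the same variable (?item here); if they bind different variable names, the merged result has half-empty columns and DISTINCT cannot collapse the duplicates.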

05/08/20

Successfully cloned ami-jar repository (finally)

To clone a big repo over a poor internet connection (low download speed):

  • First increase the postBuffer value (the git setting is http.postBuffer, which also covers https URLs)

    git config --global http.postBuffer 157286400 
    
  • Then turn off repository compression

    git config --global core.compression 0
    
  • Then download only the most recent snapshot of the repo using depth 1. This helps reduce the connection time with the remote host and reduces the risk of fatal clone failures.

    git clone --depth 1 https://github.com/petermr/ami-jars.git
    
  • Once the first part is cloned, finish the download.

    git fetch --unshallow 
    
  • Once the unshallow task finishes, run git fetch --unshallow again and you should see

    fatal: --unshallow on a complete repository does not make sense
    
  • Change PATH for ami
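As an alternative to re-running the unshallow step and waiting for the fatal message, the completeness check can be sketched with git rev-parse --is-shallow-repository, which prints true or false (Git 2.15 or later). The helper names is_shallow and finish_clone are made up for this sketch.

```shell
# Print "true" if the repository in directory $1 is a shallow clone,
# "false" otherwise (requires Git 2.15 or later).
is_shallow() {
  git -C "$1" rev-parse --is-shallow-repository
}

# Fetch the remaining history only when the clone is still shallow.
finish_clone() {
  if [ "$(is_shallow "$1")" = "true" ]; then
    git -C "$1" fetch --unshallow
  else
    echo "$1 already has full history"
  fi
}

# Example (assumes the clone from the steps above is in ./ami-jars):
# finish_clone ami-jars
```

On a slow connection the fetch can fail partway through. Calling finish_clone again is safe, because it only fetches while the repository still reports itself as shallow.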

Binary classification has been done on the k=580 corpus, but there were too many false positives (19/20). This could be because the search terms in the getpapers query were not refined:

  getpapers -q "Viral Epidemics and Non-pharmaceutical Interventions" -k 950 -x -o NPIcorpus2

BLOCK:

  • Re-ran SPARQL with all queries changed to 'instance of', but many terms were lost: terms included under 'main subject' are not present under 'instance of'

  • Significant noise in the Wikidata 'main subject' terms, and discrepancies between Wikidata and Wikipedia

09/08/20

Created a new corpus (CorpusNPI2) of 760 articles and started binary classification for both viral epidemics and NPIs

Attempted to alter PATH for ami, but my PATH looks a bit tangled:

echo $PATH

/Users/charlesli/.nvm/versions/node/v7.10.1/bin:/usr/local/bin:/usr/bin:/bin:/usr/sbin:/sbin:/Users/charlesli/Desktop/apache-maven-3.6.3/bin 
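A long PATH is easier to read one entry per line, and a new directory can be appended without touching the existing entries (Maven, git, nvm). The sketch below shows both. The helper names and the ami install location in AMI_BIN are assumptions for illustration, not the paths used in this project.

```shell
# Print each entry of the PATH-like string $1 on its own line.
show_path() {
  printf '%s\n' "$1" | tr ':' '\n'
}

# Print PATH-like string $1 with directory $2 appended, unless $2 is
# already one of its entries. Existing entries are left untouched.
path_append() {
  case ":$1:" in
    *":$2:"*) printf '%s\n' "$1" ;;
    *)        printf '%s\n' "$1:$2" ;;
  esac
}

# Hypothetical install location: adjust to wherever ami was unpacked.
AMI_BIN="$HOME/ami3/bin"

NEW_PATH=$(path_append "$PATH" "$AMI_BIN")
show_path "$NEW_PATH"

# To make the change permanent, add a line like this to the shell profile:
# export PATH="$PATH:$HOME/ami3/bin"
```

Appending rather than prepending keeps the existing tools first in the lookup order, so nothing already on PATH is shadowed.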

Curated an NPI dictionary without SPARQL

12/08/20

Successfully installed ami3-2020.08.09_09.54.10 and changed PATH by editing .sh profile

Created a third corpus (k=464) using the query terms "viral epidemics" and "non-pharmaceutical" to minimise the noise brought in by "interventions"
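A narrowed query of this kind can be sketched as quoted phrases joined with AND, so that each phrase is matched as a whole. The exact query string used for the third corpus is not recorded here; the build_query helper and the -k 950 limit (copied from the earlier command) are assumptions. The command is printed rather than executed.

```shell
# Join the given phrases into a query of the form "a" AND "b" AND ...
build_query() {
  query=""
  for phrase in "$@"; do
    if [ -n "$query" ]; then
      query="$query AND "
    fi
    query="$query\"$phrase\""
  done
  printf '%s\n' "$query"
}

QUERY=$(build_query "viral epidemics" "non-pharmaceutical")

# Dry run: print the getpapers command instead of running it.
# -q query, -k result limit, -x download XML, -o output directory.
echo getpapers -q "'$QUERY'" -k 950 -x -o CorpusNPI3
```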

Downloaded Docker and Jupyter Notebook

Attempted to create an XML dictionary with amidict containing the terms from the curated dictionary.

17/08/20

Ran smoke tests on Docker and Jupyter Notebook

Classified 40 papers from CorpusNPI3; results were better (24 positive out of 40). Attempted to commit to GitHub

Created a dictionary using amidict and committed it to GitHub

https://github.com/petermr/openVirus/blob/master/dictionaries/NPIdict1.xml

BLOCK:

  • multiple words (entries) in ami dictionary (solved)

  • synonyms for terms

23/08/20

Installed Anaconda Navigator and ran Jupyter Notebook

Created a .csv file containing dictionary terms

Created a new dictionary, NPIdict2, with phrases for terms
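The conversion from a term list to a dictionary file can be sketched by hand as below. It assumes that an ami dictionary is a dictionary element with a title attribute holding entry elements with term and name attributes; that layout should be checked against a file produced by amidict itself. The function name csv_to_dict and the single-column input are also assumptions.

```shell
# Read one term per line from stdin and print a dictionary XML document.
# $1 is the dictionary title. The element layout is an assumption.
csv_to_dict() {
  printf '<?xml version="1.0" encoding="UTF-8"?>\n'
  printf '<dictionary title="%s">\n' "$1"
  # "|| [ -n ... ]" also handles a last line with no trailing newline.
  while IFS= read -r term || [ -n "$term" ]; do
    # Strip Windows carriage returns left by spreadsheet exports.
    term=$(printf '%s' "$term" | tr -d '\r')
    [ -n "$term" ] || continue
    # Escape characters that are not allowed inside an XML attribute.
    escaped=$(printf '%s' "$term" |
      sed -e 's/&/\&amp;/g' -e 's/</\&lt;/g' -e 's/"/\&quot;/g')
    printf '  <entry term="%s" name="%s"/>\n' "$escaped" "$escaped"
  done
  printf '</dictionary>\n'
}

# Example with three NPI terms; multi-word phrases stay one entry each.
printf 'face mask\nsocial distancing\nquarantine\n' | csv_to_dict NPIdict2
```

Writing the XML declaration as the very first bytes of the output matters: anything before it, even a blank line, makes XML parsers reject the file.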

Attempted validation of NPIdict2. The output showed the dictionary as NULL but with 38 entries, using this syntax:

amidict --dictionary NPIdict2 --directory /Users/charlesli/Desktop/NPIdict2 display --validate

Could the display command be the cause?

BLOCK:

  • Validation

26/08/20

Attempted to debug validation

Ran ami-section on the corpus (individually)

07/09/20

Deleted all empty repositories

Sectioned all 437 papers in NPIcorpus2 using ami-section

Ran ami-search on NPIcorpus2

BLOCK: Errors during ami-search: "cannot read stopward stream" and "SXXP0003 Error reported by XML parser: Content is not allowed in prolog. java.lang.RuntimeException: cannot transform NPIcorpus2/PMC5959063/fulltext.xml"
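"Content is not allowed in prolog" generally means the parser found something before the opening of the XML document, for example a byte-order mark, an empty file, or a non-XML download. Whether that is the cause here is not confirmed. A quick check, sketched below with a made-up helper name, is to list every fulltext.xml whose first byte is not "<".

```shell
# List fulltext.xml files under directory $1 that do not start with '<'.
# Empty files and files starting with a byte-order mark are listed too.
find_bad_xml() {
  find "$1" -name fulltext.xml | while IFS= read -r file; do
    first=$(head -c 1 "$file")
    [ "$first" = "<" ] || printf '%s\n' "$file"
  done
}

# Example: check the corpus if it is in the current directory.
if [ -d NPIcorpus2 ]; then
  find_bad_xml NPIcorpus2
fi
```

Any file this lists can be opened in a text editor to see what precedes the XML, then re-downloaded or moved out of the corpus before re-running ami-search.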

10/09/20 - 22/09/20

No updates due to illness.

23/09/20

Ran ami-search on the 437 papers, but the results only showed word counts for every word (not specific to my dictionary)

Current Tasks/Challenges

Change PATH for ami without altering the existing PATH entries for Maven and git (DONE)

Rerun getpapers and download a second (third by 12/08) corpus, then manually classify (STARTED)

Resolve SPARQL merge issues and query noise (DONE)

Add properties (Wikidata ID etc.) and synonyms to the curated dictionary (NOT STARTED)

Commit dictionary to GitHub (DONE)

Convert .csv file to XML (DONE)

Validate new dictionary (STARTED)

Commit corpus


Reference(s)

[1] https://www.cdc.gov/nonpharmaceutical-interventions/index.html
