
MiniProject:Testing and tracing in viral epidemics

vanishaarora881 edited this page Oct 1, 2020 · 39 revisions

Miniproject:

Testing and tracing in viral epidemics

Owner:

Vanisha Arora

Collaborator:

Om Prakash

Summary:

The project aims to extract information about the various "tests" used to diagnose viral infections during an epidemic, and to cover the various ways of preventing further spread.

  • "Contact tracing" is the process of identifying persons who may have come into contact with an infected person ("contacts") and then collecting further information about these contacts.
    The goals of contact tracing are:
  1. To interrupt ongoing transmission and reduce the spread of an infection.

  2. To offer diagnosis, counseling and treatment to already infected individuals.

  3. If the infection is treatable, to help prevent reinfection of the originally infected patient.

Objective:

This miniproject is about extracting information on the testing and tracing carried out during viral epidemics.

Methodology:

🟨 Conduct binary classification on communal corpus "EpidemicnoCov50" and create a spreadsheet.

🟨 Creating dictionaries for the project using AMI, and searching the scientific literature for the names of tests used in disease diagnosis.

🟨 Dictionary:Testing and Tracing Dictionary (https://github.com/petermr/openVirus/blob/master/cambiohack2020/dictionaries/testTrace.xml)

🟨 Downloading a corpus of 250 articles using getpapers, with the query "Testing and tracing in viral epidemics".

🟨 Using ami section to section the papers.

Command used: ami -p <name of directory> section (where the directory is the one containing the corpus)

🟨 Running ami search to find the dictionary terms in the corpus.

Command used: ami -p <name of directory> search --dictionary test_trace

🟨 Multi dictionary search.

🟨 Jupyter notebooks

🟨 Annotation of the corpus.

🟨 Machine learning

How I created the dictionary:

The following steps were followed:

  • Create a text file (.txt) containing a list of terms related to "Testing and contact tracing" (from Wikipedia or from research papers). The terms include the tests used to diagnose the virus during an epidemic as well as tracing terms.

  • Meanwhile, create a directory from the command prompt: mkdir mydictionaries (this is the output directory in which the dictionary will be created).

  • Open the command prompt and give the command: amidict -v --dictionary testing_and_tracing --directory mydictionaries --input test_trace.txt create --informat list --outformats xml,html

  • The input file in this command is the .txt file containing the terms.

  • After the command is given, it takes a while for the dictionary to be created.

  • Open the folder 'mydictionaries'; the dictionary has been created as both an XML and an HTML file.

Link to the dictionary: https://github.com/petermr/openVirus/blob/master/dictionaries/test/test_trace.xml

This dictionary includes only names and terms. Adding other attributes requires a different syntax.
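The term list used as input can be prepared by hand, but a small script helps keep it free of blank lines and duplicates. The following is a minimal sketch, not the method used in this project: the function name `write_term_list` and the example terms are illustrative assumptions, and the only thing taken from the steps above is the idea of a plain-text file with one term per line.

```python
import tempfile
from pathlib import Path


def write_term_list(terms, path):
    """Write unique, non-empty terms to `path`, one per line, in first-seen order."""
    seen = set()
    cleaned = []
    for term in terms:
        term = term.strip()
        key = term.lower()  # treat "Quarantine" and "quarantine" as the same term
        if term and key not in seen:
            seen.add(key)
            cleaned.append(term)
    Path(path).write_text("\n".join(cleaned) + "\n", encoding="utf-8")
    return cleaned


# Illustrative terms only; the real list was collected from Wikipedia and papers.
example_terms = [
    "contact tracing",
    "Contact tracing ",
    "quarantine",
    "",
    "polymerase chain reaction",
]

# A temporary directory is used here so the sketch does not overwrite a real file.
target = Path(tempfile.mkdtemp()) / "test_trace.txt"
written = write_term_list(example_terms, target)
print(written)
```

The resulting file can then be passed to amidict through the --input option shown in the steps above.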

Adding wikidata ids and description to the dictionary:

The following command was used:

amidict -v --dictionary testTrace --directory mydictionaries --input test_trace.txt create --informat list --outformats xml,html --wikilinks wikipedia, wikidata

Link to the dictionary: https://github.com/petermr/openVirus/blob/master/cambiohack2020/dictionaries/testTrace.xml

The above dictionary also includes Wikidata IDs, URLs and descriptions. The above dictionary is valid.
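One quick check after adding Wikidata links is to list the entries that did not receive an ID. The sketch below is only an illustration under stated assumptions: it assumes the dictionary XML has a root `dictionary` element containing `entry` elements with `term` and `wikidataID` attributes, and the sample XML, including the ID `Q0000001` and the description text, is a made-up placeholder rather than content from the real testTrace dictionary.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample; the attribute names and values are assumptions.
SAMPLE_XML = """<dictionary title="testTrace">
  <entry term="contact tracing" name="contact tracing"
         wikidataID="Q0000001" description="placeholder description"/>
  <entry term="quarantine" name="quarantine"/>
</dictionary>"""


def entries_missing_wikidata(xml_text):
    """Return the terms of all entries that have no (or an empty) wikidataID."""
    root = ET.fromstring(xml_text)
    return [
        entry.get("term")
        for entry in root.iter("entry")
        if not entry.get("wikidataID")
    ]


missing = entries_missing_wikidata(SAMPLE_XML)
print(missing)
```

To run the same check on a real file, read its text first and pass it to `entries_missing_wikidata`.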

Committing the corpus :

Tried committing through: ✔️ Github desktop

If you are using GitHub Desktop to commit, follow these steps:

  • Install GitHub Desktop from: https://desktop.github.com
  • Clone the openVirus repository onto your system using the Git Bash command line: git clone https://github.com/petermr/openVirus.git
  • Open the folder where you want to upload your CProject.
  • Paste your project into the folder of the cloned openVirus repository where you want to commit the files.
  • Open GitHub Desktop.
  • Go to 'File', then 'Add Local Repository'.
  • Now, choose the openVirus repository from your system.
  • Add a commit message and click 'Commit to master'.
  • After committing, click 'Push to origin'.
  • Once the push has completed, your uploaded files can be viewed in the GitHub repository.

Issues faced:

  • A lock file existed in the repository and had to be deleted before proceeding. ✔️ Suggestion: before starting to commit, check for a lock file and delete it if present, to avoid wasting time.
  • Connectivity issue: a good internet connection is required.
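The lock-file check can be done before opening GitHub Desktop. The sketch below assumes the lock file in question is one that Git leaves inside the repository's `.git` directory (for example `.git/index.lock`); the page does not name the file, so this is an assumption, and the function name `find_git_lock_files` is hypothetical. The script only lists candidates and leaves deletion to the user.

```python
import tempfile
from pathlib import Path


def find_git_lock_files(repo_dir):
    """Return all '*.lock' files found under the repository's .git directory."""
    git_dir = Path(repo_dir) / ".git"
    if not git_dir.is_dir():
        return []  # not a Git working copy, so there is nothing to report
    return sorted(git_dir.rglob("*.lock"))


# Demonstration on a throwaway directory that imitates a stale lock file.
demo_repo = Path(tempfile.mkdtemp())
(demo_repo / ".git").mkdir()
(demo_repo / ".git" / "index.lock").touch()

locks = find_git_lock_files(demo_repo)
for lock in locks:
    print("Stale lock candidate:", lock)
```

A lock file should only be deleted when no other Git process is still running on the repository.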

⚠️ PMR committed a corpus of 500 for me.

*Used the git pull command in Git Bash to download the corpus for running ami section and ami search.

git pull path.git

This command gave an error on my Windows machine, so I used the command prompt to clone the repository instead.

PMR SUGGESTION: Start working on a small corpus to keep things easy and avoid wasting time.

Software used:

  1. GETPAPERS for downloading corpus.

  2. AMI for creating dictionary and sectioning the corpus.

  3. KNIME for data extraction and Binary classification.

  4. KNIME, R for analysis.

  5. Jupyter notebooks

Not started 🟫 :

****Usage of knime and R

****Usage of jupyter notebooks

****Language variants

Started 🟨 :

****Ami section for sectioning of corpus and Ami search for searching the tests in the corpus.

****Dictionary modification. Annotating the corpus

****Adding more attributes to the dictionary.

BLOCKED 🟥 :

**** PREVIOUS BLOCKER(Solved now)

ami search: testing and tracing terms are mentioned very rarely in the papers, so the search found no tests and the data tables came out empty. Searching the same corpus for funders or countries, however, did give results.

Also tried ami search on corpora of 100 and 150 papers, but the data tables were still empty.

**PMR: I agree. This is a hard search. I think we need to collect terms iteratively from Wikipedia, from papers and gradually build a multi-term query. Use Wikipedia's "Contact tracing" as a good source of words and phrases. Currently, not blocked on anything.
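The iterative approach suggested above amounts to keeping a growing list of terms and turning it into one combined query. A minimal sketch follows, assuming the search engine accepts quoted phrases joined by OR; the function name `build_or_query` and the example terms are illustrative and not taken from the project.

```python
def build_or_query(terms):
    """Join terms with OR, quoting multi-word phrases so they are matched whole."""
    parts = []
    for term in terms:
        term = term.strip()
        if not term:
            continue  # skip blank entries left over from editing the list
        parts.append(f'"{term}"' if " " in term else term)
    return " OR ".join(parts)


# Illustrative terms; the real list would grow as new phrases are found.
query = build_or_query(["contact tracing", "quarantine", "diagnostic test"])
print(query)
# prints: "contact tracing" OR quarantine OR "diagnostic test"
```

Each round of reading can add terms to the list, and the query can be regenerated without retyping it.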

FINISHED 🟩 :

**** Binary classification for EpidemicnoCov50

**** Creating a Corpus of 250 papers.

**** Dictionary:

**** AMI sectioning.

**** Ami search: Results in the form of data tables.

**** Multi dictionary search against a small corpus. The dictionaries funders, drugs, test trace, diseases and country were searched against a corpus. Got the data tables and co-occurrence graphs.

**** Dictionary validation.

**** Corpus commit by PMR.

**** Annotation of the 50 papers with 12 false positives and 38 true positives.
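The annotation counts above can be summarised as a precision figure, meaning the share of retrieved papers that were relevant. This is a small worked example using only the two numbers reported on this page; the function name `precision` is illustrative.

```python
def precision(true_positives, false_positives):
    """Fraction of retrieved items that were relevant: TP / (TP + FP)."""
    total = true_positives + false_positives
    if total == 0:
        raise ValueError("precision is undefined when nothing was retrieved")
    return true_positives / total


# Counts from the annotation of the 50 papers.
value = precision(true_positives=38, false_positives=12)
print(f"{value:.2f}")  # 38 / 50
```

With 38 true positives out of 50 annotated papers, the precision is 0.76.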
