
miniproject: viral epidemics and drugs

Pruthiv Rajan edited this page Sep 20, 2020 · 48 revisions

What drugs are used in viral epidemics?

Owner:

Pruthiv Rajan K

Collaborator:

Urja Biswas

Initial Summary

Submitted by - Urja Biswas

Why this Mini-project?

A viral epidemic poses a threat to human life, and the last resort for tackling it, on which the entire human race relies, is the availability of drugs. With the variety of drugs available to us, for example antiviral drugs, palliative drugs, antibacterial drugs and so on, it has become difficult to track the role of each drug. While some drugs are proving to be effective antivirals, others only relieve the symptoms. Hence, this miniproject provides the tools needed to give the public complete information in one place.

Objectives :

  • To create a dictionary of drugs for diseases and their local names in different countries.
  • To differentiate drugs that act directly against the pathogen from drugs that act only on symptoms.

Tools :

  • getpapers to obtain research papers (see the getpapers documentation for details).
  • ami for sectioning papers (see the ami documentation on sectioning).
  • ami/SPARQL for the creation of dictionaries (see the documentation on dictionary creation via ami and SPARQL).
  • nodejs, nvm, cmd, maven, jdk and git underpin the tools above; without them the commands cannot be executed.
  • Python, R and related software, used for data analysis.

Corpus950

This corpus comprises the top 950 articles retrieved from EuroPMC, a platform that provides free access to millions of articles in biomedical science. getpapers will be used to search for and download these articles. The download is quick (approximately 10 minutes), whereas downloading the articles individually could have taken many hours. In this project the articles are downloaded to study drugs related to viral epidemics.

Another advantage of this project is that we do not have to read every paper to get the gist. With ami section the papers are split into sections, so one can scan them all and select the ones of interest without reading each in full.

Course of Action :

For this purpose, the following command will be executed :

getpapers -q "antiviral drugs" -o dir_corpus950 -x -p -k 1000

The command getpapers starts the process, and -q introduces the query to be searched. The query is entered in quotation marks, as in "antiviral drugs". The next element, -o, refers to the output directory, and the parameter that follows it is the name of that directory, dir_corpus950 in our case. Then -x and -p request the XML and PDF versions of each article, and -k 1000 limits the search to at most 1,000 results. After the command completes successfully, our Corpus950 is ready.
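The flags described above can be assembled programmatically, which helps when the same search is repeated with different queries or limits. The following is a minimal Python sketch, not part of the project's own tooling: the helper name build_getpapers_command is hypothetical, and it only builds and prints the command rather than running it.

```python
import shlex

def build_getpapers_command(query, output_dir, limit, xml=True, pdf=True):
    """Assemble the getpapers argument list from the options described above."""
    args = ["getpapers", "-q", query, "-o", output_dir]
    if xml:
        args.append("-x")  # request the XML version of each article
    if pdf:
        args.append("-p")  # request the PDF version of each article
    args += ["-k", str(limit)]  # cap the number of results
    return args

command = build_getpapers_command("antiviral drugs", "dir_corpus950", 1000)
# Print a shell-quoted form of the command for inspection.
print(shlex.join(command))
```

To actually run the search, the list could be passed to subprocess.run(command, check=True), assuming getpapers is installed and on the PATH; on Windows the call may need adjusting because getpapers is installed as a script.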

Sectioning the downloaded files creates a tree structure that makes the content of each file easier to explore. Sectioning is done with the section function of ami. The command to execute is:

ami -p dir_corpus950 section
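Once sectioning has run, the resulting tree can be summarised with a few lines of Python. The sketch below is an illustration under assumptions: it supposes that each article directory contains a subfolder named sections, and the article and folder names in the demo (PMC_A, 1_body and so on) are hypothetical. Check the layout ami actually produces before relying on it.

```python
import tempfile
from collections import Counter
from pathlib import Path

def summarise_sections(corpus_dir):
    """Count files under each top-level section folder across all articles."""
    counts = Counter()
    for article in sorted(Path(corpus_dir).iterdir()):
        sections = article / "sections"  # assumed folder name
        if not sections.is_dir():
            continue  # skip articles that were not sectioned
        for path in sections.rglob("*"):
            if path.is_file():
                # Group by the first path component below sections/
                counts[path.relative_to(sections).parts[0]] += 1
    return counts

# Demo on a throwaway tree with hypothetical article and folder names.
with tempfile.TemporaryDirectory() as tmp:
    for rel in ["PMC_A/sections/1_body/intro.xml",
                "PMC_A/sections/1_body/methods/p1.xml",
                "PMC_B/sections/0_front/title.xml"]:
        target = Path(tmp, rel)
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_text("<sec/>")
    demo_counts = summarise_sections(tmp)

print(dict(demo_counts))
```

On a real corpus, calling summarise_sections("dir_corpus950") would give a quick view of which sections are present and how large they are, again assuming the folder layout matches.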

Drugs Dictionary

Wikidata is an open knowledge base containing information stored semantically. Software applications can therefore help us relate the properties of similar drugs and use them constructively. We will create a dictionary in an attempt to simplify the 950 downloaded articles and relate each drug to its usage. This would require the application of 'machine learning'.

Course of Action :

When all the files are downloaded and successfully stored in our dir_corpus950 directory, our next task is to generate the dictionary using ami tools. We will use the following command:

ami -p dir_corpus950 search --dictionary drugs

ami invokes the ami tool and -p sets the project path to dir_corpus950. Then search --dictionary drugs searches the corpus for terms from the drugs dictionary and creates an HTML file presenting the results in tabular form.
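To make the idea of a dictionary search concrete, the following Python sketch counts how often each dictionary term occurs in a piece of text. It is a simplified stand-in for illustration, not how ami implements its search; the function name count_terms and the sample sentence are hypothetical.

```python
import re
from collections import Counter

def count_terms(text, terms):
    """Count case-insensitive whole-word occurrences of each dictionary term."""
    counts = Counter()
    for term in terms:
        pattern = r"\b" + re.escape(term) + r"\b"
        hits = re.findall(pattern, text, flags=re.IGNORECASE)
        if hits:
            counts[term] = len(hits)
    return counts

# Hypothetical sample sentence, invented for this example.
sample = "Ribavirin was compared with oseltamivir; ribavirin showed antiviral activity."
result = count_terms(sample, ["ribavirin", "oseltamivir"])
print(dict(result))
```

Running the same count over every article in a corpus and tabulating the totals gives a table of the kind the search step produces, although the real tool also handles its own dictionary format and reporting.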

Goals

  1. To differentiate false positives from true positives, i.e. to find the papers that are related to "viral epidemic and drugs" and remove the unnecessary ones.
  2. To find relationships between drugs, which may help in suggesting an alternative drug for an epidemic.
  3. To segregate drugs that act against the virus from drugs that suppress only the symptoms.
  4. To maintain a simplified drug dictionary that is open to the public.
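The first goal can be quantified once papers have been labelled by hand. The sketch below, in Python, computes the fraction of retrieved papers that are true positives (the precision of the search) and lists the papers to keep. The paper identifiers and labels are hypothetical placeholders, not results from this project.

```python
def precision(labels):
    """Fraction of retrieved papers that were judged relevant (true positives)."""
    if not labels:
        raise ValueError("no labelled papers")
    true_positives = sum(1 for relevant in labels.values() if relevant)
    return true_positives / len(labels)

# Hypothetical manual labels: paper id -> True if the paper is about
# viral epidemics and drugs, False if it is a false positive.
manual_labels = {"paper_a": True, "paper_b": False, "paper_c": True, "paper_d": True}

score = precision(manual_labels)
relevant_ids = [paper for paper, relevant in manual_labels.items() if relevant]
print(score, relevant_ids)
```

A low score suggests the query is too broad and is retrieving many unrelated papers.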

Miniproject Summary

Objective :

  • The goal is to find out "the drugs which are used to treat viral diseases and their local names in different countries".
  • Many drugs are reported to "treat only the symptoms caused by a viral infection or disease rather than the viral infection itself". On that account, we study the purpose of each drug and its action.

Methodology :

  1. Creating dictionaries
  2. Binary classification
  3. Sectioning
  4. Finding local drugs in countries
  5. Drug action

Corpora :

  1. A communal corpus called epidemic50noCov of 50 articles for viral epidemics is created.
  2. Expanding our search, we create a new corpus consisting of 950 papers.
  3. Using software tools we create a dictionary for our corpus.

Dictionaries :

  1. A test drug dictionary was created using a list of 10 viral drugs.
  2. The list of FDA-approved drugs has been updated.
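Turning a plain list of drug names into a dictionary file can be sketched in Python as follows. This is an illustration only: the element and attribute names (dictionary, entry, term, name) are assumptions about the dictionary format and should be checked against the dictionary files in the openVirus repository. The two drug names are taken from this page; the helper name build_dictionary is hypothetical.

```python
import xml.etree.ElementTree as ET

def build_dictionary(title, terms):
    """Build a simple XML dictionary with one entry element per term."""
    root = ET.Element("dictionary", {"title": title})  # assumed root element
    for term in terms:
        # Assumed attributes: a lower-cased search term and a display name.
        ET.SubElement(root, "entry", {"term": term.lower(), "name": term})
    return root

root = build_dictionary("drug", ["Ribavirin", "Oseltamivir"])
xml_text = ET.tostring(root, encoding="unicode")
print(xml_text)
```

In the project itself the dictionary is produced by the ami tooling rather than by hand-written code; the sketch only shows the shape of the transformation from list to structured entries.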

Software :

  • ami for dictionaries and sectioning.
  • ami/SPARQL for the creation of dictionaries.
  • Python, R and related software for data analysis.

Committing Corpus :

To commit the corpus via git, we use the following commands :

C:\Users\admin\openVirus\miniproject\drug>git status

C:\Users\admin\openVirus\miniproject\drug>dir

C:\Users\admin\openVirus\miniproject\drug>git add *

C:\Users\admin\openVirus\miniproject\drug>git status

C:\Users\admin\openVirus\miniproject\drug>git commit -am "first commit all corpus"

C:\Users\admin\openVirus\miniproject\drug>git pull

C:\Users\admin\openVirus\miniproject\drug>git push

The push may redirect you to a page where you log in to your GitHub account. Once the commands above have run successfully, the files are committed.

Constraints :

The pharmaceutical drugs are listed here by their INN (International Nonproprietary Name) instead of their biological/natural name.


Update :


STARTED :

  1. Multilingual (English, हिन्दी, தமிழ்) dictionary using SPARQL.
  2. Binary classification using NLP.
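For the multilingual dictionary, a Wikidata SPARQL query can request labels in several languages at once. The Python sketch below only builds the query text; it is an assumption-laden illustration, not the query used in this project. wdt:P31 is Wikidata's "instance of" property, while the item Q12140 is assumed here to be Wikidata's "medication" item and should be verified before use. The function name build_label_query is hypothetical.

```python
def build_label_query(class_qid, languages, limit=100):
    """Build a SPARQL query for labels of items of a class in given languages."""
    lang_list = ", ".join(f'"{code}"' for code in languages)
    return f"""
SELECT ?item ?label (LANG(?label) AS ?lang) WHERE {{
  ?item wdt:P31 wd:{class_qid} ;
        rdfs:label ?label .
  FILTER(LANG(?label) IN ({lang_list}))
}}
LIMIT {limit}
""".strip()

# Q12140 is an assumption (believed to be "medication"); verify on Wikidata.
# en, hi and ta are the language codes for English, Hindi and Tamil.
query = build_label_query("Q12140", ["en", "hi", "ta"])
print(query)
```

The printed query can be pasted into the Wikidata Query Service to inspect the results before they are fed into dictionary creation.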

FINISHED :

  1. Use the communal corpus of 50 articles on viral epidemics.
  2. Manual classification of communal corpus of 50 articles on viral epidemics.
  3. Created a corpus of 250 articles on antiviral drugs.
  4. Manual classification of the corpus of 250 articles on antiviral drugs.
  5. ami search and section have been run on the corpus of 250 articles on antiviral drugs.
  6. The corpus and its ami search and section results were committed to git.
  7. Updated ami.
  8. Created an FDA-approved drug dictionary with amidict -v --dictionary drug --directory drug --input drug.txt create --informat list --outformats xml,html. Results : https://github.com/petermr/openVirus/blob/master/dictionaries/FDA%20Drug/drug.xml
  9. Created a dictionary using a SPARQL Wikidata query. Reference : https://github.com/petermr/openVirus/wiki/Dictionary:-Drugs
  10. SPARQL Wikidata query results : https://github.com/petermr/openVirus/blob/master/dictionaries/drug/drugs.sparql.xml
  11. The drugs that occur most often in the corpus are ribavirin and oseltamivir. Ribavirin mostly appears alongside the terms activity, inhibitor, antiviral, HCV, herpes and hepatitis. Oseltamivir appears alongside the terms antiviral drug, pandemic drug, replication, and virus and host interaction.

NOT STARTED :

  1. Smoke test.

BLOCKED : Creating a table from the corpus for NLP to do binary classification.
