
Documentation: Drug



OWNER: PRUTHIV RAJAN

COLLABORATORS: ISRAEL EBHOHIMEN, URJA BISWAS.


Introduction:

A viral epidemic poses a threat to human life, and the last resort for tackling it, on which the entire human race relies, is the availability of drugs. Each drug has its own synonyms and is called by different names in different parts of the world. With the variety of drugs available to us under different names, for example antiviral drugs, palliative drugs, antibacterial drugs and so on, it has become difficult to track the role of each drug. While some drugs are proving to be effective antivirals, other drugs only relieve the symptoms. Hence, this mini-project provides the necessary tools to give the public complete information in one place.

The objective of this drug mini-project is to create a dictionary of drugs for diseases, together with their local names in different countries, and to differentiate drugs that work directly against the pathogen from drugs that only act on symptoms.

Methodology:

Getpapers:

getpapers is a freely available tool which runs at the command prompt. It downloads all freely available research papers, in full text and XML format, to your local machine. The getpapers command initiates the process, and -q specifies the query to be searched, entered in inverted commas, as in "antiviral drugs". The next element, -o, specifies the output directory; the parameter that follows it is the name of the directory, drug_corpus in this example. Then -x and -p request the XML and PDF files to be included, and -k 1000 limits the search to 1,000 files.
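Putting those flags together, the example described above corresponds to a command like the following (drug_corpus is the illustrative output directory named in this paragraph):

             getpapers -q "antiviral drugs" -o drug_corpus -x -p -k 1000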

Installation

getpapers is used to create the corpus on viral epidemics and drugs.
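getpapers itself is distributed as an npm package, so a minimal installation sketch, assuming Node.js and npm are already available on the machine, is:

             npm install --global getpapers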

general code: getpapers -q "<project title>" -o <output directory> -x -p -k <number of papers required>

project code:

             getpapers -q "viral epidemics and antiviral drugs" -o drug -x -p -k 800

The project code builds a corpus of 750+ research articles with full text and XML files.

ami:

ami is freely available software which is used to scrape and annotate research papers.

Installation
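ami is distributed through the petermr/ami3 GitHub repository. One possible installation sketch, assuming a Java JDK and Maven are available (consult the repository README for the authoritative steps):

             git clone https://github.com/petermr/ami3.git
             cd ami3
             mvn install -Dmaven.test.skip=true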

ami section:

ami section is used to split the research papers into front, body, back, floats and groups. Sectioning the downloaded files creates a tree structure which helps in exploring the content of each file. Sectioning is done using the section function of ami, which runs at the command prompt. An illustrative sketch of the resulting tree is shown after the project code below.

General code: ami -p <cproject> section

Project code:

            ami -p drug section
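An illustrative sketch of the tree structure that sectioning produces, using a hypothetical paper directory PMC0000001 (the exact directory names may differ between ami versions):

             drug/
                 PMC0000001/              # hypothetical paper directory
                     fulltext.xml
                     sections/
                         0_front/         # title, abstract, journal metadata
                         1_body/          # introduction, methods, results ...
                         2_back/          # references, acknowledgements
                         3_floats-group/  # figures and tables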

ami search:

ami search searches for and analyses the terms in your project repository, giving the frequency of terms and a histogram of your corpus.

General code: ami -p <cproject> search --dictionary <path>

Project code:

             ami -p drug search --dictionary dict/country dict/disease dict/drug dict/virus dict/funder dict/testTrace dict/npi dict/zoonosis

(This code finds the countries, diseases, drugs, viruses, funders, test-and-trace measures, non-pharmaceutical interventions and zoonoses mentioned in the research papers.)
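The --dictionary arguments are assumed to be paths to ami dictionary files already present on the local machine, e.g. a layout like:

             dict/
                 country.xml
                 disease.xml
                 drug.xml
                 virus.xml
                 ...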

SPARQL:

"SPARQL Protocol and RDF Query Language" enables users to query information from databases or any data source that can be mapped to RDF. The Wikidata SPARQL endpoint is used to collect data from Wikidata in order to build the dictionary. This query returns the drug name, alternative names, formula, picture and referral URL in English, Hindi, Tamil, Sanskrit, Urdu, Spanish, Portuguese, Hausa and German.

SELECT ?wikidata ?wikidataLabel ?wikipedia ?wikidataAltLabel ?wikidataDescription ?wikidataformule ?wikidatapicture ?hindi ?hindiLabel ?hindialtlabel ?hindiwikipedia ?tamil ?tamilLabel ?tamilaltlabel ?tamilwikipedia ?sanskrit ?sanskritLabel ?sanskritaltlabel ?sanskritwikipedia ?spanish ?spanishLabel ?spanishaltlabel ?spanishwikipedia ?urdu ?urduLabel ?urdualtlabel ?urduwikipedia ?portuguese ?portugueseLabel ?portuguesealtlabel ?portuguesewikipedia ?hausa ?hausaLabel ?hausaaltlabel ?hausawikipedia ?german ?germanLabel ?germanaltlabel ?germanwikipedia WHERE {
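  # wd:Q12140 = medication; wdt:P31 = "instance of"
  # wdt:P274 = chemical formula; wdt:P117 = chemical structure (image)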
  ?wikidata wdt:P31 wd:Q12140;
    wdt:P274 ?wikidataformule;
    wdt:P117 ?wikidatapicture.
   OPTIONAL { ?wikipedia schema:about ?wikidata; schema:isPartOf <https://en.wikipedia.org/> }
   OPTIONAL { ?hindiwikipedia schema:about ?wikidata; schema:isPartOf <https://hi.wikipedia.org/> }
   OPTIONAL { ?tamilwikipedia schema:about ?wikidata; schema:isPartOf <https://ta.wikipedia.org/> }
  OPTIONAL { ?sanskritwikipedia schema:about ?wikidata; schema:isPartOf <https://sa.wikipedia.org/> }
  OPTIONAL { ?spanishwikipedia schema:about ?wikidata; schema:isPartOf <https://es.wikipedia.org/> }
  OPTIONAL { ?portuguesewikipedia schema:about ?wikidata; schema:isPartOf <https://pt.wikipedia.org/> }
  OPTIONAL { ?hausawikipedia schema:about ?wikidata; schema:isPartOf <https://ha.wikipedia.org/> }
  OPTIONAL { ?germanwikipedia schema:about ?wikidata; schema:isPartOf <https://de.wikipedia.org/> }
  SERVICE wikibase:label {
    bd:serviceParam wikibase:language "en".

## Selecting the preferred label
    ?wikidata skos:altLabel ?wikidataAltLabel ; rdfs:label ?wikidataLabel; schema:description  ?wikidataDescription          
  } 
   SERVICE wikibase:label {
    bd:serviceParam wikibase:language "hi".
## Selecting the preferred label
    ?wikidata skos:altLabel ?hindialtlabel .
    ?wikidata rdfs:label ?hindiLabel .
    ?wikidata schema:description ?hindi ;
  } 
  SERVICE wikibase:label {
    bd:serviceParam wikibase:language "ta".
## Selecting the preferred label
    ?wikidata skos:altLabel ?tamilaltlabel .
    ?wikidata rdfs:label ?tamilLabel .
    ?wikidata schema:description ?tamil ;
  } 
SERVICE wikibase:label {
    bd:serviceParam wikibase:language "sa".
## Selecting the preferred label
    ?wikidata skos:altLabel ?sanskritaltlabel .
    ?wikidata rdfs:label ?sanskritLabel .
    ?wikidata schema:description ?sanskrit ;
  } 
SERVICE wikibase:label {
    bd:serviceParam wikibase:language "es".
## Selecting the preferred label
    ?wikidata skos:altLabel ?spanishaltlabel .
    ?wikidata rdfs:label ?spanishLabel .
    ?wikidata schema:description ?spanish ;
  } 
  SERVICE wikibase:label {
    bd:serviceParam wikibase:language "ur".
## Selecting the preferred label
    ?wikidata skos:altLabel ?urdualtlabel .
    ?wikidata rdfs:label ?urduLabel .
    ?wikidata schema:description ?urdu ;
  } 
  SERVICE wikibase:label {
    bd:serviceParam wikibase:language "pt".
## Selecting the preferred label
    ?wikidata skos:altLabel ?portuguesealtlabel .
    ?wikidata rdfs:label ?portugueseLabel .
    ?wikidata schema:description ?portuguese  ;
  } 
  SERVICE wikibase:label {
    bd:serviceParam wikibase:language "ha".
## Selecting the preferred label
    ?wikidata skos:altLabel ?hausaaltlabel .
    ?wikidata rdfs:label ?hausaLabel .
    ?wikidata schema:description ?hausa ;
  } 
  SERVICE wikibase:label {
    bd:serviceParam wikibase:language "de".
## Selecting the preferred label
    ?wikidata skos:altLabel ?germanaltlabel .
    ?wikidata rdfs:label ?germanLabel .
    ?wikidata schema:description ?german ;
  } 
}

Once the results are obtained, download them from the SPARQL endpoint link (under the query service's Link menu) and save the file with a .xml extension.
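As an alternative to the browser, the same XML results can be fetched from the command line with curl; a sketch assuming the query has been saved locally as query.rq (a hypothetical file name):

             curl -G "https://query.wikidata.org/sparql" \
                  --data-urlencode query@query.rq \
                  -H "Accept: application/sparql-results+xml" \
                  -o sparql_drug9.xml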

SPARQL Mapping:

SPARQL mapping is used to convert the SPARQL results into ami dictionary format and validate them, using amidict; the --sparqlmap option maps each SELECT variable of the query onto a dictionary field.

Code:

      amidict -vv --dictionary drug --directory dic --input sparql_drug9.xml create --informat wikisparqlxml --sparqlmap wikidataURL=wikidata,wikipediaURL=wikipedia,altNames=wikidataAltLabel,name=wikidataLabel,term=wikidataLabel,description=wikidataDescription,formulae=wikidataformule,picture=wikidatapicture,Hindi=hindiLabel,Hindi_description=hindi,Hindi_altNames=hindialtlabel,Tamil=tamilLabel,Tamil_description=tamil,Tamil_altNames=tamilaltlabel,Urdu=urduLabel,Urdu_description=urdu,Urdu_altNames=urdualtlabel,Sanskrit=sanskritLabel,Sanskrit_description=sanskrit,Sanskrit_altNames=sanskritaltlabel,Spanish=spanishLabel,Spanish_description=spanish,Spanish_altNames=spanishaltlabel,Portuguese=portugueseLabel,Portuguese_description=portuguese,Portuguese_altNames=portuguesealtlabel,Hausa=hausaLabel,Hausa_description=hausa,Hausa_altNames=hausaaltlabel,German=germanLabel,German_description=german,German_altNames=germanaltlabel --transformName wikidataID=EXTRACT(wikidataURL,.*/(.*)) --synonyms=wikidataAltLabel

Results and discussion:

Getpapers:

Freely available papers were collected from Europe PMC. Once the getpapers command has executed, the output appears as in Figure 1. (Time taken: ~2:00 min.)

FIGURE 1: OUTPUT OF GETPAPERS (FULL RESULTS)

AMI section:

Results of ami section are shown in Figure 2; it sections the papers in the project directory. (Time taken: ~1:30 min.)

FIGURE 2: OUTPUT OF AMI SECTION

AMI search:

Results are produced as a table, a histogram, and per-folder result files. (Time taken: ~1:00 min.)

FIGURE 3: OUTPUT OF AMI SEARCH IN FOLDER

FIGURE 4: OUTPUT OF AMI SEARCH IN TABLE WITH FREQUENCY (FULL RESULTS)

FIGURE 5: ALL PLOTS (.SVG FILES, FULL RESULTS)

SPARQL:

The results are displayed containing the Wikidata ID, molecule name, molecular formula, compound picture, alternative labels, descriptions, and Wikipedia links (English, Tamil, Hindi, Urdu, Sanskrit, Spanish, Portuguese, German, Hausa). (Time taken: less than a minute.)

FIGURE 6: RESULTS OF SPARQL

SPARQL Mapping:

Download the results from the SPARQL endpoint link and save the file with a .xml extension on the local machine; the file then looks as in Figure 7. (Time taken: ~1:00 min.)

FIGURE 7: SPARQL.XML OUTPUT (FULL RESULTS)

AMI DICT:

amidict refines the SPARQL output; the result looks as in Figure 8. (Time taken: ~1:00 min.)

FIGURE 8: AMIDICT OUTPUT (FULL RESULTS)

DISCLAIMER: THE TIME TAKEN DEPENDS ON THE NETWORK.

PAGE UNDER CONSTRUCTION
