
miniproject: viral epidemics and organization


What organizations fund research on viral epidemics?

Owners:

Vaishali Arora, Shweata N. Hegde


Collaborators:

Simranleen Singh


Mini project summary:

  • The scientific objective is to find out which organizations are the most active in viral epidemic research.
  • To retrieve valuable information about them from the scientific literature.

Methodology: 📌

  • Using the communal corpus Epidemic50noCov of 50 articles. 🟩DONE

  • Subjecting them to binary classification based on various parameters: related to viral epidemics or not, funders named or not, and so on. 🟩DONE

  • Rerunning the query to get a corpus of 950 articles. 🟩DONE

  • Working on sectioning to filter out the Acknowledgements or Funding section of each paper, as this is the part of a scientific paper where funders are most likely to occur. 🟩DONE

  • Creating the dictionary funder using ami and the SPARQL/Wikidata Query Service. 🟩DONE

  • Using machine learning tools for entity extraction so that we can look for particular, very specific phrases, words and regexes in those scientific papers. 🟪NOT STARTED

  • Subjecting the spreadsheets to analysis in order to find which funders are the most active (see the sketch after this list). 🟪NOT STARTED
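
A minimal sketch of how this analysis step could look, assuming the ami search results have been exported to a CSV file (a hypothetical funders_counts.csv with one row per funder per paper); the file name, column layout and use of pandas are illustrative assumptions, not the project's actual output format:

     # Rank funders by how often they are matched across the corpus.
     # Assumes columns "funder" and "count" -- an assumed layout, not
     # the literal format that ami writes out.
     import pandas as pd

     hits = pd.read_csv("funders_counts.csv")      # hypothetical export
     totals = (hits.groupby("funder")["count"]
                   .sum()
                   .sort_values(ascending=False))
     print(totals.head(10))                        # ten most active funders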


Corpora: 📂

❓ How did I commit my corpus 950?

Scroll down to the section 'Releasing the corpus 950 using GitHub Desktop'.


Dictionaries:

Dictionary update: 🆕

  • Updated on: September 18, 2020

  • Source: Crossref

  • Number of entries: ~17k

  • Method: SPARQL/Wikidata Query Service

  • Attributes included: term, name, description, wikidataID, wikidataURL, wikipediaURL, crossrefID, country, synonyms

  • SPARQL query used:

#Funders: Wikidata items with a Crossref funder ID, plus their type,
#country and English Wikipedia article (labels resolved in English)
 SELECT DISTINCT ?Funder ?FunderLabel ?FunderDescription ?FunderAltLabel ?Country ?CountryLabel ?instanceofLabel ?crossrefid ?wikipedia WHERE {
   ?Funder wdt:P3153 ?crossrefid;   # Crossref funder ID
     wdt:P31 ?instanceof;           # instance of
     wdt:P17 ?Country.              # country
   OPTIONAL { ?wikipedia schema:about ?Funder; schema:isPartOf <https://en.wikipedia.org/> }
   SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
 }
 LIMIT 20000000

Syntax used:

amidict -vv --dictionary funders --directory mydictionaries --input funder.sparql.xml create --informat wikisparqlxml --sparqlmap wikidataURL=Funder,term=FunderLabel,name=FunderLabel,country=CountryLabel,crossrefid=crossrefid,description=FunderDescription,wikipediaURL=wikipedia,wikidataURL=Funder --transformName wikidataID=EXTRACT(wikidataURL,.*/(.*)) --synonyms=wikidataAltLabel

Final dictionary: https://github.com/petermr/openVirus/blob/master/dictionaries/funders/sparql2ami/funder.xml

Refined SPARQL query

Click here to open the refined query in the Wikidata SPARQL Query Service.

Updated syntax to create the dictionary

amidict -vv --dictionary organization --directory _sparlendpoint  --input sparql_organization.xml create --informat wikisparqlxml --sparqlmap wikidataURL=Organization,term=OrganizationLabel,name=OrganizationLabel,country=CountryLabel,crossrefIDs=crossrefIds,description=description,wikidataURL=Organization --transformName wikidataID=EXTRACT(wikidataURL,.*/(.*))

Dictionary validation: ✅

amidict --dictionary C:\Users\myPC\mydictionaries\funders(1).xml -v display --fields --validate

Generic values (DictionaryDisplayTool)
================================
--testString        : d      null
--wikilinks         : d [Lorg.contentmine.ami.tools.AbstractAMIDictTool$WikiLink;@1ae7dc0
--fields            : m        []
--files             : d        []
--maxEntries        : d         3
--remote            : d [https://github.com/petermr/dictionary]
--validate          : m      true
--help              : d     false
--version           : d     false
--dictionary        : d [C:\Users\myPC\mydictionaries\funders(1).xml]
--directory         : d      null

Specific values (DictionaryDisplayTool)
================================
list all fields
dictionaries from C:\Users\myPC\ContentMine\dictionaries

❓ Result: I checked the dictionaries folder at the path suggested above. It was empty. Should I do something else, or is the software built that way?


Tools & Software: 🛠

1. ami for dictionary creation and sectioning: 🟩DONE

  • To download my corpus of 950 articles in XML format into the directory miniproject:

  • Open the Command Prompt and run:

        `getpapers -q "Funders in viral epidemic research" -o miniproject -f mycorpus/log.txt -k 950 -x`
    
  • To divide the CProject into sections, open the Command Prompt again and run:

           `ami -p miniproject section`
    
  • This will create a sections subfolder inside each paper's folder in your CProject directory.

  • Open the sections folder; you will find subfolders such as Front, Body, Back, etc.

  • This completes the sectioning of my CProject.

2. ami search: full.dataTables (https://github.com/petermr/openVirus/blob/master/miniproject/funder/full.dataTables.html) and _cooccurrence (https://github.com/petermr/openVirus/tree/master/miniproject/funder/_cooccurrence) created for the dictionary funder. 🟩DONE

3. Jupyter Notebook for machine learning and data mining (see the dictionary-building sketch after this list). 🟨STARTED

4. Later, R for analysis and to display the results graphically. 🟪NOT STARTED
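
The "Jupyter Notebook to create dictionary from a text file of funders" item listed under FINISHED below could take roughly the following shape. This is a minimal sketch, assuming one funder name per line in a hypothetical funder_names.txt and using only the term and name attributes described in the dictionary update above; a real ami dictionary may carry more attributes.

     # Build a simple ami-style XML dictionary from a plain text file of
     # funder names (one per line). funder_names.txt and the output file
     # name are illustrative assumptions.
     import xml.etree.ElementTree as ET

     dictionary = ET.Element("dictionary", title="funders")
     with open("funder_names.txt", encoding="utf-8") as f:
         for line in f:
             name = line.strip()
             if name:                               # skip blank lines
                 ET.SubElement(dictionary, "entry", term=name.lower(), name=name)

     ET.ElementTree(dictionary).write("funders_from_text.xml",
                                      encoding="utf-8", xml_declaration=True)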


Releasing the corpus 950 using GitHub Desktop: 🟩DONE

  • Installed GitHub Desktop from https://desktop.github.com
  • Cloned the openVirus repository onto my system using the Git Bash command line: git clone https://github.com/petermr/openVirus.git
  • Open the folder where you want to upload your CProject.
  • Paste your project into the folder of the openVirus repository (our remote repository) where you want to commit the files.
  • Open GitHub Desktop.
  • Go to 'File', then 'Add Local Repository'.
  • Now choose the openVirus repository from your system.
  • Add a commit message and click 'Commit to master'.
  • After committing, click 'Push to origin'.
  • Once the push completes, your uploaded files can be viewed in the GitHub repository.

💡 Tip: Committing the corpus in parts of five makes uploading easier.


Updating ami: 🟩DONE

  • Open the Command Prompt and type:

     `cd ami3`
     `git pull`
     `mvn clean install -Dmaven.test.skip=true `
    
  • Wait for the command to finish running.

  • A BUILD SUCCESS message appears in the Command Prompt.

💡 Tip: If you get a BUILD FAILURE, close any other Command Prompt window that is open on your system.


Blockers: 🚫


Software usage: 🔗

  1. Core software:
  • Node
  • getpapers
  • Java JDK
  • Maven
  • ami
  2. Optional software:
  • KNIME
  • R graphics
  • Jupyter Notebook
  • Github desktop

NOT STARTED:🟪

  • Binary classification of corpus 950 into true and false positives using different libraries in Python (see the sketch below).
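
This classification step has not been started; the sketch below is only one possible shape for it. It assumes the manually classified papers (e.g. from the 50-article corpus) have been collected into a hypothetical labels.csv with "text" and "label" columns, and it uses scikit-learn purely as an illustration.

     # Possible binary classifier: relevant to viral epidemic funding or not.
     # labels.csv, its columns, and the choice of scikit-learn are assumptions.
     import pandas as pd
     from sklearn.pipeline import make_pipeline
     from sklearn.feature_extraction.text import TfidfVectorizer
     from sklearn.linear_model import LogisticRegression
     from sklearn.model_selection import train_test_split

     data = pd.read_csv("labels.csv")
     X_train, X_test, y_train, y_test = train_test_split(
         data["text"], data["label"], test_size=0.2, random_state=0)

     model = make_pipeline(TfidfVectorizer(stop_words="english"),
                           LogisticRegression(max_iter=1000))
     model.fit(X_train, y_train)
     print("held-out accuracy:", model.score(X_test, y_test))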

STARTED:🟨

  • Working on the usage of Jupyter Notebook by looking into tutorials on the internet
  • Maintaining the dictionary funders so that merging the dictionaries becomes easier

BLOCKED:🟥

FINISHED:🟩

  • Creating the corpus 950
  • Ami search on the corpus 950
  • Sectioning the corpus 950
  • Creating the ami dictionary funder
  • Creating the SPARQL dictionary funder
  • Manual binary classification of the 50-article corpus "Epidemic50noCov"
  • Corpus 950 released
  • Dictionary funder released
  • Dictionary validation using ami
  • Classifying first 50 papers from corpus 950 into True and False positives
  • Smoke test on Jupyter Notebook
  • Jupyter Notebook to create dictionary from a text file of funders


Summary:

Submitted by:

Simranleen Singh


Introduction:

  • Under this project we are collecting useful data from authentic, globally accessible websites and tabulating it so that it is clear to everyone who visits.
  • My mini project is on viral epidemics and funders, so it deals with funders from all over the world that provide funds for viral epidemic research.

Preliminary work:

  • My first task is to download the useful software that will give me easy access to the data I am looking for, so that I can download it and decide whether or not it is useful to me.
  • Initially I had to install Node as the framework for installing the other software.
  • One of them is getpapers, installed using the link and information provided by my mentor. Reference: https://github.com/petermr/tigr2ess/blob/master/installation/INSTALLATION.md

Installation of getpapers:

getpapers is necessary software for this project because we have to download several papers (on our chosen subject) in one go, and getpapers helps us do that.


Blockers:

Installing getpapers.


Current work

Currently I am maintaining the dictionary of funders manually until my issue gets solved.

