
COVID-19 Literature Analysis and Summarization Platform



CDAC Hackathon

COVID-19 LITERATURE ANALYSIS AND SUMMARIZATION PLATFORM

Steps to run the project:

  • Requirements:

    1. python: version 3.8.x
    2. yarn: version 1.22.x
    3. node: version 12.16.x
    4. pip and virtualenv.
  • Steps:

    1. Clone the repository using
      git clone https://github.com/tanmaypardeshi/CDAC-Hackathon.git
    2. Download the glove folder from the Google Drive link provided above and save it in the project directory.
    3. Download all the other CSV and JSON files from the Google Drive link and store them in the data folder of the project directory.
    4. Run virtualenv venv in the project directory to create a virtual environment.
    5. Run source venv/bin/activate to activate the virtual environment.
    6. Run pip install -r requirements.txt in the project directory to install all Python dependencies. This is only needed the first time.
    7. Navigate to the frontend folder and run yarn install the first time to install all JavaScript dependencies for React.
    8. Run python run.py in the project directory to start the Flask server.
    9. With the Flask server still running, navigate to the frontend folder and run yarn start to start the development server and use the platform.
    10. Run deactivate to deactivate the virtual environment when done.
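For quick reference, the steps above can be condensed into a single shell session (assuming a Unix-like shell; the glove folder and the data files from the Google Drive link still have to be placed manually):

```shell
# Clone the repository and enter it
git clone https://github.com/tanmaypardeshi/CDAC-Hackathon.git
cd CDAC-Hackathon

# (Manually place the downloaded glove folder and data files here first.)

# Create and activate a virtual environment, then install Python dependencies
virtualenv venv
source venv/bin/activate
pip install -r requirements.txt

# Terminal 1: start the Flask backend
python run.py

# Terminal 2: install JavaScript dependencies and start the React dev server
cd frontend
yarn install
yarn start

# When finished, leave the virtual environment
deactivate
```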

Documentation about the files in the repository


1. glove: Embeddings used to perform text summarization and information retrieval for Real Time Research News.

2. summariser.py: Makes use of the TextRank algorithm to summarize the input biomedical text.
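The idea behind TextRank can be sketched in pure Python: split the text into sentences, build a word-overlap similarity graph, run a PageRank-style power iteration, and keep the top-ranked sentences. This is a minimal illustrative version, not the exact implementation in summariser.py (which also uses the glove embeddings):

```python
import re
from math import log

def sentence_similarity(s1, s2):
    # Word-overlap similarity normalized by sentence lengths (TextRank-style)
    w1, w2 = set(s1.lower().split()), set(s2.lower().split())
    if not w1 or not w2:
        return 0.0
    denom = log(len(w1) + 1) + log(len(w2) + 1)
    return len(w1 & w2) / denom if denom else 0.0

def textrank_summary(text, n=1, d=0.85, iterations=50):
    # Split into sentences, build a similarity graph, and run a PageRank-style
    # power iteration; return the n highest-scoring sentences in document order.
    sentences = [s.strip() for s in re.split(r'(?<=[.!?])\s+', text) if s.strip()]
    k = len(sentences)
    if k <= n:
        return sentences
    sim = [[sentence_similarity(a, b) if i != j else 0.0
            for j, b in enumerate(sentences)]
           for i, a in enumerate(sentences)]
    scores = [1.0] * k
    for _ in range(iterations):
        new_scores = []
        for i in range(k):
            rank = 0.0
            for j in range(k):
                out_weight = sum(sim[j])
                if sim[j][i] > 0 and out_weight > 0:
                    rank += d * sim[j][i] / out_weight * scores[j]
            new_scores.append((1 - d) + rank)
        scores = new_scores
    top = sorted(range(k), key=lambda i: scores[i], reverse=True)[:n]
    return [sentences[i] for i in sorted(top)]
```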

3. ir_author.py: Makes use of Levenshtein distance to generate a similarity score between an author-based query and the documents.

4. ir_title.py: Makes use of Levenshtein distance and keyword indexing to generate a similarity score between a title-based query and the documents.

5. ir_optimised.py: Makes use of Levenshtein distance and keyword indexing, along with a keywords pickle file, to generate a similarity score between an author-based query and the documents.
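The Levenshtein-based scoring shared by the ir_*.py scripts can be sketched as a classic dynamic-programming edit distance plus a length-normalized similarity in [0, 1]. This is a minimal illustration; the exact normalization and keyword indexing used in the repository may differ:

```python
def levenshtein(a, b):
    # Classic dynamic-programming edit distance, using two rows of the DP table
    if len(a) < len(b):
        a, b = b, a
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                 # deletion
                            curr[j - 1] + 1,             # insertion
                            prev[j - 1] + (ca != cb)))   # substitution
        prev = curr
    return prev[-1]

def field_similarity(query, field):
    # Normalized similarity: 1.0 for identical strings, 0.0 for fully different
    q, f = query.lower().strip(), field.lower().strip()
    if not q and not f:
        return 1.0
    return 1.0 - levenshtein(q, f) / max(len(q), len(f))
```

A query would then be scored against the relevant field (author or title) of each document and the results sorted by descending similarity.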

6. news.py: Makes use of scraping techniques to retrieve unstructured COVID-19 research news from the internet and uses information retrieval to display results relevant to a query.
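The query-relevance step can be illustrated with a simple keyword-overlap (Jaccard) ranking over already-scraped headlines. This is an illustrative sketch only, not the scoring actually used in news.py:

```python
def rank_headlines(query, headlines, top_k=3):
    # Score each headline by Jaccard overlap between its words and the query words
    q = set(query.lower().split())

    def score(headline):
        w = set(headline.lower().split())
        union = q | w
        return len(q & w) / len(union) if union else 0.0

    return sorted(headlines, key=score, reverse=True)[:top_k]
```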

7. Q&A_CDQA_Finetuning.py: The script written to fine-tune BERT on a subset of the CORD-19 dataset.

8. Anomaly_detection.py: The script written to perform anomaly detection on a subset of the CORD-19 dataset.

9. qna.joblib: Trained model which predicts answers based on the question query.

10. ir_old.csv: Dataset created from CORD-19 data for Information Retrieval.

Research papers referred to while working on the project:


Snippets of the platform:

  • Welcome modal

1.png 2.png

  • News

3.png 4.png

  • Login and Signup

5.png 6.png

  • Summarization and My Summaries

10.png 11.png 16.png 14.png

  • Information Retrieval and My Bookmarks

8.png 9.png 7.png 15.png

  • Q & A and My Questions

19.png 20.png 13.png

  • Anomaly Detection Map

17.png 18.png