Ailuropoda1864/PMC-text-mining

PMC-text-mining

This project uses the text of biomedical and life science literature to gain insight into how research topics trend over time. The data comes from the text mining collections made available by PubMed Central (PMC), an archive of biomedical and life sciences journal literature at the U.S. National Institutes of Health's National Library of Medicine. Using dynamic topic modeling, I uncovered the underlying themes of the text collections and observed some interesting changes in them over time. The same method could potentially help automate organizing, searching, indexing, and browsing large document collections.
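To make the idea of following themes across time slices concrete, here is a minimal Python sketch. It is not the dynamic topic modeling described above: it swaps in a much simpler stand-in, tracking each term's share of the tokens published in a given year. The toy records in DOCS and the helper names (tokenize, term_share_by_year, trend) are hypothetical and exist only for illustration; they are not taken from the project's code or from the PMC collections.

```python
from collections import Counter, defaultdict
import re

# Hypothetical toy records: (publication year, text). These are made up for
# illustration and are not taken from the PMC collections.
DOCS = [
    (2000, "gene expression in yeast cells"),
    (2000, "protein structure and gene regulation"),
    (2010, "genome sequencing of cancer cells"),
    (2010, "cancer genome sequencing methods"),
]


def tokenize(text):
    """Lowercase the text and return its alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def term_share_by_year(docs):
    """Map each year to {term: share of that year's tokens}."""
    counts = defaultdict(Counter)
    for year, text in docs:
        counts[year].update(tokenize(text))
    shares = {}
    for year, counter in counts.items():
        total = sum(counter.values())
        shares[year] = {term: n / total for term, n in counter.items()}
    return shares


def trend(shares, term):
    """Return [(year, share)] for one term, sorted by year (0.0 if absent)."""
    return [(year, shares[year].get(term, 0.0)) for year in sorted(shares)]


shares = term_share_by_year(DOCS)
for term in ("gene", "cancer"):
    print(term, trend(shares, term))
```

A dynamic topic model goes further than this: instead of following single terms, it learns groups of co-occurring words (topics) and lets the word distribution of each topic drift from one time slice to the next.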

You can start by reading main_analysis.ipynb in the code folder, which contains the executive summary and all of the code used in the project, or you can go over the presentation slides (presentation.pdf). The data folder includes two subsets of the original text mining collections.

This is my Capstone project for the Data Science Immersive program at General Assembly (Washington, DC).
