PMC-text-mining

The aim of this project is to use the text of biomedical and life science literature to gain insight into how research topics trend over time. The data comes from the text mining collections made available by PubMed Central (PMC), an archive of biomedical and life sciences journal literature at the U.S. National Institutes of Health's National Library of Medicine. Using dynamic topic modeling, I uncovered the underlying themes of the text collections and observed some interesting changes over time. The same method could help automate organizing, searching, indexing, and browsing large document collections.
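To give a rough feel for what "topic trends over time" means, here is a minimal sketch that is deliberately much simpler than dynamic topic modeling: it groups documents by year and compares each term's share of that year's tokens. It is not the code from main_analysis.ipynb. The toy corpus, the stopword list, and the helper names (`tokenize`, `term_share_by_year`) are all hypothetical and chosen only for illustration.

```python
from collections import Counter, defaultdict
import re

# Hypothetical toy corpus of (publication year, text) pairs.
# These are made-up sentences, not records from the PMC collections.
DOCS = [
    (2010, "gene expression in cancer cells"),
    (2010, "cancer gene mutation screening"),
    (2015, "deep sequencing of microbiome samples"),
    (2015, "microbiome diversity and gene expression"),
]

# Assumed, minimal stopword list for the toy corpus.
STOPWORDS = {"in", "of", "and", "the", "a"}


def tokenize(text):
    """Lowercase the text and keep alphabetic tokens that are not stopwords."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOPWORDS]


def term_share_by_year(docs):
    """Return {year: {term: share of that year's tokens}}."""
    counts = defaultdict(Counter)
    for year, text in docs:
        counts[year].update(tokenize(text))
    shares = {}
    for year, counter in counts.items():
        total = sum(counter.values())
        shares[year] = {term: n / total for term, n in counter.items()}
    return shares


shares = term_share_by_year(DOCS)
for year in sorted(shares):
    # Sort by descending share, then alphabetically, and show the top three terms.
    top = sorted(shares[year].items(), key=lambda kv: (-kv[1], kv[0]))[:3]
    print(year, top)
```

A dynamic topic model goes further than this: it learns groups of co-occurring words (topics) and lets each topic's word distribution drift from one time slice to the next, rather than tracking individual terms.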

You can start by reading main_analysis.ipynb in the code folder, which contains the executive summary and all of the code for the project, or you can go through the presentation slides (presentation.pdf). The data folder includes two subsets of the original text mining collections.

This is my Capstone project for the Data Science Immersive program at General Assembly (Washington, DC).

Ailuropoda1864/PMC-text-mining