Classification-of-PubMed-documents-using-machine-learning

This project builds a classifier that decides, from the content of a document's abstract, whether the document belongs to the cancer class or not.
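As a rough illustration of the task, here is a minimal sketch of a binary text classifier. It is not the project's model: the README does not describe the model, so this swaps in a tiny multinomial Naive Bayes over word counts, written with the standard library only. The class name `NaiveBayesText`, the helper `tokenize`, the label strings, and the training sentences are all made up for the example.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


class NaiveBayesText:
    """Multinomial Naive Bayes with add-one smoothing (illustrative stand-in)."""

    def fit(self, texts, labels):
        self.word_counts = {}              # label -> Counter of word frequencies
        self.doc_counts = Counter(labels)  # label -> number of training documents
        self.vocab = set()
        for text, label in zip(texts, labels):
            tokens = tokenize(text)
            self.word_counts.setdefault(label, Counter()).update(tokens)
            self.vocab.update(tokens)
        return self

    def predict_one(self, text):
        tokens = tokenize(text)
        total_docs = sum(self.doc_counts.values())
        best_label, best_score = None, -math.inf
        for label, counts in self.word_counts.items():
            # Log prior plus smoothed log likelihood of each token.
            score = math.log(self.doc_counts[label] / total_docs)
            denom = sum(counts.values()) + len(self.vocab)
            for tok in tokens:
                score += math.log((counts[tok] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical toy abstracts, only to show the call pattern.
train_texts = [
    "tumor growth and chemotherapy response",
    "breast carcinoma metastasis",
    "hiv infection antiretroviral therapy",
    "stroke rehabilitation outcomes",
]
train_labels = ["cancer", "cancer", "non_cancer", "non_cancer"]
model = NaiveBayesText().fit(train_texts, train_labels)
```

Calling `model.predict_one(...)` on a new abstract returns one of the two label strings. A real run would train on the extracted abstracts rather than on toy sentences.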

Dataset Description

The training and test data contain research paper abstracts in .nxml format. The training data contains two folders:

Cancer: Contains documents related to cancer.
Non Cancer: Contains documents not related to cancer. These can belong to any other category, ranging from music and videos to HIV and stroke.

Test data contains 100 files named 1.nxml to 100.nxml. The output should contain the predicted labels for these files.
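Reading the abstracts out of the .nxml files could look roughly like the sketch below. It assumes each file is well-formed XML with the abstract inside an `<abstract>` element and no default namespace. The function names `abstract_text` and `load_folder` are hypothetical. It uses the standard-library `xml.etree.ElementTree` rather than the project's own parsing code, and files that rely on externally defined entities may need a more lenient parser.

```python
import xml.etree.ElementTree as ET
from pathlib import Path


def abstract_text(root):
    """Return the whitespace-normalized text of all <abstract> elements."""
    parts = []
    for abstract in root.iter("abstract"):
        # itertext() walks nested tags such as <p> or <italic>.
        parts.append(" ".join("".join(abstract.itertext()).split()))
    return " ".join(parts)


def load_folder(folder, label):
    """Yield (abstract_text, label) for every .nxml file in a folder."""
    for path in sorted(Path(folder).glob("*.nxml")):
        root = ET.parse(str(path)).getroot()
        yield abstract_text(root), label
```

Calling `load_folder` once per training folder, each with its own label, gives a list of labeled abstracts to train on.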

Prerequisites

pip install beautifulsoup4
pip install html2text
pip install tqdm
pip install nltk
pip install numpy
pip install scikit-learn

The xml module is part of the Python standard library and does not need to be installed with pip.
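To confirm that the packages are visible to the interpreter, a small check along these lines may help. The list uses import names, which can differ from the pip package names. The function name `missing_modules` and the constant `REQUIRED_MODULES` are made up for this sketch, and xgboost is included because the README lists it as a dependency.

```python
import importlib.util

# Import names, which can differ from the pip package names
# (for example, scikit-learn is imported as sklearn).
REQUIRED_MODULES = ["bs4", "html2text", "tqdm", "nltk", "numpy", "sklearn", "xgboost"]


def missing_modules(names):
    """Return the module names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED_MODULES)
    print("Missing modules:", ", ".join(missing) if missing else "none")
```

This only checks that each module can be located. It does not import the modules, so errors that occur at import time will not show up here.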

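Install xgboost

The steps below build xgboost from source. On recent xgboost releases, pip install xgboost installs a prebuilt package and is usually simpler.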

git clone --recursive https://github.com/dmlc/xgboost
cd xgboost
make -j4
cd python-package
python setup.py install

Error: import xgboost raises OSError: version `GOMP_4.0' not found

If importing xgboost fails with this error, install libgcc through conda:

conda install libgcc

Links

dmlc/xgboost#1786


erayon/PubMed
