skhadka007/JMLR_Python_Web_Scraper


JMLR Python Web Scraper

  • Web scraper built with Python 3 and the beautifulsoup4 library.
  • Built specifically for the Journal of Machine Learning Research (JMLR) website.
  • CURRENT STATUS: In progress.

Installation

Use the pip package manager to install the required packages.

pip install beautifulsoup4
pip install lxml
pip install requests
pip install unidecode
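
After installing, it can help to confirm that all four packages are importable before starting a scrape. The sketch below is not part of this repository, and the helper name `missing_packages` is hypothetical. It relies on beautifulsoup4 being imported as `bs4`, while the other three packages are imported under their pip names.

```python
from importlib.util import find_spec

# pip name -> import name (beautifulsoup4 is imported as bs4)
REQUIRED = {
    "beautifulsoup4": "bs4",
    "lxml": "lxml",
    "requests": "requests",
    "unidecode": "unidecode",
}


def missing_packages(required=REQUIRED):
    """Return the pip names of packages that cannot be found for import."""
    return [
        pip_name
        for pip_name, module in required.items()
        if find_spec(module) is None
    ]


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Missing packages: pip install " + " ".join(missing))
    else:
        print("All required packages are installed.")
```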

Usage

  • JMLR_scraper_FULL.py: Runs the entire scrape.
    • Collects Titles, Abstracts, Abstract URLs, Authors, Keywords, Affiliations, Month of Publication, Volume URL, Journal Name, Year of Publication, Volume List, and Issue List.
  • JMLR_scraper_VolumeX_abstract.py: Scrapes only the abstracts of specific volumes (X).
  • JMLR_scraper_VolumeX_abstractURL: Scrapes only the abstract URLs of specific volumes.
  • The remaining individual scrapers are used the same way.
python JMLR_scraper_FULL.py
  • Output is written to CSV files in the same directory as the program file:
JMLR_Volume_1.csv
JMLR_Volume_2.csv
...
etc.
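
Because each volume is written to its own file, later analysis will often start by merging them. The following is a minimal sketch of that step and is not part of this repository. The function name `load_volumes` and the added `source_volume` field are hypothetical. It assumes each CSV starts with a header row and is UTF-8 encoded, which this README does not confirm.

```python
import csv
import re
from pathlib import Path


def load_volumes(directory="."):
    """Read every JMLR_Volume_<n>.csv in `directory` into one list of dicts.

    Assumption: each file starts with a header row; column names are taken
    from that row rather than hard-coded.
    """
    pattern = re.compile(r"JMLR_Volume_(\d+)\.csv$")
    found = []
    for path in Path(directory).glob("JMLR_Volume_*.csv"):
        match = pattern.match(path.name)
        if match:
            found.append((int(match.group(1)), path))

    rows = []
    # Sort numerically so volume 10 comes after volume 9, not after volume 1.
    for volume, path in sorted(found):
        with path.open(newline="", encoding="utf-8") as handle:
            for row in csv.DictReader(handle):
                row["source_volume"] = volume  # added field, not from the scraper
                rows.append(row)
    return rows


if __name__ == "__main__":
    records = load_volumes()
    print(f"Loaded {len(records)} rows")
```

Reading the column names from each file's header keeps the sketch independent of which scraper produced the file.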

Contributing

Created by Santosh Khadka (skhadka.code@gmail.com).

Pull requests are welcome.

License

MIT
