GitHub - enriquecallejascastro/arXiv-scrape: Python tools for scraping arXiv search results.

Why Scrape arXiv?

Researchers often need to review the latest scientific literature. This can be done manually with search engines such as Google Scholar, but a systematic procedure is recommended for finding the most relevant evidence. To automate this process, we provide two functions for scraping arXiv search results.

Hope this helps!

Code description

This code consists of two functions for scraping and downloading papers from arXiv. The functions are as follows:

scrape_arxiv(url, output_filepath='output.xlsx'): This function takes a search strategy URL for arXiv and scrapes the search results. It retrieves the titles, authors, abstracts, and PDF links for each article and returns a pandas DataFrame with this information. The search results are also saved in an Excel file specified by output_filepath (default is 'output.xlsx').
download_papers(links_list, path='papers'): This function takes a list of PDF links and downloads the corresponding papers. The downloaded files are saved in a folder specified by path (default is 'papers').
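
The download step described above can be sketched as follows. This is a minimal illustration, not the notebook's actual implementation: it uses only the Python standard library, and the helpers `pdf_filename` and `fetch_bytes`, the `fetch` parameter, and the returned list of paths are assumptions added here so the logic can be tried without network access.

```python
from pathlib import Path, PurePosixPath
from urllib.parse import urlparse
from urllib.request import urlopen


def pdf_filename(link):
    """Derive a local file name from a PDF link (hypothetical helper)."""
    # Use the last path segment, e.g. "2301.00001v1" from ".../pdf/2301.00001v1".
    name = PurePosixPath(urlparse(link).path).name or "paper"
    return name if name.lower().endswith(".pdf") else name + ".pdf"


def fetch_bytes(link, timeout=30):
    """Return the raw bytes behind a link (hypothetical helper)."""
    with urlopen(link, timeout=timeout) as response:
        return response.read()


def download_papers(links_list, path="papers", fetch=fetch_bytes):
    """Save each PDF in links_list under the folder `path`.

    The `fetch` argument and the returned list of saved paths are
    assumptions of this sketch; the description above only documents
    `links_list` and `path`.
    """
    folder = Path(path)
    folder.mkdir(parents=True, exist_ok=True)  # create the folder if it is missing
    saved = []
    for link in links_list:
        target = folder / pdf_filename(link)
        target.write_bytes(fetch(link))  # write the downloaded bytes to disk
        saved.append(target)
    return saved
```

In practice, `links_list` would be the PDF links taken from the DataFrame returned by `scrape_arxiv`. Passing a custom `fetch` makes it possible to add delays or retries between requests without changing the saving logic.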

Contact: enriquecallejascastro@gmail.com

Folders and files: README.md, arXiv-scrape.ipynb, example.xlsx
