enriquecallejascastro/arXiv-scrape

Why Scrape arXiv?

Researchers often need to review the latest scientific literature. Although this can be done manually through search engines such as Google Scholar, a systematic, repeatable procedure is more likely to surface the most relevant evidence. To automate part of that procedure, we provide two functions: one for scraping arXiv search results and one for downloading the corresponding papers.

Hope this helps!



Code description

The code consists of two functions, one for scraping arXiv search results and one for downloading papers:

  1. scrape_arxiv(url, output_filepath='output.xlsx'): This function takes the URL of an arXiv search (your search strategy) and scrapes the results. It retrieves the title, authors, abstract, and PDF link of each article and returns this information as a pandas DataFrame. The results are also saved to the Excel file given by output_filepath (default: 'output.xlsx').

  2. download_papers(links_list, path='papers'): This function takes a list of PDF links and downloads the corresponding papers. The files are saved in the folder given by path (default: 'papers').
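
To illustrate the download step described above, here is a minimal sketch of how a list of PDF links could be saved into a folder. It is not the repository's implementation: the names pdf_filename and download_papers_sketch, the fetch parameter, and the file-naming rule (last segment of the link plus '.pdf') are all assumptions made for this example. Only the Python standard library is used.

```python
import os
import urllib.request
from urllib.parse import urlparse


def pdf_filename(link):
    """Derive a local file name from a PDF link (hypothetical helper)."""
    # Assumption: the last path segment identifies the paper,
    # e.g. '2301.00001v1' in 'https://arxiv.org/pdf/2301.00001v1'.
    name = urlparse(link).path.rstrip('/').split('/')[-1]
    if not name.lower().endswith('.pdf'):
        name += '.pdf'
    return name


def download_papers_sketch(links_list, path='papers',
                           fetch=urllib.request.urlretrieve):
    """Save each linked PDF into `path` and return the saved file paths.

    `fetch` is an illustrative parameter (not part of the repository's
    function) so the network call can be swapped out, e.g. in tests.
    """
    # Create the output folder if it does not exist yet.
    os.makedirs(path, exist_ok=True)
    saved = []
    for link in links_list:
        target = os.path.join(path, pdf_filename(link))
        fetch(link, target)  # downloads `link` and writes it to `target`
        saved.append(target)
    return saved
```

In practice the links would come from the PDF-link column of the DataFrame returned by scrape_arxiv; that column's name is not stated here, so it is left out of the sketch. When downloading many papers, pausing between requests is a reasonable courtesy to the server.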



Contact: enriquecallejascastro@gmail.com
