saroarjahan/Google_sholars_ACM_digital_library_crawler

Google Scholar and ACM Digital Library crawler using Beautiful Soup

A crawler for Google Scholar and the ACM Digital Library that collects paper titles, author names, abstracts, and publication years.

Instructions:

  1. Clone the repository
  2. Run the code :D

Warning

Both Google Scholar and the ACM Digital Library will block your IP after too many requests. In my experience, ACM blocked me after about 4-5k requests. Therefore, I had to run the code one year at a time, with a small pagination range for each run. That got the job done: I was able to scrape 20k+ records.
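The year-by-year approach with small page ranges could be organized roughly like this. This is only a sketch, not the code in this repository: the function name `year_page_batches` and its parameters are made up for illustration, and suitable values for the year range and batch size are assumptions you would tune yourself.

```python
from typing import Iterator, List, Tuple


def year_page_batches(
    start_year: int, end_year: int, pages_per_year: int, batch_size: int
) -> Iterator[Tuple[int, List[int]]]:
    """Yield (year, pages) work units, each covering at most batch_size pages.

    Hypothetical helper: each unit is small enough to run on its own,
    so a blocked run can be resumed from the last finished unit.
    """
    for year in range(start_year, end_year + 1):
        for first in range(0, pages_per_year, batch_size):
            last = min(first + batch_size, pages_per_year)
            yield year, list(range(first, last))


if __name__ == "__main__":
    # Example values only; pick ranges that fit your own search.
    for year, pages in year_page_batches(2018, 2019, pages_per_year=5, batch_size=2):
        print(year, pages)
```

Each yielded unit can then be passed to whatever scraping function you use, one unit per run.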

Google Scholar was stricter: I got blocked after about 1000 requests. I added a sleep call to control the run time of each loop iteration, but the problem persists. I would suggest using multiple computers or switching to a different IP at intervals.
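A sleep between requests might look something like the sketch below. It is not the exact code from this repository: `throttled_map` and `fetch` are hypothetical names, the delay bounds are assumed values, and a randomized delay is my own choice here rather than something the crawler is known to do. It slows the crawl down but does not guarantee you will avoid a block.

```python
import random
import time
from typing import Callable, Iterable, List, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def throttled_map(
    fetch: Callable[[T], R],
    items: Iterable[T],
    min_delay: float = 2.0,
    max_delay: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> List[R]:
    """Call fetch(item) for each item, pausing a random time between calls.

    fetch is a placeholder for your own request-and-parse function.
    The sleep argument exists so the pause can be swapped out in tests.
    """
    results: List[R] = []
    for index, item in enumerate(items):
        if index > 0:
            # Pause before every request except the first one.
            sleep(random.uniform(min_delay, max_delay))
        results.append(fetch(item))
    return results
```

For example, `throttled_map(scrape_page, page_urls)` would call a `scrape_page` function of your own on each URL with a 2 to 5 second pause in between.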
