
Google Scholar and ACM Digital Library crawler using Beautiful Soup

A crawler for Google Scholar and the ACM Digital Library that collects each paper's title, author names, abstract, and publication year.

Instructions:

  1. Clone the repository
  2. Run the code :D

Warning

Both Google Scholar and the ACM Digital Library block an IP address after it sends too many requests. In my experience, ACM blocked me after about 4-5k requests, so I had to run the code one year at a time with a small pagination range. That was enough to get my job done: I was able to scrape 20k+ records.
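
The year-by-year approach with a small pagination range could be sketched roughly as below. This is only an illustration, not the code in this repository: the function name `yearly_page_batches` and its parameters are hypothetical, and the sketch only plans which pages to request without sending anything.

```python
def yearly_page_batches(start_year, end_year, pages_per_year, batch_size):
    """Yield (year, first_page, last_page) tuples, one small batch at a time.

    All names and parameters here are hypothetical; adjust them to
    whatever the real crawler expects.
    """
    for year in range(start_year, end_year + 1):
        # Walk through the pages of one year in small chunks.
        for first in range(1, pages_per_year + 1, batch_size):
            last = min(first + batch_size - 1, pages_per_year)
            yield year, first, last


if __name__ == "__main__":
    # Each printed line is one small run; pause or stop between runs.
    for year, first, last in yearly_page_batches(2018, 2019, pages_per_year=5, batch_size=2):
        print(f"year={year}: pages {first}-{last}")
```

Running one batch per session keeps the number of requests in each run well below the point where the block kicked in.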

Google Scholar, however, blocked me after about 1,000 requests. I added a sleep call to slow down each loop iteration, but the blocking still happens eventually. I would suggest using multiple computers or switching to a different IP address at intervals.
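
A minimal sketch of the sleep-between-requests idea follows. It is an assumption about how such a loop might look, not the repository's actual code: `crawl_with_delay`, `fetch`, and the delay values are hypothetical, and the random jitter is an addition that the original loop may not use.

```python
import random
import time


def crawl_with_delay(urls, fetch, min_delay=5.0, max_delay=15.0, sleep=time.sleep):
    """Call fetch(url) for each URL, pausing a random time between calls.

    `fetch` is any callable that downloads and parses one page
    (hypothetical here). `sleep` is injectable so the loop can be
    tried out without actually waiting.
    """
    results = []
    for index, url in enumerate(urls):
        if index > 0:
            # Wait before every request except the first one.
            sleep(random.uniform(min_delay, max_delay))
        results.append(fetch(url))
    return results


if __name__ == "__main__":
    # Dry run with a fake fetch function and very short delays.
    pages = crawl_with_delay(
        ["page-1", "page-2", "page-3"],
        fetch=lambda url: f"fetched {url}",
        min_delay=0.01,
        max_delay=0.02,
    )
    print(pages)
```

As noted above, a delay only slows the rate of requests; it does not guarantee that the IP will never be blocked.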