CIRCLECI-GWP/nodejs-cheerio-web-scraping

Web scraping with Node.js and Cheerio

CI pipeline for a web scraper built with Node.js and Cheerio

Install dependencies

npm install

Run scraper

npm start

Then open an HTTP client (such as Postman, Insomnia, or Hoppscotch) and send a POST request to the "/scrape" endpoint with a request body like the one below. The server logs the port it listens on (3000 in the test output further down). This app only accepts URLs whose base URL is https://www.amazon.com.

{
  "url": "https://www.amazon.com/s?k=all+headphones&crid=2TTXQBOK238J3&qid=1667301526&sprefix=all+headphones%2Caps%2C284&ref=sr_pg_1"
}
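
If you prefer a script to a GUI client, the request can be sketched in Node.js as shown here. This is a minimal sketch, not code from the repository: `buildScrapeRequest` and `sendScrapeRequest` are hypothetical helper names, `http://localhost:3000` assumes a local server on the port shown in the test output, and the client-side origin check only mirrors the amazon.com restriction described above (the server's own validation may differ). It relies on the global `fetch` available in Node.js 18 and later.

```javascript
// Hypothetical client-side helpers; none of these names come from the repository.
const ALLOWED_ORIGIN = "https://www.amazon.com";

// Build the fetch options for a POST /scrape request.
// Throws if the target URL is malformed or is not an amazon.com URL.
function buildScrapeRequest(targetUrl) {
  const parsed = new URL(targetUrl); // throws a TypeError on malformed input
  if (parsed.origin !== ALLOWED_ORIGIN) {
    throw new Error(`Only ${ALLOWED_ORIGIN} URLs are supported, got ${parsed.origin}`);
  }
  return {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ url: targetUrl }),
  };
}

// Send the request to a running scraper.
// The default base URL is an assumption (local server, port 3000).
async function sendScrapeRequest(targetUrl, baseUrl = "http://localhost:3000") {
  const response = await fetch(`${baseUrl}/scrape`, buildScrapeRequest(targetUrl));
  if (!response.ok) {
    throw new Error(`Scrape request failed with status ${response.status}`);
  }
  // The response format is not documented here, so return the raw text.
  return response.text();
}
```

With the server started via npm start, calling `sendScrapeRequest("https://www.amazon.com/s?k=all+headphones")` should resolve with the response body as text.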

Expect a response like the one in the screenshot below, plus a new file in the data folder.

Postman screenshot

Test the scraper

Run

npm test

Expect results like these

> test
> jest --detectOpenHandles

  console.log
    Server is running on port 3000

      at Server.log (src/server.js:15:11)

 PASS  __tests__/scraper.test.js
  scraper
    ✓ generateFilename() returns a string (2 ms)
    ✓ saveProductJson() saves a file (1 ms)
    ✓ POST /scrape returns a 200 status code (2688 ms)

Test Suites: 1 passed, 1 total
Tests:       3 passed, 3 total
Snapshots:   0 total
Time:        2.927 s, estimated 3 s
Ran all test suites.
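
The first two test names suggest two small helpers: one that produces a filename and one that writes scraped products to disk. The sketch below shows one way such helpers could look. It is an illustration under assumptions, not the repository's implementation: the signatures, the timestamped naming scheme, the `prefix` and `dir` parameters, and the default `data` directory under the current working directory are all hypothetical.

```javascript
const fs = require("node:fs");
const path = require("node:path");

// Hypothetical: build a timestamped JSON filename, e.g. "products-<ISO timestamp>.json".
// Colons and dots are replaced so the name is valid on every common filesystem.
function generateFilename(prefix = "products") {
  const stamp = new Date().toISOString().replace(/[:.]/g, "-");
  return `${prefix}-${stamp}.json`;
}

// Hypothetical: write the scraped products as pretty-printed JSON
// into `dir` (created if missing) and return the path of the new file.
function saveProductJson(products, dir = path.join(process.cwd(), "data")) {
  fs.mkdirSync(dir, { recursive: true });
  const filePath = path.join(dir, generateFilename());
  fs.writeFileSync(filePath, JSON.stringify(products, null, 2));
  return filePath;
}
```

Returning the file path from `saveProductJson` keeps a "saves a file" test simple: the test can check that the returned path exists instead of scanning the directory.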

About

"CI pipeline for a web scraper built with Node.js and Cheerio" by @mwaz