
Web scraping with Node.js and Cheerio

CI pipeline for a web scraper built with Node.js and Cheerio

Install dependencies

npm install

Run scraper

npm start

Then open an HTTP client (such as Postman, Insomnia, or Hoppscotch) and send a POST request to the "/scrape" endpoint with a JSON body like the one below. The app only accepts URLs whose base URL is https://www.amazon.com.

{
  "url": "https://www.amazon.com/s?k=all+headphones&crid=2TTXQBOK238J3&qid=1667301526&sprefix=all+headphones%2Caps%2C284&ref=sr_pg_1"
}

Expect a response like the one in the screenshot below, along with a new file in the data folder.

Postman screenshot
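
The same request can also be sent from a script instead of a GUI client. The following is a minimal sketch, not part of the repository: it assumes Node.js 18+ (for the global `fetch`), that the server listens on `http://localhost:3000` (the port shown in the test log), and that the endpoint accepts a JSON POST body. The names `buildScrapeRequest` and `sendScrapeRequest` are hypothetical, and the origin check only mirrors the restriction described above; the app's own validation may differ.

```javascript
// Sketch only: helper names, port, and response handling are assumptions.
const ALLOWED_ORIGIN = "https://www.amazon.com";

// Build the fetch options for a scrape request.
// Throws if the target URL is malformed or outside the allowed origin.
function buildScrapeRequest(targetUrl) {
  const parsed = new URL(targetUrl); // throws TypeError on malformed URLs
  if (parsed.origin !== ALLOWED_ORIGIN) {
    throw new Error(`Only ${ALLOWED_ORIGIN} URLs are supported, got ${parsed.origin}`);
  }
  return {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ url: targetUrl }),
  };
}

// POST the request to /scrape and return the status code and raw response text.
async function sendScrapeRequest(targetUrl, baseUrl = "http://localhost:3000") {
  const response = await fetch(`${baseUrl}/scrape`, buildScrapeRequest(targetUrl));
  return { status: response.status, body: await response.text() };
}
```

With the server started via `npm start`, calling `sendScrapeRequest("https://www.amazon.com/s?k=all+headphones").then(console.log)` should print the status code and the response text.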

Test the scraper

Run:

npm test

Expect output similar to this:

> test
> jest --detectOpenHandles

  console.log
    Server is running on port 3000

      at Server.log (src/server.js:15:11)

 PASS  __tests__/scraper.test.js
  scraper
    ✓ generateFilename() returns a string (2 ms)
    ✓ saveProductJson() saves a file (1 ms)
    ✓ POST /scrape returns a 200 status code (2688 ms)

Test Suites: 1 passed, 1 total
Tests:       3 passed, 3 total
Snapshots:   0 total
Time:        2.927 s, estimated 3 s
Ran all test suites.