Using Scrapy framework to scrape data simultaneously from several Jumia sites, the leading e-commerce site in Africa

hericlibong/Jumia-Scraper-multisites

MULTIJUMIA WEBSITES SCRAPER

Presentation

The Scrapy program in this repo lets the user scrape data from multiple Jumia websites simultaneously with a single command.
Data is retrieved from sites based in the following countries:

  • Kenya
  • Nigeria
  • Uganda
  • Algeria
  • Tunisia
  • Morocco
  • Ivory Coast
  • Senegal

What is Jumia?

Jumia is a Pan-African technology company that is built around a marketplace, logistics service and payment service. The logistics service enables the delivery of packages through a network of local partners while the payment services facilitate the payments of online transactions within Jumia’s ecosystem. It has partnered with more than 100,000 active sellers and individuals and is a direct competitor to Konga in Nigeria and Amazon in Egypt.

Prerequisites

  • Python, version 3.8 or 3.10
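As a quick way to confirm the interpreter matches one of the versions listed above, a small check such as the following could be used. This is only a sketch: the helper name `is_tested_version` and the constant `TESTED_VERSIONS` are hypothetical and not part of the repo, and other Python versions may work even though they are not listed here.

```python
import sys

# Versions named in the Prerequisites section; other versions are untested here.
TESTED_VERSIONS = {(3, 8), (3, 10)}


def is_tested_version(version_info=None):
    """Return True if (major, minor) matches a version listed in the README."""
    if version_info is None:
        version_info = sys.version_info
    # Only the first two components (major, minor) are compared.
    return tuple(version_info[:2]) in TESTED_VERSIONS
```

Calling `is_tested_version()` with no argument checks the interpreter that is currently running.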

Install and run

Create a virtual environment:

virtualenv venv

Activate it:

source venv/bin/activate

  • Clone the repo
  • Open the JUMIA_INTER folder
  • Install the dependencies:

pip install -r requirements.txt

or

pip install scrapy

To scrape all sites simultaneously, run the following from the root of the project:

python run_spider.py
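The contents of run_spider.py are not shown here, so the following is only a sketch of how a launcher of this kind could be written with Scrapy's `CrawlerProcess`, not the repo's actual code. The functions `select_spiders` and `run_all` are hypothetical names. The sketch assumes it is run from the root of a Scrapy project so that the project settings and spiders can be discovered.

```python
"""Hypothetical sketch of a run_spider.py-style launcher (not the repo's file)."""


def select_spiders(available, wanted=None):
    """Return the spider names to run, sorted alphabetically.

    `available` holds every spider name found in the project; `wanted` is an
    optional subset. Asking for a name that does not exist raises ValueError.
    """
    available = set(available)
    if wanted is None:
        return sorted(available)
    unknown = set(wanted) - available
    if unknown:
        raise ValueError(f"unknown spiders: {sorted(unknown)}")
    return sorted(set(wanted))


def run_all(wanted=None):
    """Schedule the selected spiders in one process and wait for them."""
    # Imported here so select_spiders stays usable without Scrapy installed.
    from scrapy.crawler import CrawlerProcess
    from scrapy.utils.project import get_project_settings

    process = CrawlerProcess(get_project_settings())
    for name in select_spiders(process.spider_loader.list(), wanted):
        process.crawl(name)  # schedules the spider; nothing runs yet
    process.start()  # blocks until every scheduled spider has finished
```

With this sketch, `run_all()` would start every spider in the project, while `run_all(["jumia_kenya", "jumia_senegal"])` would start only those two.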

To run a single spider:

  • From the root of the project:

scrapy crawl <spidername>

where <spidername> is, for example, jumia_kenya or jumia_senegal.

  • From the spiders folder:

Go to the spiders folder:

cd JUMIA_INTER/JUMIA_INTER/spiders

Choose your spider and run it:

scrapy runspider jumia_kenya.py
