A command-line app to extract proxy information written in Python

krauss/prox_crapper

Proxy Scrapper CLI

prox_crapper

1. Description

2. Web Scraping Approach

3. Local Setup

4. Docker Setup

Description

Proxy Scrapper CLI is a command-line interface application written in Python that extracts information about proxies from this website and saves it as a JSON or XML file in the local export folder.
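As an illustration of the export step described above, here is a minimal sketch that writes a list of proxy records to a JSON or XML file in an export folder. It is not taken from the prox_crapper source: the function name `export_proxies`, the `proxies` base file name, and the record fields (`ip`, `port`, `country`) are all assumptions made for the example.

```python
import json
import xml.etree.ElementTree as ET
from pathlib import Path


def export_proxies(proxies, fmt="json", export_dir="export", basename="proxies"):
    """Write a list of proxy dicts to <export_dir>/<basename>.<fmt>.

    Hypothetical helper: the real application may name and structure
    its output differently.
    """
    if fmt not in ("json", "xml"):
        raise ValueError(f"unsupported format: {fmt!r}")

    out_dir = Path(export_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    path = out_dir / f"{basename}.{fmt}"

    if fmt == "json":
        path.write_text(json.dumps(proxies, indent=2), encoding="utf-8")
    else:
        # One <proxy> element per record, one child element per field.
        # Field names must be valid XML tag names for this to work.
        root = ET.Element("proxies")
        for proxy in proxies:
            item = ET.SubElement(root, "proxy")
            for key, value in proxy.items():
                ET.SubElement(item, key).text = str(value)
        ET.ElementTree(root).write(str(path), encoding="utf-8", xml_declaration=True)

    return path
```

Calling `export_proxies(records, "xml")` with a list of dictionaries would create `export/proxies.xml`; passing `"json"` writes the same records as a JSON array.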

Web Scraping Approach

Because websites of this kind are sensitive to web scraping, the chosen approach is to simulate a human navigation pattern: requests are opened through a web browser client (selenium + geckodriver), and click events are fired on page links (<a> tags) to extract information from the other pages of the website.
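The navigation pattern described above could be sketched roughly as follows. This is not the project's actual code: the row selector, the "Next" link text, the random pause between clicks, and both function names are assumptions for illustration. The loop accepts any object that exposes Selenium's `find_elements(by, value)` interface, and selenium is imported only inside `open_firefox`, so the pagination logic can be exercised without a browser.

```python
import random
import time

# Locator strategy strings. These are the values of selenium's
# By.CSS_SELECTOR, By.TAG_NAME and By.LINK_TEXT constants.
CSS_SELECTOR = "css selector"
TAG_NAME = "tag name"
LINK_TEXT = "link text"


def scrape_all_pages(driver, row_selector="table tbody tr", next_text="Next",
                     delay_range=(1.0, 3.0), max_pages=50):
    """Collect the cell text of every table row, clicking through the pages.

    row_selector and next_text are hypothetical; they depend on the
    markup of the target website.
    """
    rows = []
    for _ in range(max_pages):
        # Read every row currently rendered in the browser.
        for tr in driver.find_elements(CSS_SELECTOR, row_selector):
            rows.append([td.text for td in tr.find_elements(TAG_NAME, "td")])

        # Look for the <a> tag that leads to the next page; stop if absent.
        next_links = driver.find_elements(LINK_TEXT, next_text)
        if not next_links:
            break

        # Pause for a random interval before clicking, as a person might.
        time.sleep(random.uniform(*delay_range))
        next_links[0].click()
    return rows


def open_firefox(url):
    """Start Firefox through geckodriver and load the given URL."""
    from selenium import webdriver  # imported lazily; requires selenium installed

    driver = webdriver.Firefox()
    driver.get(url)
    return driver
```

In real use, the driver returned by `open_firefox(...)` would be passed to `scrape_all_pages(...)` and closed afterwards with `driver.quit()`. The `max_pages` cap is a safeguard in case the "next" link never disappears.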

When set up locally, prox_crapper requires the following dependencies:

Local Setup

Run the commands below to set up the application for your platform (Linux or Windows only).

Windows 📺

The following setup was run successfully on a Windows 10 Pro 64-bit machine:

  • Clone this repository:
git clone https://github.com/krauss/prox_crapper.git
  • Change directory:
cd prox_crapper
  • Create a virtual environment:
python -m venv venv
  • Activate the virtual environment:
.\venv\Scripts\activate
  • Install prox_crapper dependencies:
pip install -r requirements.txt
  • Run prox_crapper application:
python src\main.py
  • When you're done, to exit the virtual environment:
deactivate

Linux 🐧

The following setup was run successfully on a Linux Fedora 33 64-bit machine:

  • Clone this repository:
git clone https://github.com/krauss/prox_crapper.git
  • Change directory:
cd prox_crapper
  • Create a virtual environment:
python -m venv venv
  • Activate the virtual environment:
source venv/bin/activate
  • Install prox_crapper dependencies:
pip install -r requirements.txt
  • Run prox_crapper application:
python src/main.py
  • When you're done, to exit the virtual environment:
deactivate
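After either setup, a short script along the following lines can help confirm that the environment is ready before running the application. It is only a sketch and not part of the repository: the function name `check_environment` and the report keys are made up for this example. It checks whether a virtual environment is active, whether the selenium package can be imported, and whether a `geckodriver` executable is on the PATH.

```python
import importlib.util
import shutil
import sys


def check_environment():
    """Return a small report about the local setup (hypothetical helper)."""
    return {
        # Version of the interpreter running this script.
        "python_version": ".".join(str(n) for n in sys.version_info[:3]),
        # Inside a venv, sys.prefix differs from sys.base_prefix.
        "in_virtualenv": sys.prefix != sys.base_prefix,
        # True if the selenium package is installed in this interpreter.
        "selenium_installed": importlib.util.find_spec("selenium") is not None,
        # Full path to geckodriver, or None if it is not on the PATH.
        "geckodriver_path": shutil.which("geckodriver"),
    }


if __name__ == "__main__":
    for name, value in check_environment().items():
        print(f"{name}: {value}")
```

If `in_virtualenv` is false, the activation step was probably skipped; if `geckodriver_path` is empty, the browser client may fail to start.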

Docker Setup 🐳

To try the application quickly, follow the steps below to build the image and run the container:

  • Build the image using the provided Dockerfile:
docker build -t prox_crapper .
  • [ Linux ] Run the container, specifying a volume for the resulting JSON file:
docker run -it -v $PWD/export:/usr/src/app/export prox_crapper
  • [ Windows ] Run the container, specifying a volume for the resulting JSON file:
docker run -it -v %USERPROFILE%\export:/usr/src/app/export prox_crapper
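Once the container has finished, the mounted export folder on the host should contain the resulting file. The sketch below shows one way to read it back in Python. The name of the exported file and the structure of its contents are not documented here, so the helper (a hypothetical `load_exported_proxies`) simply loads every `.json` file it finds in the folder.

```python
import json
from pathlib import Path


def load_exported_proxies(export_dir="export"):
    """Load every JSON file in export_dir, keyed by file name.

    Hypothetical helper: it makes no assumption about the file name
    or the structure of the data the application writes.
    """
    results = {}
    for path in sorted(Path(export_dir).glob("*.json")):
        with path.open(encoding="utf-8") as handle:
            results[path.name] = json.load(handle)
    return results


if __name__ == "__main__":
    for name, data in load_exported_proxies().items():
        print(f"{name}: loaded {type(data).__name__}")
```

On Windows, `export_dir` would point at the folder used in the volume mapping, for example the `export` folder under the user profile.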
