This repository has been archived by the owner on Jun 5, 2023. It is now read-only.

jm-janzen/py_crawler

Simple CLI Python Webcrawler example

Package Requirements

Python 3.5.2, geckodriver v0.11.1 (linux64 build, drives the Firefox browser), xorg-server 1.18.4 (provides the Xvfb virtual display)

Python Module Requirements

selenium 3.0.2, xvfbwrapper 0.2.8


Install Python modules (add --user for a per-user install)

pip install [--user] selenium
pip install [--user] xvfbwrapper
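
To confirm that both modules are importable after installing, a small check along these lines may help. It is only a sketch: the helper name `missing_modules` is hypothetical and is not part of this repository.

```python
import importlib.util

# Modules listed under "Python Module Requirements" above.
REQUIRED_MODULES = ("selenium", "xvfbwrapper")


def missing_modules(names=REQUIRED_MODULES):
    """Return the names from `names` that cannot be found for import."""
    return [name for name in names if importlib.util.find_spec(name) is None]
```

Calling `missing_modules()` returns an empty list when both modules are installed for the current interpreter.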

Install the Firefox WebDriver (geckodriver)

# download
wget https://github.com/mozilla/geckodriver/releases/download/v0.11.1/geckodriver-v0.11.1-linux64.tar.gz

# unpack tarball
tar xf geckodriver-v0.11.1-linux64.tar.gz

# add the directory containing the unpacked binary to PATH (NB: to make this persistent, add it to a startup script!)
export PATH="$PATH:/path/to/geckodriver/dir"
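
Selenium looks for the `geckodriver` executable on PATH, so it can be useful to verify the export took effect before running the crawler. The following is a minimal sketch using only the standard library; the function name `find_geckodriver` is an assumption made for illustration.

```python
import shutil


def find_geckodriver(path=None):
    """Return the full path to the geckodriver executable, or None.

    `path` is an optional search path string; when omitted, the
    PATH environment variable is searched.
    """
    return shutil.which("geckodriver", path=path)
```

If `find_geckodriver()` returns `None`, the directory added to PATH probably does not contain the unpacked binary.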

Running

python crawler.py <URL> <ATTR>  # <ATTR> is an HTML attribute to search for
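
The contents of crawler.py are not reproduced here, so the following is only a sketch of how a script with this command line could be put together from the listed requirements, not the repository's actual code. It assumes the crawler starts an Xvfb display, loads the URL in Firefox, and collects the values of the given attribute from every element that has it. All function names are hypothetical.

```python
import argparse


def parse_args(argv):
    """Parse the two positional arguments: <URL> and <ATTR>."""
    parser = argparse.ArgumentParser(description="Simple CLI web crawler")
    parser.add_argument("url", help="page to load")
    parser.add_argument("attr", help="HTML attribute to search for")
    return parser.parse_args(argv)


def attr_selector(attr):
    """Build a CSS selector matching any element that has `attr` set."""
    return "[{}]".format(attr)


def crawl(url, attr):
    """Load `url` in Firefox under Xvfb and return all values of `attr`."""
    # Imported here so the helpers above work without the third-party modules.
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from xvfbwrapper import Xvfb

    display = Xvfb()
    display.start()  # virtual display, so no real X session is needed
    try:
        driver = webdriver.Firefox()  # requires geckodriver on PATH
        try:
            driver.get(url)
            elements = driver.find_elements(By.CSS_SELECTOR, attr_selector(attr))
            return [element.get_attribute(attr) for element in elements]
        finally:
            driver.quit()
    finally:
        display.stop()


def main(argv):
    """Entry point: print each attribute value found on the page."""
    args = parse_args(argv)
    for value in crawl(args.url, args.attr):
        print(value)
```

Under these assumptions, calling `main(sys.argv[1:])` after `import sys` would mirror the command shown above, for example with `href` as the attribute to list link targets.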

