peterdalle/screenshot


Screenshot

Create a screenshot of a full web page, not just the part that is visible above the fold (the browser viewport). The program automatically opens the page and scrolls through it to force dynamically loaded images to load, then saves a screenshot to a PNG file.
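The scroll-then-capture idea can be sketched roughly as follows. This is a minimal illustration, not the code in screenshot.py: the function names, the `pause` parameter, and the choice of Firefox's `get_full_page_screenshot_as_file` (available in Selenium 4 for Firefox only) are assumptions made for the example.

```python
from typing import List


def scroll_offsets(page_height: int, viewport_height: int) -> List[int]:
    """Return the vertical offsets to visit so each part of the page is shown once."""
    if viewport_height <= 0:
        raise ValueError("viewport_height must be positive")
    return list(range(0, max(page_height, 1), viewport_height))


def capture_full_page(url: str, out_path: str, headless: bool = True,
                      pause: float = 0.5) -> None:
    """Hypothetical helper: scroll through the page, then save a full-page PNG."""
    # Imported lazily so scroll_offsets() works without Selenium installed.
    import time
    from selenium import webdriver

    options = webdriver.FirefoxOptions()
    if headless:
        options.add_argument("-headless")  # no visible browser window
    driver = webdriver.Firefox(options=options)
    try:
        driver.get(url)
        page_height = driver.execute_script("return document.body.scrollHeight")
        viewport_height = driver.execute_script("return window.innerHeight")
        # Visit every viewport-sized slice so lazy-loaded images are requested.
        for y in scroll_offsets(page_height, viewport_height):
            driver.execute_script("window.scrollTo(0, arguments[0]);", y)
            time.sleep(pause)
        driver.get_full_page_screenshot_as_file(out_path)  # Firefox-specific
    finally:
        driver.quit()
```

Pages that keep growing while you scroll (infinite feeds) would need the height to be re-read inside the loop; the sketch reads it once for simplicity.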

PNG files can become quite large: around 30 MB for the front page of a news site.

Install

  1. Install the dependencies: Selenium (which does the actual work) and python-slugify (which converts URLs into file names, e.g. www.google.com into www-google-com):

$ pip install selenium
$ pip install python-slugify

  2. Download a web driver. I recommend Firefox over Chrome for compatibility reasons.

  3. Make sure Python can find the web driver by adding its location to your PATH environment variable. This is described in the Selenium installation guide.

  4. Download Screenshot:

$ git clone git@github.com:peterdalle/screenshot.git
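To confirm that the web driver is actually reachable through PATH, a quick check along these lines may help. It is only a sketch: `locate_drivers` is a hypothetical helper, and it assumes the usual executable names geckodriver (Firefox) and chromedriver (Chrome).

```python
import shutil
from typing import Dict, Optional, Sequence


def locate_drivers(names: Sequence[str] = ("geckodriver", "chromedriver")) -> Dict[str, Optional[str]]:
    """Map each driver name to its full path on PATH, or None if it is not found."""
    return {name: shutil.which(name) for name in names}


for name, path in locate_drivers().items():
    print(f"{name}: {path or 'not found on PATH'}")
```

If the driver you installed shows up as not found, the PATH change has not taken effect in the current shell yet.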

Usage

Provide a URL or domain name as argument:

$ python screenshot.py google.com

The screenshot is then saved in your current directory under a name like 2018-01-12_18-02_http-google-com.png, which starts with the current date and time (yyyy-mm-dd_hh-mm).
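The naming scheme can be approximated like this. The program itself relies on python-slugify; the sketch below swaps in a plain regular expression so that it runs with the standard library only, and `build_filename` is a hypothetical name. The assumption that a bare domain gets an `http://` prefix is inferred from the example file name above.

```python
import re
from datetime import datetime
from typing import Optional


def build_filename(target: str, when: Optional[datetime] = None,
                   extension: str = "png") -> str:
    """Build '<yyyy-mm-dd_hh-mm>_<slug>.<extension>' for a URL or domain name."""
    when = when or datetime.now()
    # Assumption: bare domains such as 'google.com' are treated as http:// URLs.
    url = target if "://" in target else "http://" + target
    # Stand-in for python-slugify: collapse every non-alphanumeric run to '-'.
    slug = re.sub(r"[^a-z0-9]+", "-", url.lower()).strip("-")
    return f"{when.strftime('%Y-%m-%d_%H-%M')}_{slug}.{extension}"


print(build_filename("google.com", datetime(2018, 1, 12, 18, 2)))
# prints 2018-01-12_18-02_http-google-com.png
```

python-slugify handles more cases (for example transliterating non-ASCII characters), so names may differ from this approximation for such URLs.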

Provide multiple URLs or domain names as arguments:

$ python screenshot.py google.com bbc.com svt.se "https://example.net/search?q=test&p=3"

Note that the & character in URLs has a special meaning in the terminal/command prompt, so don't forget to enclose those URLs in " quotes.
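If you generate the command line from another script (for a cron job, say), the quoting can be left to Python. A small sketch, assuming a POSIX shell; `command_line` is a hypothetical helper, and `shlex.quote` uses single quotes rather than the double quotes shown above, which a POSIX shell treats the same way for this purpose:

```python
import shlex
from typing import Iterable


def command_line(targets: Iterable[str], script: str = "screenshot.py") -> str:
    """Return a shell-safe command line that passes every target as one argument."""
    parts = ["python", script, *targets]
    return " ".join(shlex.quote(part) for part in parts)


print(command_line(["google.com", "https://example.net/search?q=test&p=3"]))
# prints python screenshot.py google.com 'https://example.net/search?q=test&p=3'
```

The Windows command prompt has different quoting rules, so this sketch does not apply there.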

You can also provide a file name (urls.txt) with one URL or domain name per line:

$ python screenshot.py urls.txt
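One way to treat both kinds of argument uniformly is to expand any argument that names an existing file into its lines. This is a sketch under that assumption, not necessarily how screenshot.py decides; `expand_targets` is a hypothetical name.

```python
import os
from typing import Iterable, List


def expand_targets(args: Iterable[str]) -> List[str]:
    """Replace each argument that is an existing file with the URLs it lists."""
    targets: List[str] = []
    for arg in args:
        if os.path.isfile(arg):
            with open(arg, encoding="utf-8") as handle:
                # One URL or domain name per line; blank lines are skipped.
                targets.extend(line.strip() for line in handle if line.strip())
        else:
            targets.append(arg)
    return targets
```

With this approach, `expand_targets(["urls.txt", "bbc.com"])` would yield the contents of urls.txt followed by bbc.com, so files and plain URLs can be mixed on one command line.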

Settings

Change the behavior of the program in the settings class. Each setting is documented there.

The most important setting is probably headless = True, which runs the browser in the background without opening a visible browser window.

Known issues

Memory hog

Selenium seems to have a problem closing the web driver, which leaves many web driver processes running and using up memory. You may need to kill the running processes now and then, especially if you take screenshots from a cron job.
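A common mitigation is to make sure `quit()` is called even when a page fails to load, since `quit()` ends the whole driver session while `close()` only closes the current window. The sketch below shows that pattern with a hypothetical `managed_driver` helper; whether it resolves the leftover processes described above is an assumption, not something verified against this program.

```python
from contextlib import contextmanager
from typing import Callable, Iterator


@contextmanager
def managed_driver(factory: Callable[[], object]) -> Iterator[object]:
    """Create a driver via factory() and always call quit() on it afterwards."""
    driver = factory()
    try:
        yield driver
    finally:
        # Runs on success and on exceptions alike.
        driver.quit()
```

Used with Selenium, this would look like `with managed_driver(webdriver.Firefox) as driver: driver.get(url)`.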

Alternative approach

Another approach is to use the following bash command, which runs CutyCapt inside a virtual X server environment (Xvfb):

xvfb-run --auto-servernum --server-num=1 --server-args="-screen 0 1024x8048x16" cutycapt --url="http://example.net/" --out="example.net.jpg"

The file bash_screenshot.py is just a wrapper around this command: it takes a URL as its input parameter and writes a file named after the time stamp and the URL.

Use it as follows:

$ python bash_screenshot.py http://example.net/

This will produce a file like 2018-01-01-18-40_http-www-example-net.jpg. Make sure to use .jpg as the file extension, since .png will create much larger files (JPEG uses lossy compression).
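A wrapper of this kind can be sketched as below. It is not the contents of bash_screenshot.py: `build_command` and `run_capture` are hypothetical names, and the flags are copied unchanged from the command shown above. Passing the arguments as a list avoids shell quoting problems with characters such as & in the URL.

```python
import subprocess
from typing import List


def build_command(url: str, out_path: str) -> List[str]:
    """Return the xvfb-run/cutycapt argument list for one screenshot."""
    return [
        "xvfb-run",
        "--auto-servernum",
        "--server-num=1",
        "--server-args=-screen 0 1024x8048x16",
        "cutycapt",
        f"--url={url}",
        f"--out={out_path}",
    ]


def run_capture(url: str, out_path: str) -> None:
    """Run the command; raises CalledProcessError if it exits with an error."""
    subprocess.run(build_command(url, out_path), check=True)
```

`run_capture` requires xvfb-run and cutycapt to be installed; `build_command` on its own only assembles the arguments.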
