curl-auth-csrf

Python tool that mimics cURL, but performs a login and handles any Cross-Site Request Forgery (CSRF) tokens.

Useful for scraping HTML normally only accessible when logged in.

Features

  • Runs on any OS supported by Python
  • Runs in Python2 and Python3
  • Allows specifying arbitrary GET/POST data to be included with the login form submission (e.g. the username)
  • Reads password from stdin (to avoid the plain-text password showing up in shell history)
  • Parses login form and dynamically replicates all form inputs (including hidden ones such as csrfmiddlewaretoken)
  • Automatically populates HTTP Referer header consistent with expected login sequence
  • Supports pages with multiple login forms by allowing the HTML id of the target form to be specified
  • Supports (rare) login forms with multiple password fields by allowing the HTML name of the password field to be specified
  • Handles HTTPS and HTTP 302 redirects
  • Allows validating login success by testing resultant URL and/or content on resultant page
  • Uses Python Requests HTTP library for session (cookie) management during every script run
  • Allows an arbitrary number of pages to be fetched while logged in
  • Optionally performs logout (to avoid leaving a session open from the server's perspective)
  • Allows User-Agent string spoofing (chooses a "safe" default if not otherwise specified)
  • Defaults to output via stdout, but can alternatively output to file
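Several of the features above (parsing the login form, replicating hidden inputs such as csrfmiddlewaretoken) boil down to harvesting every input element in the form before submitting it. Here is a minimal, hypothetical sketch of that harvesting step using only the Python standard library; it is an illustration of the technique, not the script's actual internals, and the real script uses the Requests library for the HTTP side:

```python
# Collect the name/value pairs of all <input> elements inside a login form,
# including hidden CSRF tokens, so they can be echoed back on submission.
from html.parser import HTMLParser


class FormInputCollector(HTMLParser):
    """Gathers input fields (and the action URL) of one form on a page."""

    def __init__(self, form_id=None):
        super().__init__()
        self.form_id = form_id   # None means "use the first form found"
        self.in_form = False
        self.action = None
        self.inputs = {}

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "form":
            # Enter the form if no id was requested, or if the id matches.
            if self.form_id is None or attrs.get("id") == self.form_id:
                self.in_form = True
                self.action = attrs.get("action")
        elif tag == "input" and self.in_form:
            name = attrs.get("name")
            if name:
                # Inputs without a value attribute (e.g. empty text fields)
                # default to the empty string.
                self.inputs[name] = attrs.get("value") or ""

    def handle_endtag(self, tag):
        if tag == "form":
            self.in_form = False


def extract_form_inputs(html, form_id=None):
    """Return (action_url, {field_name: value}) for the chosen form."""
    collector = FormInputCollector(form_id)
    collector.feed(html)
    return collector.action, dict(collector.inputs)
```

With Requests, one would then merge the user-supplied fields (username, password) into the harvested dictionary and POST it to the form's action URL via a requests.Session, which keeps cookies across the login sequence. The names used here are illustrative only.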

Usage

usage: curl-auth-csrf.py [-h] [-a USER_AGENT_STR] -i LOGIN_URL [-f FORM_ID]
                         [-p PASSWORD_FIELD_NAME] [-d DATA] [-u SUCCESS_URL]
                         [-t SUCCESS_TEXT] [-j LOGOUT_URL] [-o FILE]
                         [--version]
                         url_after_login [url_after_login ...]

Python tool that mimics curl, but performs a login and handles any Cross-Site
Request Forgery (CSRF) tokens.  Useful for scraping HTML normally only
accessible when logged in.

positional arguments:
  url_after_login

optional arguments:
  -a USER_AGENT_STR, --user-agent-str USER_AGENT_STR
                        User-Agent string to use
  -i LOGIN_URL, --login-url LOGIN_URL
                        URL that contains the login form
  -f FORM_ID, --form-id FORM_ID
                        HTML id attribute of login form
  -p PASSWORD_FIELD_NAME, --password-field-name PASSWORD_FIELD_NAME
                        name of input field containing password
  -d DATA, --data DATA  adds the specified data to the form submission
                        (usually just the username)
  -u SUCCESS_URL, --success-url SUCCESS_URL
                        URL substring constituting successful login
  -t SUCCESS_TEXT, --success-text SUCCESS_TEXT
                        HTML snippet constituting successful login
  -j LOGOUT_URL, --logout-url LOGOUT_URL
                        URL to be visited to perform the logout
  -o FILE, --output FILE
                        write output to <file> instead of stdout
  --version             show program's version number and exit
  -h, --help            show this help message and exit

If actual password is not passed in via stdin, the user will be prompted.
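The -u/--success-url and -t/--success-text options describe substring checks against the post-login URL and page body. A hedged sketch of what such a validation amounts to (the function name is illustrative, not taken from the script):

```python
def login_succeeded(final_url, page_html, success_url=None, success_text=None):
    """Substring checks mirroring -u/--success-url and -t/--success-text.

    A check is skipped when its pattern is not supplied; when both are
    given, both must pass.
    """
    if success_url is not None and success_url not in final_url:
        return False
    if success_text is not None and success_text not in page_html:
        return False
    return True
```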

Password Entry

The script expects the password to be passed in via stdin, to avoid the plain-text password showing up in shell history. A simple way to do this is as follows:

echo ThisIsMyPassword | ./curl-auth-csrf.py -i http://foobar.com/login -d username=bob http://foobar.com/secure_page

(Trailing newlines in the password are ignored.)

However, this defeats the purpose, as the password still shows up in the shell history. (Exception: In Bash, start the line with an initial space, which will prevent the line from showing up in the history. Refer to Bash documentation on HISTCONTROL and HISTIGNORE.)

A better way to handle this is with a CLI password management tool, such as pass. This is the recommended approach. For example, assuming that your password is managed by pass and already encrypted under the handle foobar.com:

pass foobar.com | ./curl-auth-csrf.py -i http://foobar.com/login -d username=bob http://foobar.com/secure_page

If nothing is passed in via stdin, then the user will be prompted for the password (interactively):

./curl-auth-csrf.py -i http://foobar.com/login -d username=bob http://foobar.com/secure_page
Password: 
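The behavior above can be summarized as: read the password from stdin when input is piped, and fall back to an interactive prompt when stdin is a terminal. A hypothetical sketch of that logic (the script's actual internals may differ):

```python
import getpass
import sys


def read_password(stream=None):
    """Return the password from a piped stream, or prompt interactively.

    Trailing newlines are stripped, matching the behavior noted above.
    """
    stream = sys.stdin if stream is None else stream
    if stream.isatty():
        # Interactive terminal: prompt without echoing the password.
        return getpass.getpass("Password: ")
    return stream.readline().rstrip("\r\n")
```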

Examples

If your username for pbs.org is bob@email.com, here is how you might normally scrape the zip code from your user profile:

curl -sL https://account.pbs.org/accounts/profile/ | grep Zip

However, since doing so requires being logged in, here's one way to do it using curl-auth-csrf:

pass pbs.org | ./curl-auth-csrf.py -i https://account.pbs.org/accounts/login/ -d email=bob@email.com -u https://account.pbs.org/accounts/profile/ -j https://account.pbs.org/accounts/logout/ https://account.pbs.org/accounts/profile/ | grep Zip

Notes:

  • The URL of the login page is https://account.pbs.org/accounts/login/
  • The HTML input field of the username is email
  • The URL we're taken to upon successful login is https://account.pbs.org/accounts/profile/
  • The URL of the logout page is https://account.pbs.org/accounts/logout/
  • The URL we want to scrape the zip code from is https://account.pbs.org/accounts/profile/
  • The information scraped is the only data written to stdout, so we can grep over it to pull what we're looking for

Another example, with a logout page and multiple pages fetched while logged in:

pass thefastpark.com | ./curl-auth-csrf.py -i https://www.thefastpark.com/ -d username=bob@email.com -u https://www.thefastpark.com/myrewards/history/ -j https://www.thefastpark.com/myrewards/logout/ https://www.thefastpark.com/myrewards/history/ https://www.thefastpark.com/myrewards/redeempoints/ | egrep -i '(Total Points|points available)'

Limitations

This script only handles standard logins involving a single form submission with a username, password, and hidden fields for CSRF. It will not handle the following scenarios:

  • Logins involving CAPTCHA
  • Logins involving re-authentications (i.e. multiple successive password prompts)
  • Logins involving two-factor authentication
  • Logins involving any client-side password transformations (e.g. hashing the password before sending it to the server)

If all you need is basic HTTP authentication, this script is overkill: cURL and Wget can do that out of the box.

Gotchas

  • If you're redirecting (or piping) output to a file (or another utility) and receiving a UnicodeEncodeError exception, try setting PYTHONIOENCODING=UTF-8 in your terminal.

Disclaimer

Please don't abuse this tool. Only use it with accounts that rightfully belong to you. If you use this tool with someone else's login, you are solely responsible and may face legal consequences.

This script isn't perfect. See the Limitations section above; there may also be defects. Beware that some Internet services won't take kindly to logins performed outside a normal browser. By using this tool, you accept full responsibility for anything that might happen.

Debugging

If you're having trouble finding the right parameters, you can change the default debugging level from "WARNING" to "DEBUG" at the top of the Python script. See discussion at #2.
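The switch being described is the standard Python logging idiom; as a generic illustration (not the script's exact code):

```python
import logging

# Change WARNING to DEBUG to surface per-request detail while troubleshooting.
logging.basicConfig(level=logging.WARNING)

log = logging.getLogger("curl-auth-csrf")
log.debug("suppressed while the level is WARNING")
log.warning("shown at WARNING level and below")
```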
