europarl_scraper

A Python Scrapy project for scraping data from the European Parliament's website.

Installation

pip install -r requirements.txt

Just give me the data!

It's on S3 in a public bucket!

To run:

First, grab the start URLs:

python get_urls.py

Then, run any of the scrapers:

scrapy crawl europarl_speeches -o data/speeches.csv
scrapy crawl europarl_debates -o data/debates.csv
scrapy crawl europarl_speakers -o data/speakers.csv
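If you would rather run every step in one go, here is a minimal sketch of a wrapper script. It is not part of this repo; the function names `build_commands` and `run_all` are made up for illustration, and it assumes you run it from the project root with `scrapy` on your PATH. The spider names and output paths are the ones from the commands in this README.

```python
import subprocess
import sys

# Spider names and output paths copied from the commands in this README.
SPIDERS = {
    "europarl_speeches": "data/speeches.csv",
    "europarl_debates": "data/debates.csv",
    "europarl_speakers": "data/speakers.csv",
}


def build_commands():
    """Return the URL-collection command followed by one crawl command per spider."""
    commands = [[sys.executable, "get_urls.py"]]
    for spider, output in SPIDERS.items():
        commands.append(["scrapy", "crawl", spider, "-o", output])
    return commands


def run_all(dry_run=True):
    """Print each step in order; when dry_run is False, also execute it.

    check=True makes the loop stop at the first command that fails.
    """
    for command in build_commands():
        print(" ".join(command))
        if not dry_run:
            subprocess.run(command, check=True)


# Dry run by default: only prints the commands, nothing is executed.
run_all(dry_run=True)
```

Switch the last line to `run_all(dry_run=False)` to actually run the steps. Note that Scrapy's `-o` option appends to an existing output file, so remove old CSVs first if you want a fresh export.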

Notes

There are still many TODOs for this project, so please be patient.

Figure out why not all members are in the initial JSON
Where are the sources for this list? https://github.com/eliflab/European-Parliament-Open-Data/blob/master/meps_full_list_with_twitter_accounts.csv
How can the work be divided to speed up scraping?
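One possible direction for the last TODO, not something this project implements: split the start URLs into shards and run one crawl per shard. The sketch below only shows the splitting step. The helper name `split_into_shards` and the example URLs are hypothetical, and how each shard would be handed to a spider is left open, since that depends on how the spiders read the output of get_urls.py.

```python
def split_into_shards(urls, n_shards):
    """Distribute urls round-robin into n_shards lists of near-equal size."""
    if n_shards < 1:
        raise ValueError("n_shards must be at least 1")
    shards = [[] for _ in range(n_shards)]
    for index, url in enumerate(urls):
        shards[index % n_shards].append(url)
    return shards


# Placeholder URLs for illustration only; real ones would come from get_urls.py.
example_urls = [f"https://example.org/page/{i}" for i in range(5)]
for shard_number, shard in enumerate(split_into_shards(example_urls, 2)):
    print(shard_number, len(shard))
```

Round-robin keeps the shards within one URL of each other in size, which is a reasonable default when there is no way to know in advance which pages are slow to scrape.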

Questions?

Feel free to reach out on Twitter or Freenode (@kjam).
