
rubenros1795/hansard


Enriching Parliamentary Metadata with Wikidata/Wikipedia

Intro

British parliamentary data aggregated in the PoliticalMashup project lacks much of its metadata for the period before 1935. This repo aims to add the missing metadata. It focuses on appending information to the <membership> tags in the PoliticalMashup speaker metadata. Every speaker can have multiple memberships (e.g. membership of the House of Commons in a specific period, or a cabinet position). Because Wikidata holds data on MPs for each parliamentary period, it can be used to enrich these memberships.

I focus on party affiliation, but because the Wikidata keys are kept throughout the process, other metadata (birth date, etc.) can be added as well.
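Because every enriched row keeps its Wikidata key, adding a new attribute amounts to a left join on that key. The sketch below assumes a hypothetical `wikidata_id` column and uses made-up placeholder keys (`QID_A`, `QID_B`) and toy values; it is an illustration, not code from the repo's notebooks.

```python
import csv
import io

# Hypothetical enriched membership rows; "wikidata_id" is an assumed column name
# and QID_A / QID_B are placeholders, not real Wikidata item IDs.
MEMBERSHIPS = """wikidata_id,name,party
QID_A,Member A,Liberal
QID_B,Member B,Conservative
"""

# Hypothetical extra attribute table fetched separately from Wikidata (toy data).
BIRTH_DATES = """wikidata_id,birth_date
QID_A,1850-01-01
"""

def add_attribute(rows, extra, key="wikidata_id"):
    """Left-join every non-key column of `extra` onto `rows` via the Wikidata key."""
    lookup = {r[key]: r for r in extra}
    extra_cols = [c for c in (extra[0].keys() if extra else []) if c != key]
    out = []
    for row in rows:
        match = lookup.get(row[key], {})
        merged = dict(row)
        for col in extra_cols:
            merged[col] = match.get(col, "")  # empty string when Wikidata has no value
        out.append(merged)
    return out

memberships = list(csv.DictReader(io.StringIO(MEMBERSHIPS)))
births = list(csv.DictReader(io.StringIO(BIRTH_DATES)))
enriched = add_attribute(memberships, births)
```

Members without a value in the extra table keep an empty field rather than being dropped, so the membership count is unchanged by the join.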

Steps

  1. Scrape election results from Wikipedia (s0_parse_wikipedia_data.ipynb)
  2. Download a .tsv file for every election from the Wikidata Query Service. This is the query format.
  3. Transform the downloaded Wikidata tables (s1_transform_wikidata_tables.py)
  4. Add missing party data to Wikidata tables using the Wikipedia tables gathered in step 1 (s2_add_parties_to_wikidata.ipynb)
  5. Flatten the .xml member data to .csv files (because .xml is horrible) (s3_flatten_member_metadata.ipynb)
  6. Enrich the district/constituency metadata in the Wikidata tables (s5_enrich_memberships.ipynb). Districts are matched with fuzzy string matching, and I correct the matches manually afterwards. This step is necessary because I later use districts/constituencies to match speakers to parties.
  7. The final (not yet complete) step matches all the enriched Wikidata records to the PoliticalMashup membership tags (s6_add_party_to_memberships.ipynb), using exact name matches, matching districts and parliamentary dates, and, if all else fails, fuzzy string matching. As far as I can tell, the whole pipeline recovers 56% of the missing party data!
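Step 3 turns the Wikidata Query Service downloads into plain tables. The sketch below is a simplified guess at that transformation, not the contents of s1_transform_wikidata_tables.py: it assumes the export may wrap IRIs in `<...>` and literals in quotes with an `@lang` or `^^<datatype>` suffix (as in the SPARQL TSV results format), strips those, and reduces entity URIs to the bare Wikidata key. The column names and placeholder keys are hypothetical, and escape sequences inside literals are ignored.

```python
ENTITY_PREFIX = "http://www.wikidata.org/entity/"

def clean_value(value):
    """Reduce one TSV cell to a plain string (simplified: ignores escape sequences)."""
    value = value.strip()
    if value.startswith("<") and value.endswith(">"):
        value = value[1:-1]                      # IRIs may be wrapped in <...>
    elif value.startswith('"'):
        end = value.rfind('"')                   # drop an @lang or ^^<datatype> suffix
        value = value[1:end] if end > 0 else value[1:]
    if value.startswith(ENTITY_PREFIX):
        value = value[len(ENTITY_PREFIX):]       # keep only the Wikidata key
    return value

def read_wdqs_tsv(text):
    """Parse a TSV export into a list of dicts keyed by (un-prefixed) variable name."""
    lines = [line for line in text.splitlines() if line.strip()]
    header = [h.strip().lstrip("?") for h in lines[0].split("\t")]
    return [dict(zip(header, (clean_value(v) for v in line.split("\t")))) for line in lines[1:]]

# Hypothetical export for one election; QID_A / QID_B are placeholders.
SAMPLE = (
    "?member\t?memberLabel\t?partyLabel\n"
    "<http://www.wikidata.org/entity/QID_A>\t\"Member A\"@en\t\"Liberal Party\"@en\n"
    "<http://www.wikidata.org/entity/QID_B>\t\"Member B\"@en\t\n"
)
rows = read_wdqs_tsv(SAMPLE)
```

The second sample row has no party, which is exactly the kind of gap step 4 tries to fill from the Wikipedia tables.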
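Step 4 fills empty party fields in the Wikidata tables from the scraped Wikipedia results. One way to do this, sketched here under the assumption that both tables share hypothetical `election`, `constituency`, `name` and `party` columns, is a lookup on a normalised (election, constituency, name) key; the notebook may well use a different key.

```python
def normalise(text):
    """Lower-case and collapse whitespace so minor formatting differences don't block a match."""
    return " ".join(text.lower().split())

def row_key(row):
    # Hypothetical join key: election, constituency and member name.
    return (row["election"], normalise(row["constituency"]), normalise(row["name"]))

def fill_missing_parties(wikidata_rows, wikipedia_rows):
    """Fill empty 'party' values in wikidata_rows in place; return how many were filled."""
    lookup = {row_key(r): r["party"] for r in wikipedia_rows if r.get("party")}
    filled = 0
    for row in wikidata_rows:
        if row.get("party"):
            continue                      # keep parties Wikidata already has
        party = lookup.get(row_key(row))
        if party:
            row["party"] = party
            filled += 1
    return filled

# Toy rows with hypothetical members and districts.
wikidata_rows = [
    {"election": "1906", "constituency": "District X", "name": "Member A", "party": "Liberal"},
    {"election": "1906", "constituency": "District Y", "name": "Member B", "party": ""},
]
wikipedia_rows = [
    {"election": "1906", "constituency": "district  y", "name": "member b", "party": "Conservative"},
]
n_filled = fill_missing_parties(wikidata_rows, wikipedia_rows)
```

Returning the number of filled rows makes it easy to track how much of the gap each source closes.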
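Step 5 flattens the nested member XML so that each membership becomes one CSV row. The element and attribute names below (`member`, `name`, `membership`, `body`, `start`, `end`, `district`) are hypothetical stand-ins, not the actual PoliticalMashup schema; the point is the one-row-per-membership shape, since one speaker can hold several memberships.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical, simplified member XML (not the real PoliticalMashup schema).
SAMPLE_XML = """<members>
  <member id="m1">
    <name>Member A</name>
    <membership body="commons" start="1906" end="1910" district="District X"/>
    <membership body="cabinet" start="1908" end="1910"/>
  </member>
  <member id="m2">
    <name>Member B</name>
    <membership body="commons" start="1910" end="1918" district="District Y"/>
  </member>
</members>"""

def flatten_members(xml_text):
    """Return one dict per <membership>, carrying the parent member's id and name."""
    root = ET.fromstring(xml_text)
    rows = []
    for member in root.iter("member"):
        base = {"member_id": member.get("id"), "name": member.findtext("name", default="")}
        for membership in member.iter("membership"):
            rows.append({**base, **membership.attrib})
    return rows

def rows_to_csv(rows):
    """Write rows to CSV text, using the union of all keys as columns."""
    fieldnames = []
    for row in rows:
        for col in row:
            if col not in fieldnames:
                fieldnames.append(col)
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=fieldnames, restval="")  # missing attributes -> ""
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()

rows = flatten_members(SAMPLE_XML)
csv_text = rows_to_csv(rows)
```

Taking the union of attribute names as columns means memberships without a district (such as cabinet posts) still fit in the same table.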
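Step 6 matches district names fuzzily and leaves the rest for manual correction. The repo does not say which fuzzy-matching library it uses; this sketch swaps in the standard library's `difflib.get_close_matches`, with an assumed cutoff of 0.8, and returns `None` for districts that should go to manual review.

```python
import difflib

def match_districts(wikidata_districts, mashup_districts, cutoff=0.8):
    """Map each Wikidata district name to its closest PoliticalMashup name (or None)."""
    canonical = {d.lower(): d for d in mashup_districts}   # compare case-insensitively
    matches = {}
    for district in wikidata_districts:
        best = difflib.get_close_matches(district.lower(), list(canonical), n=1, cutoff=cutoff)
        # None marks a district that needs manual review
        matches[district] = canonical[best[0]] if best else None
    return matches

matches = match_districts(
    ["Manchester North West", "Oxford University"],
    ["Manchester, North-West", "Manchester, North-East"],
)
```

Here "Manchester North West" scores highest against "Manchester, North-West" despite the punctuation, while "Oxford University" falls below the cutoff and is flagged with `None`. Near-identical names such as North-West and North-East score close together, which is one reason a manual check afterwards is worthwhile.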
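Step 7 tries progressively looser rules. The cascade below is a hypothetical sketch of that idea, not the logic in s6_add_party_to_memberships.ipynb: it first restricts candidates to Wikidata rows whose period overlaps the membership (years as integers, an assumption), then tries an exact normalised name, then a unique district match, and finally a `difflib` name match with an assumed cutoff of 0.85. The district rule only accepts a single hit, because some pre-1950 constituencies returned two members.

```python
import difflib

def norm(text):
    """Lower-case, drop full stops and collapse whitespace."""
    return " ".join(text.lower().replace(".", "").split())

def overlaps(a, b):
    """True if the [start, end] year ranges of a and b intersect."""
    return a["start"] <= b["end"] and b["start"] <= a["end"]

def match_party(membership, wikidata_rows, cutoff=0.85):
    """Return (party, rule) for one membership, or (None, None) if nothing matches."""
    in_period = [r for r in wikidata_rows if overlaps(membership, r)]
    # 1. exact (normalised) name within the same parliamentary period
    exact = [r for r in in_period if norm(r["name"]) == norm(membership["name"])]
    if len(exact) == 1:
        return exact[0]["party"], "exact_name"
    # 2. same district and overlapping dates; must be unique (two-member seats)
    by_district = [r for r in in_period if norm(r["district"]) == norm(membership["district"])]
    if len(by_district) == 1:
        return by_district[0]["party"], "district_dates"
    # 3. fuzzy name match as a last resort
    names = {norm(r["name"]): r for r in in_period}
    best = difflib.get_close_matches(norm(membership["name"]), list(names), n=1, cutoff=cutoff)
    if best:
        return names[best[0]]["party"], "fuzzy_name"
    return None, None

# Toy Wikidata rows with hypothetical members and districts.
wikidata_rows = [
    {"name": "John Smith", "district": "District X", "start": 1906, "end": 1910, "party": "Liberal"},
    {"name": "J. A. Brown", "district": "District Y", "start": 1906, "end": 1910, "party": "Conservative"},
    {"name": "Thomas Green", "district": "District Z", "start": 1906, "end": 1910, "party": "Labour"},
]
result = match_party({"name": "Mr Brown", "district": "District Y", "start": 1906, "end": 1910}, wikidata_rows)
```

Recording which rule produced each match makes it possible to review the fuzzy matches separately from the exact ones.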


About

Notebooks for Hansard enrichment
