Skip to content

holderdeord/hdo-transcript-search

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

hdo-transcript-search

Build Status Code Climate

Visualize language usage in the Norwegian parliament. See it in action at tale.holderdeord.no.

image

This project consists of two parts:

  • indexer/ - download and index Stortinget transcripts in ElasticSearch
  • webapp/ - web frontend to present / visualize the data

Running with docker-compose

$ docker-compose up -d es webapp
$ docker-compose run --rm indexer

Requirements

  • elasticsearch
  • node.js
  • ruby

indexer

Download and index transcripts (requires a local elasticsearch):

$ cd indexer/
$ gem install bundler
$ bundle install
$ bundle exec ruby -Ilib bin/hdo-transcript-indexer

Re-create the index. This is necessary when a mapping is changed:

$ bundle exec ruby -Ilib bin/hdo-transcript-indexer --create-index

Convert a single XML transcript to indexable JSON:

$ bundle exec ruby -Ilib bin/hdo-transcript-converter transcript.xml

webapp

Start the webapp in dev mode:

$ cd webapp
$ npm install
$ npm run dev
# open your browser at http://localhost:7575/

Caveats

  • Because of deficiencies in the transcripts, we don't know the correct time for all speeches. The "time" field will in these cases be set to midnight.