stephanos/subvoc

subvoc

I created this project to scratch my own itch. I love watching movies and am always keen to expand my vocabulary, but it is difficult to notice an unknown word during a movie without spoiling the experience. That's where subvoc comes in: search for a movie and discover its vocabulary.

Online Demo

Visit https://subvoc.stephanbehnke.com (hosted on Heroku, so it sometimes takes a few moments to start).

NOTE: The external API can be flaky; in that case, you can visit a cached analysis instead.

To get a quick impression, here are some screenshots:

(Screenshots: Homepage, Find Movie, List of words, Word details)

How it works

When you select a movie, the OpenSubtitles API is queried for its subtitles. The result is then parsed, tokenized, and analyzed sentence by sentence and word by word with the help of the Python Natural Language Toolkit (NLTK). The difficulty of a word is determined by its relative frequency in the English language, on the assumption that more difficult words are used less often.
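
To make the idea concrete, here is a minimal sketch of the frequency-based ranking. It is not the project's actual code: it swaps NLTK for a plain regular-expression tokenizer so it runs with the standard library alone, and the names `REFERENCE_FREQ`, `strip_srt`, `tokenize`, and `rank_by_difficulty`, as well as the frequency numbers, are made up for illustration. It also assumes the subtitles arrive in SubRip (`.srt`) form.

```python
import re

# Hypothetical reference frequencies (occurrences per million words).
# These numbers are invented; a real setup would load them from a corpus.
REFERENCE_FREQ = {
    "the": 50000.0,
    "you": 12000.0,
    "know": 2500.0,
    "ship": 80.0,
    "quixotic": 0.1,
}

SAMPLE = """1
00:00:01,000 --> 00:00:03,000
You know the ship.

2
00:00:04,000 --> 00:00:06,000
The quixotic ship!
"""


def strip_srt(srt_text):
    """Drop cue numbers and timestamp lines, keep the spoken text.

    Simplification: a subtitle line consisting only of digits is
    also dropped, because it looks like a cue number.
    """
    kept = []
    for line in srt_text.splitlines():
        line = line.strip()
        if not line or line.isdigit() or "-->" in line:
            continue
        kept.append(line)
    return " ".join(kept)


def tokenize(text):
    """Lowercase the text and extract words, allowing one inner apostrophe."""
    return re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())


def rank_by_difficulty(srt_text, freq=REFERENCE_FREQ):
    """Return the distinct words, rarest (assumed hardest) first."""
    words = set(tokenize(strip_srt(srt_text)))
    # Words missing from the table get frequency 0.0 and sort first;
    # ties are broken alphabetically to keep the order deterministic.
    return sorted(words, key=lambda w: (freq.get(w, 0.0), w))


if __name__ == "__main__":
    print(rank_by_difficulty(SAMPLE))
```

With the sample above, rare words such as "quixotic" come first and very common words such as "the" come last. Treating unknown words as the hardest is a design choice of this sketch; in practice they may just as well be names or typos.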

Features

  • landing page with search bar
  • search movie by query
  • sort search results by popularity
  • host on Heroku
  • list of words sorted by difficulty
  • use the base of each word
  • lazy load analysis
  • show movie context for each word
  • include movie poster
  • support for idioms
  • support for TV show episodes
  • show context in another language side by side
  • wild idea: display YouTube videos with a certain word
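
As an illustration of the "show movie context for each word" feature, the following sketch maps every word to the sentences it appears in. It is an assumption about how such a lookup could work, not the project's implementation; `split_sentences` and `build_context_index` are hypothetical names, and the sentence splitter is deliberately naive (the project itself relies on NLTK for this kind of work).

```python
import re
from collections import defaultdict


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text)
    return [p.strip() for p in parts if p.strip()]


def build_context_index(text):
    """Map each lowercase word to the sentences that contain it."""
    index = defaultdict(list)
    for sentence in split_sentences(text):
        # Use a set so a sentence is listed once per word, even if the
        # word occurs several times in it.
        for word in set(re.findall(r"[a-z]+(?:'[a-z]+)?", sentence.lower())):
            index[word].append(sentence)
    return dict(index)


if __name__ == "__main__":
    index = build_context_index("We board the ship. The ship is old! Run.")
    for sentence in index["ship"]:
        print(sentence)
```

Sentences are stored in the order they occur, so the context for a word can be shown chronologically.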

Development

(requires Docker)

  • run server with scripts/dev-py.sh
  • build client with scripts/dev-js.sh
  • run tests with scripts/test-py.sh and scripts/test-js.sh

License

MIT (see LICENSE).