scanner-research/tv-news-viewer

Stanford TV News Analyzer

Setup instructions

  1. Install Rust (see https://rustup.rs/)
  2. Clone submodules: git submodule init && git submodule update
  3. Run ./install_deps.sh to install the submodules
  4. cd vgrid-widget and run ./install.sh.
  • This requires npm and other JavaScript dependencies (npm install --save react react-dom mobx mobx-react). Install them as needed.
  • Once this succeeds, cd .. to return to the top level.
  5. Install python dependencies: pip3 install -r requirements.txt
  6. Copy/symlink the indexed captions as data/index
  7. Copy/symlink the data directory as data
  8. Run ./derive_data.py to generate derived data
  9. Run ./develop.py to start a development server or edit config.json to serve using wsgi.

Running tests

Run pytest -vs tests from the top directory.

Indexed captions directory

There should be 4 entries in this directory:

  • documents.txt (a list of documents that are indexed)
  • lexicon.txt (a list of all the words)
  • index.bin (a directory or inverted index file)
  • data (a directory of all the binary encoded captions)
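
Before starting the server, it can help to confirm that the index directory has the four entries listed above. The following is a minimal sketch, not part of the repository: the function name `missing_index_entries` is hypothetical, and it only checks that each entry exists, not that its contents are valid.

```python
from pathlib import Path

# Entry names taken from the list above.
EXPECTED_INDEX_ENTRIES = ("documents.txt", "lexicon.txt", "index.bin", "data")


def missing_index_entries(index_dir):
    """Return the expected entries that are absent from index_dir.

    Hypothetical helper: it only tests for existence, so index.bin may be
    either a file or a directory, as the README allows.
    """
    root = Path(index_dir)
    return [name for name in EXPECTED_INDEX_ENTRIES if not (root / name).exists()]
```

Calling `missing_index_entries("data/index")` from the top directory would return an empty list when all four entries are in place.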

Data directory

The data directory consists of the following files and directories:

  • videos.json (metadata about the videos)
  • faces.ilist.bin (intervals when faces are on screen)
  • people (directory containing intervals when identified people are on screen)
  • people.metadata.json (optional; JSON dictionary of names to metadata tags)
  • hosts.csv (optional; a list of people and channels that they are hosts of)
  • face-bboxes (directory containing face bounding boxes)
  • derived (this directory is generated by ./derive_data.py)
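
The layout above separates entries into those that must be supplied, those marked optional, and the generated `derived` directory. The sketch below reports each group separately. It is illustrative only: `check_data_dir` is a hypothetical name, and treating every entry not marked optional as required is an assumption drawn from the list, not from the code.

```python
from pathlib import Path

# Grouping inferred from the list above (an assumption, not the project's code).
REQUIRED = ("videos.json", "faces.ilist.bin", "people", "face-bboxes")
OPTIONAL = ("people.metadata.json", "hosts.csv")
GENERATED = ("derived",)


def check_data_dir(data_dir):
    """Summarize which documented entries are absent from data_dir."""
    root = Path(data_dir)

    def absent(names):
        return [name for name in names if not (root / name).exists()]

    return {
        "missing_required": absent(REQUIRED),
        "missing_optional": absent(OPTIONAL),
        # True when ./derive_data.py has apparently not been run yet.
        "needs_derive": bool(absent(GENERATED)),
    }
```

A result with `needs_derive` set to `True` and an empty `missing_required` list suggests that the inputs are in place and only ./derive_data.py remains to be run.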

Terminology in the code

  • IntervalList (or ilist) - These are files that store intervals with a binary bit-vector payload. The intervals can overlap, but must be sorted by start time.
  • IntervalSet (or iset) - These are files that store non-overlapping intervals, sorted by start time. Unlike IntervalList, there is no bit-vector payload.
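
The difference between the two structures comes down to their invariants, which can be sketched in memory as follows. This is an illustrative model only: it does not read the binary file formats, the class and function names are hypothetical, and treating intervals as half-open (so that touching intervals do not count as overlapping) is an assumption.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class PayloadInterval:
    """One IntervalList entry: an interval plus a bit-vector payload."""
    start: int
    end: int
    payload: int  # bit-vector packed into an integer (assumed encoding)


def is_valid_interval_list(intervals: List[PayloadInterval]) -> bool:
    """IntervalList invariant: sorted by start time; overlaps are allowed."""
    return all(a.start <= b.start for a, b in zip(intervals, intervals[1:]))


def is_valid_interval_set(intervals: List[Tuple[int, int]]) -> bool:
    """IntervalSet invariant: sorted by start time and non-overlapping."""
    well_formed = all(start <= end for start, end in intervals)
    disjoint = all(prev[1] <= nxt[0] for prev, nxt in zip(intervals, intervals[1:]))
    return well_formed and disjoint
```

For example, two entries spanning 0 to 10 and 5 to 8 form a valid IntervalList because only the start order matters, while the same two spans would be rejected as an IntervalSet because they overlap.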

About

Interactive exploration of a decade of TV news
