thtroyer/imdb-downloader

imdb-downloader

IMDb provides a data dump for noncommercial use here: https://www.imdb.com/interfaces/

Unfortunately, the data is provided as TSV files, which are quite large and awkward to query or join across.

This script downloads the gzipped files, extracts them, and imports the data into a SQLite database so that queries can be written against it.
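The three steps can be sketched roughly as follows. This is an illustrative outline, not the contents of run.sh: the function names, the `BASE_URL` value, and the table-naming rule are assumptions made for the example. It uses sqlite3's ascii import mode with a tab separator so that stray double quotes in titles are treated as plain text.

```shell
# Hypothetical sketch of the download/extract/import steps; not the repository's run.sh.

# Assumed location of the dataset files; check the IMDb page above for the current URL.
BASE_URL="https://datasets.imdbws.com"

# Derive a table name from a dataset file name,
# e.g. title.basics.tsv.gz becomes title_basics.
table_name() {
  printf '%s\n' "${1%.tsv.gz}" | tr '.' '_'
}

# Download one dataset, decompress it, and import it into the given database.
# If the table does not exist yet, sqlite3 takes the column names from the
# first row of the file.
import_dataset() {
  local file="$1" db="$2" table
  table="$(table_name "$file")"
  wget -q -O "$file" "$BASE_URL/$file"
  gunzip -f "$file"
  sqlite3 "$db" <<EOF
.mode ascii
.separator "\t" "\n"
.import ${file%.gz} $table
EOF
}
```

With these definitions, `import_dataset title.ratings.tsv.gz imdb.db` would fetch a single file and load it into a `title_ratings` table.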

Requirements

Linux or Linux-like environment with the following packages installed:

  • Bash
  • sqlite3
  • wget
  • gunzip

Usage

git clone git@github.com:thtroyer/imdb-downloader.git
cd imdb-downloader
./run.sh

Open the resulting imdb.db in your favorite sqlite tool and then you can run queries like this:

-- Show all movies/shows, ordered by number of votes, highest first
SELECT *
FROM title_basics AS tb
INNER JOIN title_ratings AS tr ON tb.tconst = tr.tconst
ORDER BY CAST(tr.numVotes AS INTEGER) DESC;
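Queries can also be run from the command line without a GUI tool. The sketch below is a hypothetical helper, not part of the repository. It prints a query for the best-rated movies above a vote threshold, and it assumes the imported tables keep the column names from the IMDb TSV headers (`primaryTitle`, `startYear`, `titleType`, `averageRating`, `numVotes`).

```shell
# Hypothetical helper: print a query for the highest-rated movies.
# $1 = minimum number of votes (default 10000), $2 = row limit (default 20).
# Column names are assumed to match the IMDb TSV headers.
top_rated_sql() {
  min_votes="${1:-10000}"
  limit="${2:-20}"
  cat <<EOF
SELECT tb.primaryTitle, tb.startYear, tr.averageRating, tr.numVotes
FROM title_basics AS tb
INNER JOIN title_ratings AS tr ON tb.tconst = tr.tconst
WHERE tb.titleType = 'movie'
  AND CAST(tr.numVotes AS INTEGER) >= ${min_votes}
ORDER BY CAST(tr.averageRating AS REAL) DESC
LIMIT ${limit};
EOF
}
```

To run it against the database, pipe the output into sqlite3, for example `top_rated_sql 50000 10 | sqlite3 -header -column imdb.db`. The vote threshold keeps titles with only a handful of ratings from dominating the top of the list.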