imdb2meta

A service for getting movie and TV show metadata for an IMDb ID via HTTP or gRPC, using the official IMDb datasets

Content

- Usage
- Protocol buffer generation
- ⚠ Warning

Usage

First, import the IMDb dataset into a database. Then start the web service, which is backed by that database. Finally, query the service via HTTP or gRPC.

1. Import data

Import the data of the IMDb dataset into a database. BadgerDB and bbolt are supported.

Steps:

1. Download the title.basics.tsv.gz dataset from https://datasets.imdbws.com
   - For more info about IMDb datasets see https://www.imdb.com/interfaces/
   - ⚠ Warning: IMDb.com, Inc is the copyright owner of the data in the IMDb datasets. You may only use the data for personal and non-commercial use. For more info see "Can I use IMDb data in my software?" and their copyright/conditions of use statement.
2. Extract the TSV file somewhere
3. Run the import tool with the appropriate CLI arguments
   - Example: imdb2meta-import -tsvPath "/home/john/Downloads/data.tsv" -badgerPath "/home/john/imdb2meta/badger"

Note: The import takes a while (much longer with bbolt than with BadgerDB), requires a lot of memory, and produces a fairly large database.
With a 6-core, 12-thread CPU and a mid-range SSD, importing all data (7351639 rows as of 2020-11-21) into BadgerDB takes 4 minutes, uses up to 1.03 GB of memory, and results in a DB size of 1.29 GB.
When skipping TV episodes and storing only the minimal metadata, the import takes 1 minute and 5 seconds, uses up to 530 MB of memory, and results in a DB size of 314 MB.
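
To get a feel for the input before running a full import, the extracted TSV file can be inspected with a few lines of Go. The following is a minimal sketch and not the import tool's actual code. It assumes the nine-column layout that IMDb documents for title.basics (tconst, titleType, primaryTitle, originalTitle, isAdult, startYear, endYear, runtimeMinutes, genres) and that missing values are written as `\N`. The `Title` type and the `parseRow` and `atoiOrZero` helpers are hypothetical names introduced for illustration.

```go
package main

import (
	"bufio"
	"fmt"
	"os"
	"strconv"
	"strings"
)

// Title holds the columns this sketch keeps from one title.basics row.
// The type is hypothetical and not taken from the imdb2meta code base.
type Title struct {
	ID           string
	Type         string
	PrimaryTitle string
	StartYear    int // 0 when the dataset contains \N
	Runtime      int // in minutes, 0 when the dataset contains \N
	Genres       []string
}

// parseRow splits one tab-separated line into a Title.
// Assumption: a row has exactly nine columns in the order documented by IMDb.
func parseRow(line string) (Title, error) {
	cols := strings.Split(line, "\t")
	if len(cols) != 9 {
		return Title{}, fmt.Errorf("expected 9 columns, got %d", len(cols))
	}
	title := Title{
		ID:           cols[0],
		Type:         cols[1],
		PrimaryTitle: cols[2],
		StartYear:    atoiOrZero(cols[5]),
		Runtime:      atoiOrZero(cols[7]),
	}
	if cols[8] != `\N` {
		title.Genres = strings.Split(cols[8], ",")
	}
	return title, nil
}

// atoiOrZero returns 0 for \N and any other non-numeric value.
func atoiOrZero(s string) int {
	n, err := strconv.Atoi(s)
	if err != nil {
		return 0
	}
	return n
}

func main() {
	if len(os.Args) < 2 {
		fmt.Println("usage: pass the path to the extracted data.tsv as the first argument")
		return
	}
	f, err := os.Open(os.Args[1])
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		return
	}
	defer f.Close()

	sc := bufio.NewScanner(f)
	sc.Scan() // skip the header row
	// Print only the first five data rows.
	for i := 0; i < 5 && sc.Scan(); i++ {
		title, err := parseRow(sc.Text())
		if err != nil {
			fmt.Fprintln(os.Stderr, "skipping row:", err)
			continue
		}
		fmt.Printf("%+v\n", title)
	}
}
```

Splitting on tabs by hand instead of using encoding/csv is a deliberate choice in this sketch: titles may contain double quotes, which a CSV reader would try to interpret.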

CLI reference:

Usage of imdb2meta-import:
  -badgerPath string
        Path to the directory with the BadgerDB files
  -boltPath string
        Path to the bbolt DB file
  -limit int
        Limit the number of rows to process (excluding the header row)
  -minimal
        Only store minimal metadata (ID, type, title, release/start year)
  -skipEpisodes
        Skip storing individual TV episodes
  -skipMisc
        Skip title types like "videoGame", "audiobook" and "radioSeries"
  -tsvPath string
        Path to the "data.tsv" file that's inside the "title.basics.tsv.gz" archive
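
The effect of -skipEpisodes and -skipMisc can be pictured as a per-row filter on the title type. The sketch below is an illustration under assumptions, not the import tool's implementation: `Options`, `miscTypes` and `shouldStore` are hypothetical names, episodes are assumed to carry the IMDb title type "tvEpisode", and the set of miscellaneous types contains only the three that the flag description names (the real tool may skip more).

```go
package main

import "fmt"

// Options mirrors two of the boolean flags from the CLI reference.
// The struct is hypothetical and not taken from the imdb2meta code base.
type Options struct {
	SkipEpisodes bool
	SkipMisc     bool
}

// miscTypes lists only the title types named in the -skipMisc description.
// Assumption: the actual tool may treat further types as miscellaneous.
var miscTypes = map[string]bool{
	"videoGame":   true,
	"audiobook":   true,
	"radioSeries": true,
}

// shouldStore reports whether a row with the given title type would be kept.
func shouldStore(titleType string, o Options) bool {
	if o.SkipEpisodes && titleType == "tvEpisode" {
		return false
	}
	if o.SkipMisc && miscTypes[titleType] {
		return false
	}
	return true
}

func main() {
	o := Options{SkipEpisodes: true, SkipMisc: true}
	for _, titleType := range []string{"movie", "tvEpisode", "videoGame"} {
		fmt.Printf("%-10s store=%v\n", titleType, shouldStore(titleType, o))
	}
}
```

With both options enabled, only "movie" out of the three sample types would be stored.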

2. Run service

After importing the data you can start the web service.

Example: imdb2meta-service -badgerPath "/home/john/imdb2meta/badger"

CLI reference:

Usage of imdb2meta-service:
  -badgerPath string
        Path to the directory with the BadgerDB files
  -bindAddr string
        Local interface address to bind to. "localhost" only allows access from the local host. "0.0.0.0" binds to all network interfaces. (default "localhost")
  -boltPath string
        Path to the bbolt DB file
  -grpcPort int
        Port to listen on for gRPC requests (default 8081)
  -httpPort int
        Port to listen on for HTTP requests (default 8080)
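
To illustrate how -bindAddr combines with the two port flags, the sketch below parses the same three flags with the defaults listed above and prints the resulting listen addresses. It is an assumption about how such flags are typically wired up in Go, not the service's actual code; `listenAddrs` is a hypothetical helper.

```go
package main

import (
	"flag"
	"fmt"
	"net"
	"strconv"
)

// listenAddrs joins the bind address with each port into "host:port" form.
// Hypothetical helper, not taken from the imdb2meta code base.
func listenAddrs(bindAddr string, httpPort, grpcPort int) (httpAddr, grpcAddr string) {
	httpAddr = net.JoinHostPort(bindAddr, strconv.Itoa(httpPort))
	grpcAddr = net.JoinHostPort(bindAddr, strconv.Itoa(grpcPort))
	return httpAddr, grpcAddr
}

func main() {
	// Flag names and defaults are copied from the CLI reference above.
	bindAddr := flag.String("bindAddr", "localhost", "Local interface address to bind to")
	httpPort := flag.Int("httpPort", 8080, "Port to listen on for HTTP requests")
	grpcPort := flag.Int("grpcPort", 8081, "Port to listen on for gRPC requests")
	flag.Parse()

	httpAddr, grpcAddr := listenAddrs(*bindAddr, *httpPort, *grpcPort)
	fmt.Println("HTTP:", httpAddr)
	fmt.Println("gRPC:", grpcAddr)
}
```

Running the sketch with -bindAddr "0.0.0.0" shows the addresses that would be reachable from other hosts, which is the setting needed when the service runs inside a container.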

Docker

You can also run the service as a Docker container.

- Update the image: docker pull doingodswork/imdb2meta-service
- Start the container: docker run --name imdb2meta -v /path/to/badger:/data -p 8080:8080 -p 8081:8081 doingodswork/imdb2meta-service -badgerPath "/data"
  - Note: Ctrl-C only detaches from the container. It doesn't stop it.
  - When detached, you can attach again with docker attach imdb2meta
- To stop the container: docker stop imdb2meta
- To start the (still existing) container again: docker start imdb2meta

3. Query service

After starting the web service you can query it via HTTP or gRPC:

HTTP

Example request: curl "http://localhost:8080/meta/tt1254207"

Example response:

{
    "id": "tt1254207",
    "titleType": "SHORT",
    "primaryTitle": "Big Buck Bunny",
    "startYear": 2008,
    "runtime": 10,
    "genres": [
        "Animation",
        "Comedy",
        "Short"
    ]
}
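
The same request can be made from Go. The following is a minimal client sketch, assuming the service runs on localhost:8080 as in the curl example and that the response has exactly the fields shown above. The `Meta` struct and the `decodeMeta` and `fetchMeta` functions are hypothetical names written for this example. They are not the types generated in the pb package.

```go
package main

import (
	"encoding/json"
	"fmt"
	"io"
	"net/http"
	"os"
	"time"
)

// Meta mirrors the fields of the example response.
// Assumption: other titles may carry fields that are not listed here.
type Meta struct {
	ID           string   `json:"id"`
	TitleType    string   `json:"titleType"`
	PrimaryTitle string   `json:"primaryTitle"`
	StartYear    int      `json:"startYear"`
	Runtime      int      `json:"runtime"`
	Genres       []string `json:"genres"`
}

// decodeMeta parses a JSON response body into a Meta value.
func decodeMeta(body []byte) (Meta, error) {
	var m Meta
	err := json.Unmarshal(body, &m)
	return m, err
}

// fetchMeta requests baseURL + "/meta/" + id and decodes the response.
func fetchMeta(client *http.Client, baseURL, id string) (Meta, error) {
	resp, err := client.Get(baseURL + "/meta/" + id)
	if err != nil {
		return Meta{}, err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return Meta{}, fmt.Errorf("unexpected status %s", resp.Status)
	}
	body, err := io.ReadAll(resp.Body)
	if err != nil {
		return Meta{}, err
	}
	return decodeMeta(body)
}

func main() {
	client := &http.Client{Timeout: 5 * time.Second}
	m, err := fetchMeta(client, "http://localhost:8080", "tt1254207")
	if err != nil {
		fmt.Fprintln(os.Stderr, "request failed:", err)
		return
	}
	fmt.Printf("%s (%d), %d min, genres %v\n", m.PrimaryTitle, m.StartYear, m.Runtime, m.Genres)
}
```

Keeping the decoding step in its own function makes it possible to check the field mapping against a saved response without a running service.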

gRPC

Example request (using grpcurl): grpcurl -plaintext -d '{"id":"tt1254207"}' localhost:8081 imdb2meta.MetaFetcher/Get
(On Windows/PowerShell you have to escape the inner quotes: '{\"id\":\"tt1254207\"}')

Example response:

{
    "id": "tt1254207",
    "titleType": "SHORT",
    "primaryTitle": "Big Buck Bunny",
    "startYear": 2008,
    "runtime": 10,
    "genres": [
        "Animation",
        "Comedy",
        "Short"
    ]
}

Protocol buffer generation

To re-generate the meta.pb.go file from the meta.proto file, run: protoc -I="./protos" --go_out=./pb --go_opt=paths=source_relative meta.proto

To re-generate the service.pb.go and service_grpc.pb.go files from the service.proto file, run: protoc -I="./protos" --go_out=./pb --go_opt=paths=source_relative --go-grpc_out=./pb --go-grpc_opt=paths=source_relative service.proto

⚠ Warning

IMDb.com, Inc is the copyright owner of the data in the IMDb datasets. You may only use the data for personal and non-commercial use. For more info see "Can I use IMDb data in my software?" and their copyright/conditions of use statement.
