woctezuma/steam-CLIP

Steam CLIP: match Steam Banners with OpenAI's CLIP

This repository contains Python code to retrieve Steam games with similar store banners, using OpenAI's CLIP.

Image similarity is assessed by the cosine similarity between image features encoded by CLIP.
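
As a minimal sketch of this measure, assuming two feature vectors are already available as NumPy arrays (the function name `cosine_similarity` and the toy vectors are illustrative, not taken from the repository):

```python
import numpy as np


def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine of the angle between two non-zero feature vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))


# Toy 3-dimensional vectors standing in for CLIP image features.
u = np.array([1.0, 2.0, 3.0])
v = np.array([2.0, 4.0, 6.0])   # same direction as u
w = np.array([3.0, 0.0, -1.0])  # orthogonal to u

print(cosine_similarity(u, v))
print(cosine_similarity(u, w))
```

The score depends only on the direction of the vectors, not on their norm, which is why `u` and `v` are treated as identical here.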

Similar vertical banners

Requirements

  • Install the latest version of Python 3.X.
  • Install the required packages:
python -m pip install --upgrade pip
pip install -r requirements.txt

Model

CLIP is a neural network:

  • combining an image encoder (either ResNet or Vision Transformer ViT) and a text encoder (Transformer),
  • pre-trained on the WebImageText (WIT) dataset, consisting of 400 million (image, text) pairs.

In this repository, the image encoder is ViT-B/32.
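
A minimal sketch of how one banner could be encoded with this model, assuming the `clip` package from the openai/CLIP repository, PyTorch and Pillow are installed. The helper name `encode_banner` and its arguments are hypothetical, and the notebook in this repository may proceed differently:

```python
def encode_banner(image_path: str, model_name: str = "ViT-B/32"):
    """Return the CLIP image features of one banner as a 1-D tensor."""
    # Heavy dependencies are imported lazily, so that defining the helper
    # does not require them to be installed.
    import clip
    import torch
    from PIL import Image

    device = "cuda" if torch.cuda.is_available() else "cpu"
    model, preprocess = clip.load(model_name, device=device)

    # Resize, crop and normalize the image as expected by the model.
    image = preprocess(Image.open(image_path)).unsqueeze(0).to(device)

    with torch.no_grad():
        features = model.encode_image(image)

    return features[0].cpu()
```

When encoding many banners, the model would normally be loaded once and reused, rather than reloaded on every call as in this sketch.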

Data

Data is available in download-steam-banners-data/.

The most recent data snapshot was downloaded with this Colab notebook on January 9, 2021.

This snapshot is shared as an archive (original_vertical_steam_banners.tar, 1.5 GB) on Google Drive.

It consists of vertical Steam banners (300x450 resolution), available for 29982 out of 48792 games, i.e. 61.4% of games.

Resized images are provided in the same repository for resolutions 256, 224, 128, 64, etc.

The list of appIDs (before any potential filtering) is from steam-store-snapshots.

Filtering out

The .txt logs also report images that could be filtered out, based on:

  • image size (before resizing images):
    • there is 1 image with resolution 600x900,
    • this is not a big issue as the image ratio is equal to the expected ratio for 300x450 images,
  • image channels (before and after resizing images):
    • most images are 'RGB' (for true color images); total: 29642 images,
    • a few images are 'L' ('luminance' for greyscale images); total: 306 images,
    • very few images are 'CMYK' (for pre-press images); total: 34 images,
  • blank images:
    • there are 2 blank images: one totally black (appID: 603280) and one totally white (appID: 1076060),
    • these images are not reported separately, since they already appear in the log about image channels.

It is up to the reader to filter the dataset based on these logs. Logs can be reproduced with this Colab notebook.
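
As a rough sketch of such a filtering step, assuming the channel information has been loaded into a dictionary mapping each appID to its image mode. The layout of the actual .txt logs is not described here, so the mapping and the placeholder appIDs below are made up for illustration:

```python
from collections import Counter

# Hypothetical excerpt of a channel log: appID -> image mode.
# The appIDs are placeholders, not real Steam appIDs.
image_modes = {
    "1": "RGB",
    "2": "RGB",
    "3": "L",
    "4": "CMYK",
    "5": "RGB",
}

# Count images per mode, in the same spirit as the totals listed above.
mode_counts = Counter(image_modes.values())

# Keep only true-color banners.
kept_app_ids = [app_id for app_id, mode in image_modes.items() if mode == "RGB"]

print(mode_counts)
print(kept_app_ids)
```

Whether to drop greyscale and CMYK banners or to convert them to 'RGB' instead is a choice left to the reader.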

Usage

Run match_steam_banners_with_CLIP.ipynb.

This will:

  • compute and store the 512-dimensional feature vector of each banner,
  • find the 10 most similar store banners for each curated query appID,
  • find the single most similar store banner for every appID available on the store, then display the most unique games.
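
The neighbor search in the steps above can be sketched as follows, assuming the features are stacked in a NumPy array with one row per banner. The function name `top_k_similar` and the random stand-in data are illustrative, and the notebook may implement the search differently:

```python
import numpy as np


def top_k_similar(features: np.ndarray, query_index: int, k: int = 10) -> np.ndarray:
    """Indices of the k banners most similar to the query, best first."""
    # L2-normalize rows so that a dot product equals the cosine similarity.
    normalized = features / np.linalg.norm(features, axis=1, keepdims=True)
    similarity = normalized @ normalized[query_index]
    # Exclude the query itself from its own neighbors.
    similarity[query_index] = -np.inf
    return np.argsort(-similarity)[:k]


# Random stand-in for the (num_banners, 512) feature matrix.
rng = np.random.default_rng(seed=0)
features = rng.normal(size=(100, 512))
# Make banner 1 a slightly perturbed copy of banner 0.
features[1] = features[0] + 0.01 * rng.normal(size=512)

neighbors = top_k_similar(features, query_index=0, k=10)
print(neighbors)
```

In this toy setup, banner 1 comes out as the first neighbor of banner 0, since it is a near-duplicate of it.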

NB: by default, query appIDs consist of:

  • the top 100 most played games during the past two weeks, according to SteamSpy,
  • a few manually curated games.

NB: unique games are those whose banner has the lowest similarity score with its first neighbor, i.e. the most similar other banner.
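
A minimal sketch of this ranking, assuming a NumPy feature matrix with one row per banner (the function name `rank_by_uniqueness` and the toy 2-dimensional vectors are illustrative only):

```python
import numpy as np


def rank_by_uniqueness(features: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
    """Return banner indices sorted from most to least unique, and the
    similarity of each banner to its nearest neighbor."""
    normalized = features / np.linalg.norm(features, axis=1, keepdims=True)
    similarity = normalized @ normalized.T
    # A banner must not be counted as its own neighbor.
    np.fill_diagonal(similarity, -np.inf)
    nearest_similarity = similarity.max(axis=1)
    # Lowest nearest-neighbor similarity first, i.e. most unique first.
    return np.argsort(nearest_similarity), nearest_similarity


# Three vectors pointing in almost the same direction, and one outlier.
features = np.array([
    [1.0, 0.0],
    [1.0, 0.1],
    [1.0, -0.1],
    [0.0, 1.0],
])

order, nearest = rank_by_uniqueness(features)
print(order)
print(nearest)
```

For the full set of about 30k banners, the dense similarity matrix takes several gigabytes in double precision, so computing it in batches may be preferable.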

Web apps

Results can be interactively explored with web apps:

Results


The CLIP embedding for the ~30k banners is shared on Google Drive.

Results obtained with OpenAI's CLIP are shown on the Wiki.

The linked pages contain a lot of images and might be slow to load depending on your Internet bandwidth.

Caveat

As noticed on January 16, 2021 in issue #1 of the official openai/CLIP repository, image similarity is driven by the text present in the images. This was then officially discussed by OpenAI on March 4, 2021 on the Distill website and in a blog post. This remark also appears in Figure 5 (now 6) of the DALL·E 2 paper published on April 13, 2022.

DALL·E 2

Similar games

Direct links to similarity results are available below:

For instance: Similar vertical banners


Fall Guys


Ring of Elysium


Chivalry


Call of Duty


Day of Defeat


Cities

Dinosaurs

Dinosaurs

Dinosaurs

WWII / Heroes / War / W

War

Logos

The model is able to retrieve games from the same franchise.

Arma / Army / Arms

Arma

Borderlands / Land

Borderlands

Braveland / Capital "B" letter

Braveland

Call of Duty

Call of Duty

Age of Empires / Empire(s) / Emperor

Age of Empires

Fallout

Fallout

Grand Theft Auto

Grand Theft Auto

Guacamelee

Guacamelee

Hitman

Hitman

λ / Half-Life / Hal

λ

NBA / 2K

NBA 2K

Words

The model relies on words present in the images, as shown in the following examples.

Forest

Forest

Legend

Legend

Civilization / Civ / tion

Civilization

Dawn

Dawn

Dead / Light

Dead / Light

Don't / Together

Don't / Together

Heroes

Heroes

Hunt

Hunt

Life / Strange

Life / Strange

Monster / Hunt

Monster / Hunt

Planet

Planet

Portal

Portal

Rain

Rain

Secret / Lab

Secret / Lab

Story

Story

Story (again)

Story (again)

Truck

Truck

Black

Black

Ust

Ust

Dead

Dead

Valley

Dead

In the absence of words, the model retrieves banners with similar image content.

Tanks

Unique games

Direct links to similarity results are available below:

For instance: Unique vertical banners

References