RahulNenavath/Guess-The-Hero

Introduction:

Title: Guess the Super Hero!

Problem Statement:

Given a description of a superhero, return two guessed superhero names

Example Input & Output:

Input description: Knight of Dark, Gotham protector, Smart, Intelligent, martial artist, master of dark, educated
Output: Batman

Dataset Description:

Dataset link: https://www.kaggle.com/datasets/jonathanbesomi/superheroes-nlp-dataset

Total rows: 1450
Total columns: 81

Solution:

The task is to map a given description to an entity name.
Challenges:

  • Limited number of records for supervised machine learning approaches.
  • There is no target class: each record describes a unique superhero, so this is not a regular classification or regression task.
  • Although multiple superheroes may share similar characteristics, each superhero is different, so this is not exactly a clustering task either.

Because the input to the system is free text, and because of the first and second challenges above, I construct a text description for each superhero from the provided dataset. Transforming structured information into unstructured text is unconventional, but it is an efficient way to compare every record with a free-text input.

The assumption is that this constructed description carries rich information about the superhero, which helps match the input description to a superhero name.
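The construction step described above could be sketched as follows. This is a minimal illustration, not the repository's actual code: the column names (`creator`, `alignment`, `occupation`, `history_text`, and the `has_` flag prefix) are assumptions about the dataset layout, and `build_description` is a hypothetical helper name.

```python
from typing import Any, Dict


def build_description(row: Dict[str, Any]) -> str:
    """Flatten one structured record into a comma-separated text description."""
    parts = []
    # Assumed text columns: copied into the description when non-empty.
    for column in ("creator", "alignment", "occupation", "history_text"):
        value = row.get(column)
        if isinstance(value, str) and value.strip():
            parts.append(value.strip())
    # Assumed flag columns such as "has_flight": a value of 1 adds the word "flight".
    for column, value in row.items():
        if column.startswith("has_") and value == 1:
            parts.append(column[len("has_"):].replace("_", " "))
    return ", ".join(parts)


# Hypothetical record, for illustration only.
example_row = {
    "name": "Batman",
    "creator": "DC Comics",
    "alignment": "Good",
    "has_flight": 0,
    "has_stealth": 1,
}
example_description = build_description(example_row)
```

Keeping the superhero name out of the description means a query is matched on characteristics rather than on the name itself.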

Solution: Use Semantic Search to match the input description against the existing superhero descriptions and fetch the top-k records. Then use Keyword Search to find the records that share the largest number of words with the input description.
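The two stages can be illustrated with a small, dependency-free sketch. It is not the project's implementation: a bag-of-words count vector with cosine similarity stands in for the Sentence Transformers embeddings and the hnswlib index, and it assumes the keyword stage re-ranks only the top-k candidates from the first stage. The names `embed`, `cosine`, and `guess_heroes`, and the sample descriptions, are hypothetical.

```python
import math
import re
from collections import Counter
from typing import Dict, List


def tokenize(text: str) -> List[str]:
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def embed(text: str) -> Counter:
    # Stand-in for a sentence embedding: a bag-of-words count vector.
    return Counter(tokenize(text))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def guess_heroes(
    query: str, descriptions: Dict[str, str], k: int = 5, n_guesses: int = 2
) -> List[str]:
    query_vector = embed(query)
    # Stage 1: similarity search, keep the k most similar descriptions.
    candidates = sorted(
        descriptions,
        key=lambda name: cosine(query_vector, embed(descriptions[name])),
        reverse=True,
    )[:k]
    # Stage 2: keyword search, re-rank candidates by number of shared words.
    query_words = set(tokenize(query))
    reranked = sorted(
        candidates,
        key=lambda name: len(query_words & set(tokenize(descriptions[name]))),
        reverse=True,
    )
    return reranked[:n_guesses]


# Hypothetical descriptions, for illustration only.
sample_descriptions = {
    "Batman": "dark knight, protector of gotham, martial artist, detective, intelligent",
    "Superman": "krypton, flight, super strength, metropolis, journalist",
    "Captain America": "marvel comics, avengers leader, super soldier, shield, brooklyn",
}
sample_guesses = guess_heroes(
    "gotham protector, martial artist, intelligent", sample_descriptions
)
```

With real embeddings the first stage can surface descriptions that share meaning but not exact words, which is why a separate keyword stage is useful for breaking ties among the candidates.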

Tech Stack:

  • Python - 3.8
  • Spacy - 3.4
  • Spacy Transformers - 1.1.8
  • KeyBERT - 0.6.0
  • hnswlib - 0.6.2 - Hierarchical Navigable Small World for Approximate Nearest Neighbour Search
  • Sentence Transformers - 2.2.2
  • Text Distance - 4.5.0
  • AWS EC2
  • Docker & Docker Compose
  • GitHub Actions CI/CD - Deploy on AWS on Git Push
  • Caddy - Reverse Proxy & Automatic SSL Certificate Generation and Verification

API Input / Output:

{description: "marvel comics, super strength, leader, avengers, super solider, strong, honest, brooklyn"}

{Superhero_Guess: [Knockout, Captain America]}
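The request and response shapes above could be handled along these lines. This is a sketch under assumptions: the source does not say which web framework serves the API, so the example works on raw JSON strings, and `handle_request`, the `guess_fn` parameter, and the error payload are all hypothetical. Only the `description` and `Superhero_Guess` field names come from the example above.

```python
import json
from typing import Callable, List


def handle_request(body: str, guess_fn: Callable[[str], List[str]]) -> str:
    """Parse a JSON request body and return a JSON response with two guesses."""
    payload = json.loads(body)
    description = payload.get("description") if isinstance(payload, dict) else None
    if not isinstance(description, str) or not description.strip():
        # Hypothetical error shape; the real API's error format is not documented here.
        return json.dumps({"error": "description must be a non-empty string"})
    # The problem statement asks for two guessed names, so keep at most two.
    return json.dumps({"Superhero_Guess": guess_fn(description)[:2]})


# Usage with a stub in place of the real search function.
sample_response = handle_request(
    '{"description": "marvel comics, super strength, leader, avengers"}',
    lambda text: ["Knockout", "Captain America", "Thor"],
)
```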

About

Given a superhero description, find the superhero associated with that description - A Semantic Search Project
