Dan6erbond/Fuzzle

Fuzzle

Scalable search engine with multiple versions and basic mistype correction.

Discord

Key Features

  • Searching through strings.
  • Ignoring low coverage results.
  • Sorting results by accuracy.
  • Support for dictionaries.
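
To illustrate what "ignoring low coverage results" and "sorting results by accuracy" can look like in practice, here is a minimal sketch. It does not use Fuzzle's algorithm or API: the similarity score comes from Python's standard `difflib.SequenceMatcher` as a stand-in, and the `search` function and its `min_score` threshold are hypothetical names chosen for this example.

```python
from difflib import SequenceMatcher


def search(query, candidates, min_score=0.6):
    """Return (candidate, score) pairs, best match first."""
    query = query.lower()
    scored = []
    for candidate in candidates:
        # Stand-in similarity in [0, 1]; this is NOT Fuzzle's own scoring.
        score = SequenceMatcher(None, query, candidate.lower()).ratio()
        if score >= min_score:  # drop results that score too low
            scored.append((candidate, score))
    # Sort the surviving results from most to least similar.
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored


# A mistyped query still ranks the closest title first.
results = search("matirx", ["The Matrix", "Matrix", "Titanic"])
for name, score in results:
    print(f"{name}: {score:.2f}")
```

The threshold trades recall for precision: raising `min_score` removes more loosely related results, lowering it tolerates heavier typos.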

Algorithm

A concise documentation of the algorithm can be found here.

Versions

Currently Fuzzle exists in three different languages:

Data

To test the search engine, data was gathered from multiple sources to ensure that features like tags, mistype correction and coverage work as expected. All the data was restructured into JSON lists/dictionaries for cross-platform compatibility and is expanded as new data is added to the sources.

Movies

A list of 28 795 movies from 1990 to the present day. The source is a JSON file found here, which was restructured so that each movie's name is a key and its cast forms the tags. This allows you to search for a movie not only by name, but also by actor.
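
The restructuring described above can be sketched as follows. The source field names `title` and `cast`, the two sample records and the `find_by_tag` helper are assumptions made for illustration; the helper is a plain substring lookup, not Fuzzle's search.

```python
import json

# Hypothetical source records; the field names "title" and "cast" are assumed.
source = json.loads("""
[
  {"title": "Heat", "year": 1995, "cast": ["Al Pacino", "Robert De Niro"]},
  {"title": "Ronin", "year": 1998, "cast": ["Robert De Niro", "Jean Reno"]}
]
""")

# The movie name becomes the key, the cast list becomes the tags.
movies = {record["title"]: record["cast"] for record in source}


def find_by_tag(data, query):
    """Return keys with at least one tag containing the query (case-insensitive)."""
    query = query.lower()
    return [key for key, tags in data.items()
            if any(query in tag.lower() for tag in tags)]


print(find_by_tag(movies, "de niro"))
```

Note that keying by name means two movies with the same title would collide in this sketch; the later record would overwrite the earlier one.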

Games

The set of games was scraped from the Steam API. It currently contains around 27 thousand games, stored in the games.json file with each game's name as the key and the following as tags:

  • Categories: The Steam API has (so far) returned 27 unique categories, including captions available, multi-player, online multi-player, includes source SDK, includes level editor, in-app purchases, shared/split screen, full controller support, MMO, online co-op, cross-platform multiplayer, partial controller support, steam achievements, local co-op, steam leaderboards, stats, commentary available, steam turn notifications, steam workshop, steam cloud, single-player, steam trading cards, co-op, local multi-player.
  • Genres: The data contains 30 unique genres so far, including action, utilities, gore, strategy, animation & modeling, photo editing, education, sports, simulation, web publishing, documentary, sexual content, software training, tutorial, indie, rpg, massively multiplayer, design & illustration, game development, video production, nudity, audio production, casual, free to play, racing, adventure, violent, early access and accounting.
  • Developer(s) and publisher(s).
  • Platform(s): Currently Steam stores these values as booleans, and the three available options are windows, linux and macos.
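
As a rough sketch of how one scraped record could be flattened into the tag list described above: the record layout, its field names and the `build_tags` helper are hypothetical and are not taken from the archived scraper. The platform booleans are turned into tags by keeping only the names set to true.

```python
def build_tags(record):
    """Flatten one game record (assumed layout) into a flat list of tags."""
    tags = []
    tags += [category.lower() for category in record.get("categories", [])]
    tags += [genre.lower() for genre in record.get("genres", [])]
    tags += record.get("developers", [])
    tags += record.get("publishers", [])
    # Platforms arrive as booleans; keep only the supported platform names.
    tags += [name for name, supported in record.get("platforms", {}).items()
             if supported]
    return tags


# Made-up record for illustration only.
record = {
    "name": "Example Game",
    "categories": ["Single-player", "Steam Cloud"],
    "genres": ["Indie", "Strategy"],
    "developers": ["Example Studio"],
    "publishers": ["Example Publishing"],
    "platforms": {"windows": True, "linux": False, "macos": True},
}

games = {record["name"]: build_tags(record)}
print(games)
```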

Since this dataset is quite large and may prove useful in your own projects, the current state of the data as well as the scraper and its dependencies were archived in the ZIP folder, which you can download and use freely!

Companies

A list of 5002 companies with their respective industry, state and city as tags. This allows searches such as "california" or "food" to yield brands that do not contain the searched keyword in their name but are based in a matching state or city, or are active in a matching industry.

Countries

A list of countries (presumably with duplicate values) with most of their major cities added as tags to allow finding a country by searching for a city.

Links

Logo

The logo for Fuzzle was created by @lydocia, who has her own GitHub profile as well as a website.

Roadmap

  • Removing irrelevant results.
  • Supporting tags.
  • Mistype correction.
  • Prioritizing fields.
  • Support for custom objects.
  • Returning 100% matches at first position followed by rest.
  • Models for different search types.