A complete search engine built by creating an inverted index on the 2018 Wikipedia corpus (72 GB). It returns the top search results for the given query words.
Updated Oct 1, 2020 - Jupyter Notebook
A search engine that returns customised recipes ranked by one of three sorting algorithms. Pre-processing and an inverted index improve query speed.
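The search-engine entries above both rely on an inverted index. As a rough illustration of the idea only (not the code from either repository), the sketch below maps each token to the set of documents that contain it and answers a query by intersecting those sets. The names `tokenize`, `build_index`, and `search`, the sample documents, and the simple lowercase-and-split pre-processing are all assumptions made for this example.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Pre-process text: lowercase it and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(docs):
    """Map each token to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in tokenize(text):
            index[token].add(doc_id)
    return index


def search(index, query):
    """Return the sorted ids of documents that contain every query token."""
    tokens = tokenize(query)
    if not tokens:
        return []
    # Copy the first posting set so the index itself is never mutated.
    result = set(index.get(tokens[0], set()))
    for token in tokens[1:]:
        result &= index.get(token, set())
    return sorted(result)


# Hypothetical sample documents, used only to exercise the sketch.
docs = {
    1: "Tomato soup with basil",
    2: "Basil pesto pasta",
    3: "Tomato and basil salad",
}
index = build_index(docs)
```

A lookup touches only the posting sets of the query tokens instead of scanning every document, which is where the speed-up comes from. Ranking the matches (for example by term frequency) would be a separate step that this sketch leaves out.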
A Python 3 module that provides functions for splitting identifiers found in source code files.
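To give a feel for what identifier splitting involves, here is a minimal regex-based sketch. It is not the module's actual API: the function name `split_identifier` and the splitting rules (camelCase boundaries, acronym runs, underscores, digit runs) are assumptions chosen for illustration.

```python
import re

# Alternatives, tried in order at each position:
#   1. an acronym run that is followed by a capitalised word ("HTTP" in "HTTPResponse")
#   2. a word with an optional leading capital ("parse", "Response")
#   3. a remaining run of capitals ("ID")
#   4. a run of digits ("2")
_WORD = re.compile(r"[A-Z]+(?=[A-Z][a-z])|[A-Z]?[a-z]+|[A-Z]+|[0-9]+")


def split_identifier(name):
    """Split a source-code identifier into its component words."""
    # Underscores and other separators match none of the alternatives,
    # so findall simply skips over them.
    return _WORD.findall(name)
```

For example, `split_identifier("parseHTTPResponse")` yields the parts `parse`, `HTTP`, and `Response`. Real identifiers contain harder cases (such as `iOS` or words run together without any case change) that a simple pattern like this cannot resolve.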
A tiny utility that takes a string and decomposes it into the letters of the Hungarian alphabet.
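The Hungarian alphabet contains multi-character letters (the digraphs cs, dz, gy, ly, ny, sz, ty, zs and the trigraph dzs), so splitting a string character by character is not enough. The sketch below uses a greedy longest-match strategy. This is an assumption for illustration and not necessarily how the utility works: the name `decompose` is hypothetical, and greedy matching gives the wrong answer for words where two adjacent letters only look like a digraph.

```python
# Multi-character Hungarian letters; the trigraph comes first so that
# "dzs" is matched before its prefix "dz".
MULTIGRAPHS = ("dzs", "cs", "dz", "gy", "ly", "ny", "sz", "ty", "zs")


def decompose(word):
    """Split a word into Hungarian letters using greedy longest match."""
    lowered = word.lower()
    letters = []
    i = 0
    while i < len(lowered):
        for graph in MULTIGRAPHS:
            if lowered.startswith(graph, i):
                letters.append(graph)
                i += len(graph)
                break
        else:
            # No multigraph starts here, so take a single character.
            letters.append(lowered[i])
            i += 1
    return letters
```

Called on `"dzsungel"`, this returns `dzs`, `u`, `n`, `g`, `e`, `l`. Handling the ambiguous cases correctly would need a dictionary or morphological information, which is beyond this sketch.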
TokenScript schema, specs and paper
Ungreedy subword tokenizer and vocabulary trainer for Python, Go & JavaScript
An end-to-end text summarizer application that uses Meta's BART model, fine-tuned on the Samsung dataset.
Frames Android: making native card payments simple
Open morphology for Finnish
This project predicts MBTI personality types from a user's 50 most recent posts using NLP and ML techniques.
Frames iOS: making native card payments simple
Taiwanese Hokkien Transliterator and Tokeniser