# tokenisation

Here are 13 public repositories matching this topic...

darshank15 / wikipedia-search-engine

A complete search engine built on an inverted index over the 2018 Wikipedia corpus (72 GB). It returns the top search results for the given query words.

search wikipedia mergesort inverted-index response-time stemming tokenisation search-engine-algorithm stop-words

Updated Oct 1, 2020
Jupyter Notebook
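
To illustrate the inverted-index idea named in the description above, here is a minimal sketch. It is not the repository's code: the stop-word list, the function names (`tokenise`, `build_inverted_index`, `search`), and the AND-only query semantics are assumptions made for the example, and stemming and ranking are left out.

```python
import re
from collections import defaultdict

# Illustrative stop-word list (an assumption, not the repository's list).
STOP_WORDS = {"a", "an", "the", "of", "is", "in", "and", "to"}


def tokenise(text):
    """Lowercase the text, split on non-alphanumerics, and drop stop words."""
    return [t for t in re.findall(r"[a-z0-9]+", text.lower()) if t not in STOP_WORDS]


def build_inverted_index(docs):
    """Map each token to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in tokenise(text):
            index[token].add(doc_id)
    return index


def search(index, query):
    """Return the ids of documents that contain every query token."""
    tokens = tokenise(query)
    if not tokens:
        return set()
    result = set(index.get(tokens[0], set()))
    for token in tokens[1:]:
        result &= index.get(token, set())
    return result


docs = {
    1: "The history of the search engine",
    2: "An inverted index maps words to documents",
    3: "Search engines use an inverted index",
}
index = build_inverted_index(docs)
print(sorted(search(index, "inverted index")))  # [2, 3]
print(sorted(search(index, "search index")))    # [3]
```

Because a query only touches the posting sets of its own tokens, lookup cost depends on those sets rather than on the size of the whole corpus, which is the usual reason an inverted index improves response time.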

Freud16 / f_backyard__Search_Engine

A search engine that returns customised recipes according to three sorting algorithms. Pre-processing and an inverted index improve speed.

inverted-index optimisation tokenisation data-cleansing

Updated Nov 24, 2021
Jupyter Notebook

casics / spiral

A Python 3 module that provides functions for splitting identifiers found in source code files.

python machine-learning source-code splitter camelcase identifier mining-software-repositories split-string tokenisation splitting-identifiers

Updated Jan 12, 2023
Python
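
As a rough picture of what splitting a source-code identifier means, the sketch below uses a single regular expression to break camelCase, snake_case, and digit boundaries. This is a simple heuristic written for illustration, not Spiral's API or any of its algorithms; the function name `split_identifier` is hypothetical.

```python
import re

# Alternatives, tried in order at each position:
#   1. a run of capitals that is followed by a capitalised word (e.g. "HTTP" in "HTTPResponse")
#   2. an optionally capitalised lowercase word
#   3. any remaining run of capitals
#   4. a run of digits
# Underscores and other separators match nothing and are skipped.
_TOKEN = re.compile(r"[A-Z]+(?=[A-Z][a-z])|[A-Z]?[a-z]+|[A-Z]+|[0-9]+")


def split_identifier(identifier):
    """Split an identifier into its component words (heuristic sketch)."""
    return _TOKEN.findall(identifier)


print(split_identifier("parseHTTPResponse"))  # ['parse', 'HTTP', 'Response']
print(split_identifier("snake_case_name"))    # ['snake', 'case', 'name']
print(split_identifier("getX2Value"))         # ['get', 'X', '2', 'Value']
```

A purely rule-based splitter like this cannot separate identifiers written without case or delimiter cues (for example `readfile`), which is the kind of input that dictionary-based or learned splitters are designed for.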

kbnim / Letteriser

A tiny utility that takes a string and decomposes it into the letters of the Hungarian alphabet.

java string-split tokenisation command-line-utility

Updated Sep 23, 2023
Java
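
The Hungarian alphabet treats some digraphs and one trigraph (cs, dz, dzs, gy, ly, ny, sz, ty, zs) as single letters, so decomposing a string is not the same as splitting it into characters. The sketch below shows one way this could be done, using a greedy longest-match rule. It is written in Python for illustration and is not the repository's Java implementation; the function name `letterise` and the greedy rule are assumptions, and greedy matching will mis-split compound words in which adjacent characters belong to different letters.

```python
# Multi-character Hungarian letters, longest first so "dzs" wins over "dz".
MULTIGRAPHS = ("dzs", "cs", "dz", "gy", "ly", "ny", "sz", "ty", "zs")


def letterise(word):
    """Split a word into Hungarian letters using greedy longest match."""
    letters = []
    i = 0
    while i < len(word):
        for m in MULTIGRAPHS:
            chunk = word[i:i + len(m)]
            if chunk.lower() == m:
                letters.append(chunk)  # keep the original casing
                i += len(m)
                break
        else:
            # No multigraph starts here, so the single character is a letter.
            letters.append(word[i])
            i += 1
    return letters


print(letterise("gyöngy"))    # ['gy', 'ö', 'n', 'gy']
print(letterise("dzsungel"))  # ['dzs', 'u', 'n', 'g', 'e', 'l']
print(letterise("Szeged"))    # ['Sz', 'e', 'g', 'e', 'd']
```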

SmartTokenLabs / TokenScript

TokenScript schema, specs and paper

security cryptography mobile xml blockchain tokens web3 tokenization tokenisation

Updated Dec 6, 2023
JavaScript

alasdairforsythe / tokenmonster

Ungreedy subword tokenizer and vocabulary trainer for Python, Go & JavaScript

tokenizer vocabulary vocabulary-builder tokenize tokenization tokenisation tokenizing text-tokenization vocabulary-generator

Updated Jan 28, 2024
Go

mudittt / text-summarizer

An end-to-end text summarization application that uses Meta's BART model, fine-tuned on the Samsung dataset.

docker bart summarization tokenisation huggingface huggingface-transformers huggingface-datasets

Updated Mar 22, 2024
Jupyter Notebook

checkout / frames-android

Frames Android: making native card payments simple

android validation checkout payment credit-card fintech tokenisation payment-express card-validations mobile-payments

Updated May 23, 2024
Kotlin

flammie / omorfi

Open morphology for Finnish

python analysis python-bindings spell-check morphological-analysis finnish tokenisation

Updated Apr 22, 2024
Python

DataRish / MBTI-Personality-Predictor

This project predicts MBTI personality types from each user's 50 most recent posts using NLP and ML techniques.

python random-forest machine-learning-algorithms logistic-regression data-preprocessing data-preparation decision-tree lemmatization tokenisation multinomial-naive-bayes linear-support-vector-machine xgboost-classifier catboost-classifier

Updated Apr 29, 2024
Jupyter Notebook

checkout / frames-ios

Frames iOS: making native card payments simple

swift ios validation checkout payment credit-card fintech tokenisation card-payment card-validations mobile-payments payments-expert

Updated May 23, 2024
Swift

andreihar / taibun

Taiwanese Hokkien Transliterator and Tokeniser

python nlp natural-language-processing tokenizer transliteration tl poj romanization nlp-library transliterator tokeniser zhuyin hokkien tokenization taiwanese tokenisation taigi romanisation

Updated May 25, 2024
Python

andreihar / taibun.js

Taiwanese Hokkien Transliterator and Tokeniser

javascript nlp natural-language-processing js tokenizer transliteration tl poj romanization nlp-library transliterator tokeniser zhuyin hokkien tokenization taiwanese tokenisation taigi romanisation

Updated May 25, 2024
JavaScript
