Text Matching using Siamese Char CNN in PyTorch

Context

Inspired by these papers, this is a system that takes a string of input characters (e.g. an employee name or the text of a bank transaction) and returns the most likely match from a set of options (e.g. a database of employee or company names).
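The lookup step described above can be sketched as a nearest-neighbour search over embeddings. This is only an illustration: `embed` here is a hypothetical stand-in that counts character bigrams, not the trained char CNN, and the names `cosine`, `best_match` and the example company names are assumptions made for the sketch.

```python
import math
from collections import Counter


def embed(text):
    """Stand-in encoder: character-bigram counts (NOT the trained CNN)."""
    padded = f" {text.lower()} "
    return Counter(padded[i:i + 2] for i in range(len(padded) - 1))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[key] for key, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def best_match(query, options):
    """Return the option whose embedding is most similar to the query's."""
    query_vec = embed(query)
    return max(options, key=lambda option: cosine(query_vec, embed(option)))


# Hypothetical database of company names.
options = ["acme holdings", "globex corporation", "initech"]
print(best_match("acme holdngs", options))
```

With a trained model, `embed` would presumably be replaced by a forward pass through the network, and the option embeddings could be computed once and cached rather than recomputed for every query.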

Structure

The system is trained in a siamese fashion by minimising a triplet loss, which pulls the embeddings of matching strings together and pushes those of non-matching strings apart. The idea is to first pretrain a model on a large set of different but related strings (misspelled words, lemmatised forms, etc.) and then adapt it to whatever data is available for the use case.
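To make the training objective concrete, here is a minimal sketch of a triplet loss on plain Python lists. The Euclidean distance and the margin of 1.0 are assumptions, since the README does not state which distance or margin is used. In PyTorch, `torch.nn.TripletMarginLoss` provides a loss of the same form.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def triplet_loss(anchor, positive, negative, margin=1.0):
    """max(d(anchor, positive) - d(anchor, negative) + margin, 0)."""
    gap = euclidean(anchor, positive) - euclidean(anchor, negative)
    return max(gap + margin, 0.0)


# Toy 2-d embeddings: the positive is a matching string, the negatives are not.
anchor = [0.0, 0.0]
positive = [0.0, 1.0]
near_negative = [1.0, 0.0]   # as close as the positive: still penalised
far_negative = [3.0, 4.0]    # further than the margin: no penalty

print(triplet_loss(anchor, positive, near_negative))  # 1 - 1 + 1 = 1.0
print(triplet_loss(anchor, positive, far_negative))   # 1 - 5 + 1 < 0, so 0.0
```

The loss reaches zero only once the non-matching string is at least `margin` further from the anchor than the matching one, which is what lets nearest-neighbour lookup work at inference time.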

It is currently set up for use with lower-case letters and spaces. Each run of digits is replaced by a single 0.
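The input convention above might be implemented along the following lines. This is a sketch, not the repository's code: dropping characters outside the alphabet, reserving index 0 for padding, and the names `normalise`, `encode` and `max_len` are all assumptions.

```python
import re
import string

# Assumed alphabet: lower-case letters, space, and the digit placeholder "0".
ALPHABET = string.ascii_lowercase + " 0"
# Index 0 is reserved for padding (an assumption), so characters start at 1.
CHAR_TO_INDEX = {char: index + 1 for index, char in enumerate(ALPHABET)}


def normalise(text):
    """Lower-case the text and collapse each run of digits to a single 0."""
    text = text.lower()
    text = re.sub(r"\d+", "0", text)
    # Assumption: anything outside the alphabet is simply dropped.
    return re.sub(r"[^a-z0 ]", "", text)


def encode(text, max_len=16):
    """Map a string to a fixed-length list of character indices."""
    indices = [CHAR_TO_INDEX[char] for char in normalise(text)][:max_len]
    return indices + [0] * (max_len - len(indices))


print(normalise("Card Payment 12345 TESCO-2041"))  # card payment 0 tesco0
print(encode("ab 1", max_len=6))                   # [1, 2, 27, 28, 0, 0]
```

Collapsing digit runs means transaction references that differ only in their numbers map to the same character sequence, so the model compares the text around them rather than the numbers themselves.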


AndreHeunisML/Deep-String-Matching