AndreHeunisML/Deep-String-Matching

Text Matching using a Siamese Char CNN in PyTorch

Context

Inspired by these papers, this is a system that takes a string of input characters (e.g. an employee name or a bank transaction text) and returns the most likely match from a set of options (e.g. a database of employee or company names).
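The matching step described above can be sketched as a nearest-neighbour lookup: embed the query, embed each option, and return the option closest to the query. This is only an illustration. The names `bigram_embed`, `distance` and `best_match` are hypothetical, and the character-bigram count embedding is a simple stand-in for the trained Siamese Char CNN, not the repository's model.

```python
import math
from collections import Counter


def bigram_embed(text):
    """Stand-in embedding (assumption): counts of character bigrams.

    In the real system this role would be played by the trained char CNN.
    """
    padded = f" {text} "
    return Counter(padded[i:i + 2] for i in range(len(padded) - 1))


def distance(a, b):
    """Euclidean distance between two sparse count vectors."""
    keys = set(a) | set(b)
    return math.sqrt(sum((a[k] - b[k]) ** 2 for k in keys))


def best_match(query, options, embed=bigram_embed):
    """Return the option whose embedding is closest to the query's."""
    q = embed(query)
    return min(options, key=lambda opt: distance(q, embed(opt)))


candidates = ["john smith", "jane doe", "acme ltd"]
print(best_match("jon smith", candidates))
```

Any function that maps a string to a vector could be passed as `embed`, which is where a trained encoder would plug in.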

Structure

The system is trained in a siamese fashion by minimising a triplet loss, which pulls the embeddings of matching strings together and pushes those of non-matching strings apart. The idea is to first pretrain a model on a large set of different but related strings (misspelled words, lemmatised forms, etc.) and then adapt it to whatever data is available for the use case.
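As a framework-free illustration of the triplet objective, the sketch below computes the standard margin-based triplet loss for a single (anchor, positive, negative) triple of embedding vectors. The Euclidean distance and the margin of 1.0 are assumptions made for the example. The repository may use a different distance or margin, and in practice the loss would be computed on batches of tensors.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def triplet_loss(anchor, positive, negative, margin=1.0):
    """max(d(anchor, positive) - d(anchor, negative) + margin, 0).

    The loss is zero once the negative is at least `margin` further
    from the anchor than the positive is.
    """
    return max(euclidean(anchor, positive) - euclidean(anchor, negative) + margin, 0.0)


# Positive close to the anchor, negative far away: nothing left to learn.
easy = triplet_loss([0.0, 0.0], [0.0, 1.0], [3.0, 4.0])

# Positive far from the anchor, negative close: large loss.
hard = triplet_loss([0.0, 0.0], [3.0, 4.0], [0.0, 1.0])

print(easy, hard)
```

Here the anchor and positive would be embeddings of two matching strings, and the negative the embedding of a non-matching string, all produced by the same shared-weight network.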

It is currently set up for use with lower-case letters and spaces. Each run of digits is replaced by a single 0.
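A minimal sketch of this kind of normalisation is shown below. The function name `normalise` is hypothetical. Lower-casing the input and silently dropping any character outside the supported alphabet are also assumptions, since the text above only states which characters are supported and how digits are handled.

```python
import re

# Supported alphabet: lower-case letters, space, and the digit placeholder 0.
ALLOWED = set("abcdefghijklmnopqrstuvwxyz 0")


def normalise(text):
    """Map raw text onto the supported character set."""
    text = text.lower()                   # assumption: input is lower-cased
    text = re.sub(r"[0-9]+", "0", text)   # each run of digits becomes a single 0
    # Assumption: unsupported characters (punctuation, accents) are dropped.
    return "".join(ch for ch in text if ch in ALLOWED)


print(normalise("Payment 12345 to ACME-Ltd"))
```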
