classy

What

Easy lightweight text classification via Naive Bayes. Give it a small set of example data, and it will classify similar inputs with blazing speed and pretty good accuracy.

Why

Making a basic chatbot? Want to do basic auto-suggestion of misspelled commands? Don't want to bust out TensorFlow or scikit-learn? Classy's got you covered.

How

Label some example data (probably by hand) as a dict that maps each label to a list of example strings:

data = {'lights':['Could you turn my lights off?',
                  'Turn my lights off',
                  'Are my lights off?',
                  'All lights off, please',
                  'Turn some lights on',
                  'Which bulbs are on?'],
        'alarm': ['Set an alarm for tomorrow at 6:00',
                  'What time is my alarm?',
                  'When will I wake up tomorrow?',
                  'What time is wakeup tomorrow?']}

Create the Classifier object:

import classy
c = classy.Classifier(data)

To classify text, simply use .classify:

c.classify('Which of my lights are off?')

{'lights': 0.9981515711645101, 'alarm': 0.0018484288354898338}
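For readers curious where a probability dict like this can come from, here is a minimal sketch of multinomial Naive Bayes with add-one smoothing. It is not classy's actual source: the names tokenize, train, and classify_sketch are hypothetical, the smoothing choice is an assumption, and the exact numbers it produces will likely differ from the output shown above.

```python
import math
import re
from collections import Counter

data = {'lights': ['Could you turn my lights off?', 'Turn my lights off',
                   'Are my lights off?', 'All lights off, please',
                   'Turn some lights on', 'Which bulbs are on?'],
        'alarm': ['Set an alarm for tomorrow at 6:00', 'What time is my alarm?',
                  'When will I wake up tomorrow?', 'What time is wakeup tomorrow?']}

def tokenize(text):
    # Lowercase, drop everything except alphanumerics and spaces, split on spaces.
    text = re.sub(r'[^a-z0-9 ]', '', text.lower())
    return [t for t in text.split(' ') if t]

def train(data):
    # Count how often each token appears under each label (hypothetical helper).
    counts = {label: Counter(tok for ex in examples for tok in tokenize(ex))
              for label, examples in data.items()}
    vocab = set().union(*counts.values())
    return counts, vocab

def classify_sketch(text, data, counts, vocab):
    total_examples = sum(len(examples) for examples in data.values())
    scores = {}
    for label, examples in data.items():
        # Log prior: fraction of training examples carrying this label.
        score = math.log(len(examples) / total_examples)
        # Add-one (Laplace) smoothing so unseen tokens never give probability zero.
        denom = sum(counts[label].values()) + len(vocab)
        for tok in tokenize(text):
            score += math.log((counts[label][tok] + 1) / denom)
        scores[label] = score
    # Convert log scores to probabilities that sum to 1.
    top = max(scores.values())
    exps = {label: math.exp(s - top) for label, s in scores.items()}
    z = sum(exps.values())
    return {label: e / z for label, e in exps.items()}

counts, vocab = train(data)
print(classify_sketch('Which of my lights are off?', data, counts, vocab))
```

Working in log space and normalizing at the end avoids underflow when many small per-token probabilities are multiplied together.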

Advanced

If you want a single label rather than a full dict of probabilities, set the .threshold property. The .classify() method will then return the label of the matched class, or None if every probability is below the threshold (i.e., if the classifier is uncertain). Behavior for .threshold values <= 0.5 is undefined.

c = classy.Classifier(data)
c.threshold = 0.9
c.classify('Which of my lights are off?')
c.classify("Some words we've never seen before")

'lights'
None
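The thresholding rule described above can be pictured as a small post-processing step over the probability dict. This is a sketch of the described behavior only, not classy's internals, and pick_label is a hypothetical helper name.

```python
def pick_label(probabilities, threshold):
    # Find the most probable label (hypothetical helper, not part of classy).
    label, p = max(probabilities.items(), key=lambda kv: kv[1])
    # Return it only if the classifier is confident enough; otherwise None.
    return label if p >= threshold else None

print(pick_label({'lights': 0.998, 'alarm': 0.002}, 0.9))  # 'lights'
print(pick_label({'lights': 0.5, 'alarm': 0.5}, 0.9))      # None
```

Because the probabilities sum to 1, at most one label can exceed a threshold above 0.5, which may be why lower thresholds are left undefined.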

Classy by default performs minimal preprocessing of incoming text, equivalent to:

import re

def parse(text):
    # makes all uppercase characters lowercase
    text = text.lower()

    # removes all except alphanumerics and spaces
    text = re.sub(r'[^a-z0-9 ]',r'',text)

    # splits by spaces, and discards all empty strings
    return [i for i in text.split(' ') if i != '']
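To see what this preprocessing does to a real input, the function above can be run on one of the training examples. Note that punctuation inside a token is dropped rather than treated as a separator, so a time like 6:00 becomes the single token 600.

```python
import re

def parse(text):
    # Same steps as the default preprocessing described in the README.
    text = text.lower()
    text = re.sub(r'[^a-z0-9 ]', r'', text)
    return [i for i in text.split(' ') if i != '']

tokens = parse('Set an alarm for tomorrow at 6:00')
print(tokens)
# ['set', 'an', 'alarm', 'for', 'tomorrow', 'at', '600']
```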

If you wish to supply a custom string parsing function, simply provide it as the f argument when creating a Classifier object:

def newParse(t):
    return [i for i in t.split(',') if i != '']

c = classy.Classifier(data,f=newParse)

Tricks

Want a bare-bones spellchecker over a (very) limited set of inputs?

def allSubsets(text):
    temp = []
    for a in range(len(text)):
        for b in range(a,len(text)):
            temp.append(text[a:b+1])
    return temp

data = {'push':['push'],
        'commit':['commit'],
        'pull':['pull'],
        'diff':['diff']}

c = classy.Classifier(data,f=allSubsets,threshold=0.6)

c.classify('commot')
c.classify('cpmmot')
c.classify('pulll')
c.classify('diffg')

'commit'
'commit'
'pull'
'diff'

The allSubsets function enumerates all substrings of its input, which does a reasonably decent job of spellchecking when paired with Naive Bayes. This usage should be considered a ~~cool trick~~ temporary hackjob, as there are many, many better ways to do this, such as picking the candidate with the smallest Levenshtein distance.
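For comparison, here is a rough sketch of the Levenshtein-distance alternative applied to the same four commands. It is independent of classy; the closest helper and its max_distance cutoff are hypothetical choices for illustration.

```python
def levenshtein(a, b):
    # Dynamic-programming edit distance, keeping only the previous row.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete a character from a
                            curr[j - 1] + 1,      # insert a character into a
                            prev[j - 1] + cost))  # substitute (or match)
        prev = curr
    return prev[-1]

def closest(word, candidates, max_distance=2):
    # Pick the nearest candidate, or None if nothing is close enough.
    best = min(candidates, key=lambda c: levenshtein(word, c))
    return best if levenshtein(word, best) <= max_distance else None

commands = ['push', 'commit', 'pull', 'diff']
for typo in ['commot', 'cpmmot', 'pulll', 'diffg']:
    print(typo, '->', closest(typo, commands))
```

The max_distance cutoff plays a role similar to .threshold: inputs that are too far from every known command yield None instead of a forced guess.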
