FuzzyWuzzy

Fuzzy string matching like a boss. It uses Levenshtein Distance to calculate the differences between sequences in a simple-to-use package.

Always Case Sensitive

>>> fuzz.ratio("this is a test", "this is a test!")
    97
>>> fuzz.ratio("this is a test", "this is a TEST!")
    69
>>> fuzz.ratio("this is a test".lower(), "this is a TEST!".lower())
    97

>>> fuzz.partial_ratio("this is a test", "this is a test!")
    100
>>> fuzz.partial_ratio("this is a test", "this is a TEST!")
    71
>>> fuzz.partial_ratio("this is a test".lower(), "this is a TEST!".lower())
    100

Requirements

Python 2.4 or higher
difflib
python-Levenshtein (optional, provides a 4-10x speedup in string matching, though results may differ in certain cases)
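
Because python-Levenshtein is optional, it can be useful to check whether it is importable in the current environment. The snippet below is a minimal sketch using only the standard library; `has_speedup` is a hypothetical helper name, not part of FuzzyWuzzy.

```python
import importlib.util


def has_speedup():
    """Return True if the optional python-Levenshtein module can be imported.

    The python-Levenshtein package installs a module named ``Levenshtein``.
    (``has_speedup`` is an illustrative helper, not a FuzzyWuzzy function.)
    """
    return importlib.util.find_spec("Levenshtein") is not None


print("python-Levenshtein available:", has_speedup())
```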

Installation

Using PIP via PyPI

pip install fuzzywuzzy

or the following to install python-Levenshtein too

pip install fuzzywuzzy[speedup]

Using PIP via Github

pip install git+https://github.com/seatgeek/fuzzywuzzy.git@0.15.1#egg=fuzzywuzzy

Adding to your requirements.txt file (run pip install -r requirements.txt afterwards)

git+ssh://git@github.com/seatgeek/fuzzywuzzy.git@0.15.1#egg=fuzzywuzzy

Manually via GIT

git clone https://github.com/seatgeek/fuzzywuzzy.git fuzzywuzzy
cd fuzzywuzzy
python setup.py install

Usage

>>> from fuzzywuzzy import fuzz
>>> from fuzzywuzzy import process

Simple Ratio

>>> fuzz.ratio("this is a test", "this is a test!")
    97
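
To see where a score like 97 comes from, here is a rough standard-library sketch of a similarity ratio built on `difflib.SequenceMatcher`. `simple_ratio` is a hypothetical name used for illustration; it approximates the idea behind the score and is not the library's own code.

```python
from difflib import SequenceMatcher


def simple_ratio(s1, s2):
    """Similarity of two strings as an integer from 0 to 100.

    SequenceMatcher.ratio() is 2 * M / T, where M is the number of matched
    characters and T is the combined length of both strings.
    """
    return int(round(100 * SequenceMatcher(None, s1, s2).ratio()))


# 14 matched characters out of 14 + 15 in total: 2 * 14 / 29 = 0.9655...
print(simple_ratio("this is a test", "this is a test!"))  # 97
```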

Partial Ratio

>>> fuzz.partial_ratio("this is a test", "this is a test!")
    100
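
The idea behind a partial ratio is to score the shorter string against the best-matching piece of the longer one. The sketch below is a simplified sliding-window version of that idea, assuming a plain `difflib` ratio as the inner score; `partial_ratio_sketch` is a hypothetical name, the library's actual alignment strategy may differ, and treating an empty input as a score of 0 is a choice made for this sketch.

```python
from difflib import SequenceMatcher


def simple_ratio(s1, s2):
    """Similarity of two strings as an integer from 0 to 100."""
    return int(round(100 * SequenceMatcher(None, s1, s2).ratio()))


def partial_ratio_sketch(s1, s2):
    """Best score of the shorter string against any same-length window
    of the longer string (a simplification for illustration)."""
    shorter, longer = (s1, s2) if len(s1) <= len(s2) else (s2, s1)
    if not shorter:
        return 0  # assumption of this sketch: empty input never matches
    width = len(shorter)
    return max(
        simple_ratio(shorter, longer[start:start + width])
        for start in range(len(longer) - width + 1)
    )


# The shorter string appears verbatim at the start of the longer one.
print(partial_ratio_sketch("this is a test", "this is a test!"))  # 100
```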

Token Sort Ratio

>>> fuzz.ratio("fuzzy wuzzy was a bear", "wuzzy fuzzy was a bear")
    91
>>> fuzz.token_sort_ratio("fuzzy wuzzy was a bear", "wuzzy fuzzy was a bear")
    100
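
Token sorting makes the comparison insensitive to word order: each string is split into words, the words are sorted, and the rejoined strings are compared. The following is a minimal sketch of that idea, assuming lowercase whitespace tokenization; `token_sort_ratio_sketch` is a hypothetical name and the library's own preprocessing may do more.

```python
from difflib import SequenceMatcher


def simple_ratio(s1, s2):
    """Similarity of two strings as an integer from 0 to 100."""
    return int(round(100 * SequenceMatcher(None, s1, s2).ratio()))


def token_sort_ratio_sketch(s1, s2):
    """Compare two strings after sorting their lowercase tokens."""
    def normalize(text):
        # Assumption: tokens are whitespace-separated, case is ignored.
        return " ".join(sorted(text.lower().split()))

    return simple_ratio(normalize(s1), normalize(s2))


# Both strings normalize to "a bear fuzzy was wuzzy".
print(token_sort_ratio_sketch("fuzzy wuzzy was a bear",
                              "wuzzy fuzzy was a bear"))  # 100
```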

Token Set Ratio

>>> fuzz.token_sort_ratio("fuzzy was a bear", "fuzzy fuzzy was a bear")
    84
>>> fuzz.token_set_ratio("fuzzy was a bear", "fuzzy fuzzy was a bear")
    100
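
A token set comparison goes one step further and ignores repeated words. One way to sketch it, assuming lowercase whitespace tokenization, is to split the tokens into the shared set and the two remainders, then take the best pairwise score. `token_set_ratio_sketch` is a hypothetical name; this illustrates the approach rather than reproducing the library's code.

```python
from difflib import SequenceMatcher


def simple_ratio(s1, s2):
    """Similarity of two strings as an integer from 0 to 100."""
    return int(round(100 * SequenceMatcher(None, s1, s2).ratio()))


def token_set_ratio_sketch(s1, s2):
    """Compare two strings by their sets of lowercase tokens."""
    tokens1 = set(s1.lower().split())
    tokens2 = set(s2.lower().split())

    common = " ".join(sorted(tokens1 & tokens2))
    only1 = " ".join(sorted(tokens1 - tokens2))
    only2 = " ".join(sorted(tokens2 - tokens1))

    # Shared tokens followed by whatever is unique to each side.
    combined1 = (common + " " + only1).strip()
    combined2 = (common + " " + only2).strip()

    return max(
        simple_ratio(common, combined1),
        simple_ratio(common, combined2),
        simple_ratio(combined1, combined2),
    )


# The duplicate "fuzzy" disappears once tokens are treated as a set.
print(token_set_ratio_sketch("fuzzy was a bear",
                             "fuzzy fuzzy was a bear"))  # 100
```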

Process

>>> choices = ["Atlanta Falcons", "New York Jets", "New York Giants", "Dallas Cowboys"]
>>> process.extract("new york jets", choices, limit=2)
    [('New York Jets', 100), ('New York Giants', 78)]
>>> process.extractOne("cowboys", choices)
    ("Dallas Cowboys", 90)
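
Conceptually, extraction scores the query against every choice, sorts by score, and keeps the best few. The sketch below shows that ranking loop with a pluggable scorer. `extract_sketch` and `simple_ratio` are hypothetical names, and the plain `difflib` ratio used here is a stand-in for the library's default scorer, so the scores it prints will not necessarily match the ones shown above.

```python
from difflib import SequenceMatcher


def simple_ratio(s1, s2):
    """Similarity of two strings as an integer from 0 to 100."""
    return int(round(100 * SequenceMatcher(None, s1, s2).ratio()))


def extract_sketch(query, choices, scorer=simple_ratio, limit=5):
    """Return up to `limit` (choice, score) pairs, best score first."""
    scored = [
        # Assumption: compare case-insensitively by lowercasing both sides.
        (choice, scorer(query.lower(), choice.lower()))
        for choice in choices
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:limit]


choices = ["Atlanta Falcons", "New York Jets", "New York Giants", "Dallas Cowboys"]
top = extract_sketch("new york jets", choices, limit=2)
print(top)
```

Any function that takes two strings and returns a number can be passed as `scorer`, which mirrors how a specific scorer is selected in the real API.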

You can also pass additional parameters to the extractOne method to make it use a specific scorer. A typical use case is matching file paths (here, songs is a list of file paths):

>>> process.extractOne("System of a down - Hypnotize - Heroin", songs)
    ('/music/library/good/System of a Down/2005 - Hypnotize/01 - Attack.mp3', 86)
>>> process.extractOne("System of a down - Hypnotize - Heroin", songs, scorer=fuzz.token_sort_ratio)
    ("/music/library/good/System of a Down/2005 - Hypnotize/10 - She's Like Heroin.mp3", 61)

Known Ports

FuzzyWuzzy is being ported to other languages too! Here are a few ports we know about:

Java: xpresso's fuzzywuzzy implementation
Java: fuzzywuzzy (java port)
Rust: fuzzyrusty (Rust port)
JavaScript: fuzzball.js (JavaScript port)
