vespa 🛵

Document Relevancy Ranking and Similarity Scoring using the Vector Space Model.

Supports all weighting modes listed under Modes.

Installation

To install directly from GitHub, run:

pip install git+ssh://git@github.com/mauricesvp/vespa.git
# or
pip install git+https://github.com/mauricesvp/vespa.git

To install from source:

git clone git@github.com:mauricesvp/vespa.git
# or
git clone https://github.com/mauricesvp/vespa.git

cd vespa
pip install .

Usage

from vespa import Vespa

corpus = ["Example document."]  # corpus: list of documents (strings)
vsm = Vespa(corpus)

results = vsm.score("Example query")
# > (0.7071067811865475, 'Example document.')

results = vsm.k_score("Example query", k=1)
# > [(0.7071067811865475, 'Example document.')]

The default mode is lnc.ltc: lnc weighting is applied to each corpus document, and ltc weighting is applied to the query. A different mode can be supplied at initialization, or passed directly to score or k_score; passing it to either method also changes the mode for subsequent calls.
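
To make the lnc.ltc scheme concrete, here is a minimal, self-contained sketch of how such a score can be computed. It is not Vespa's implementation: the tokenizer, the plain log(N/df) inverse document frequency, and the function names (`tokenize`, `log_tf`, `cosine_normalize`, `lnc_ltc_scores`) are assumptions made for illustration, so the numbers it produces may differ from what Vespa returns.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Assumption: lowercase alphanumeric tokens; Vespa's tokenizer may differ.
    return re.findall(r"[a-z0-9]+", text.lower())


def log_tf(tokens):
    # "l": logarithmic term frequency, 1 + log(tf), for each term that occurs.
    return {term: 1.0 + math.log(count) for term, count in Counter(tokens).items()}


def cosine_normalize(weights):
    # "c": divide every weight by the Euclidean length of the vector.
    norm = math.sqrt(sum(w * w for w in weights.values()))
    return {term: w / norm for term, w in weights.items()} if norm else {}


def lnc_ltc_scores(corpus, query):
    docs = [tokenize(text) for text in corpus]
    # Document frequency: number of documents containing each term.
    df = Counter(term for doc in docs for term in set(doc))
    # Query side (ltc): log tf, times idf = log(N / df), then cosine normalization.
    # Query terms that never occur in the corpus are dropped.
    query_weights = cosine_normalize({
        term: w * math.log(len(docs) / df[term])
        for term, w in log_tf(tokenize(query)).items()
        if term in df
    })
    scored = []
    for text, doc in zip(corpus, docs):
        # Document side (lnc): log tf, no idf, cosine normalization.
        doc_weights = cosine_normalize(log_tf(doc))
        score = sum(w * doc_weights.get(term, 0.0) for term, w in query_weights.items())
        scored.append((score, text))
    # Highest score first.
    return sorted(scored, reverse=True)


ranked = lnc_ltc_scores(["the cat sat", "the dog barked", "a cat and a dog"], "cat")
```

In this sketch the shortest document containing "cat" ranks first, because cosine normalization spreads a document's weight over all of its terms.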

If you want to get the score of a specific document, you can use the additional document argument for score:

results = vsm.score(query="Your query", document="Some document in corpus")

Documents can be added to the corpus:

vsm.add("some new document")  # str or list of str

Alternatively, the corpus can be rebuilt, which removes all previous entries:

vsm.corpus(new_corpus)  # str or list of str

Modes

All available modes are listed below.

Code	Term frequency	Code	Document frequency	Code	Document length normalization
b	Binary weight	n	Disregards the collection frequency	n	No document length normalization
n	Raw term frequency	f	Inverse collection frequency	c	Cosine normalization
a	Augmented normalized frequency	t	Inverse collection frequency	u	Pivoted unique normalization
l	Logarithm	p	Probabilistic inverse collection frequency	b	Pivoted character length normalization
L	Average-term-frequency-based normalization				
d	Double logarithm				
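
A mode string such as lnc.ltc combines one letter from each column, first for the corpus documents and then for the query. The following sketch decodes a mode string using the table above. The names `describe_mode` and `describe_triple` are hypothetical helpers and not part of Vespa's API.

```python
# Lookup tables copied from the modes table; one per letter position.
TERM_FREQUENCY = {
    "b": "Binary weight",
    "n": "Raw term frequency",
    "a": "Augmented normalized frequency",
    "l": "Logarithm",
    "L": "Average-term-frequency-based normalization",
    "d": "Double logarithm",
}
DOCUMENT_FREQUENCY = {
    "n": "Disregards the collection frequency",
    "f": "Inverse collection frequency",
    "t": "Inverse collection frequency",
    "p": "Probabilistic inverse collection frequency",
}
NORMALIZATION = {
    "n": "No document length normalization",
    "c": "Cosine normalization",
    "u": "Pivoted unique normalization",
    "b": "Pivoted character length normalization",
}


def describe_triple(triple):
    # A triple is exactly three letters: tf, df, normalization (in that order).
    if len(triple) != 3:
        raise ValueError(f"expected three letters, got {triple!r}")
    tf, df, norm = triple
    return (TERM_FREQUENCY[tf], DOCUMENT_FREQUENCY[df], NORMALIZATION[norm])


def describe_mode(mode):
    # "lnc.ltc" -> corpus triple "lnc", query triple "ltc".
    corpus_part, query_part = mode.split(".")
    return {"corpus": describe_triple(corpus_part), "query": describe_triple(query_part)}


default_mode = describe_mode("lnc.ltc")
```

Note that the meaning of a letter depends on its position: b is a binary weight in the first position but pivoted character length normalization in the third.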

Limitations

Vespa does not feature:

Lemmatization and stemming
Stopword filtering
Spelling correction
Any kind of machine learning
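
Because these steps are not built in, any preprocessing has to happen before documents and queries are handed to Vespa. As one possible approach, the sketch below filters stopwords from plain strings. The stopword list and the `remove_stopwords` helper are illustrative assumptions, not part of Vespa.

```python
import re

# Hypothetical, deliberately tiny stopword list; a real list would be longer.
STOPWORDS = frozenset({"a", "an", "and", "in", "is", "of", "the", "to"})


def remove_stopwords(text, stopwords=STOPWORDS):
    # Lowercase, split into word tokens, and drop every stopword.
    tokens = re.findall(r"\w+", text.lower())
    return " ".join(token for token in tokens if token not in stopwords)


cleaned_corpus = [remove_stopwords(doc) for doc in ["The cat and the dog", "A walk in the park"]]
```

The cleaned strings can then be used as the corpus and as queries. Keep in mind that this sketch also lowercases the text and drops punctuation, so the cleaned strings, not the originals, are what a scorer would see.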

Background

For further reading, please reference:
