Item Similarity: content-based, schema-less recommendation service

A simple recommendation service which computes the similarity of items.

Since this is part of my ongoing MSc project, the README will be improved by October.

Concept

Similarity Computation

The similarity between two items is computed as follows:

Given the following two JSON documents:

a = {
    "brand": "Addi",
    "model": "Speedy",
    "colors": ["black", "white"],
    "category": "Shoes",
    "size": 42
}
b = {
    "brand": "Prima",
    "model": "Kazak",
    "colors": ["red", "white"],
    "category": "Sweater",
    "sleeves": "long"
}

First, any item features which are not present in both documents (here size and sleeves) are discarded:

a = {
    "brand": "Addi",
    "model": "Speedy",
    "colors": "black,white",
    "category": "Shoes"
}
b = {
    "brand": "Prima",
    "model": "Kazak",
    "colors": "red,white",
    "category": "Sweater"
}
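The first step can be sketched in Python as follows. The service itself is written in PHP, so this is an illustrative sketch under stated assumptions, not its actual code; the function name shared_features is hypothetical. It keeps multi-valued features such as colors as lists rather than joining them into "black,white"; both forms carry the same information.

```python
def shared_features(a: dict, b: dict) -> tuple[dict, dict]:
    """Drop every key that does not occur in both documents (hypothetical helper)."""
    common = a.keys() & b.keys()  # set of keys present in both documents
    return (
        {k: v for k, v in a.items() if k in common},
        {k: v for k, v in b.items() if k in common},
    )


a = {"brand": "Addi", "model": "Speedy", "colors": ["black", "white"], "category": "Shoes", "size": 42}
b = {"brand": "Prima", "model": "Kazak", "colors": ["red", "white"], "category": "Sweater", "sleeves": "long"}
a2, b2 = shared_features(a, b)
print(sorted(a2))  # ['brand', 'category', 'colors', 'model']
```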

Second, the documents are converted into lists of features, with each key used as a prefix to its value; every element of a multi-valued feature becomes a feature of its own:

a = ["brand_Addi", "model_Speedy", "colors_black", "colors_white", "category_Shoes"]
b = ["brand_Prima", "model_Kazak", "colors_red", "colors_white", "category_Sweater"]
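A minimal Python sketch of this key-prefixing step, assuming multi-valued features arrive as lists (the function name to_features is hypothetical, not part of the service):

```python
def to_features(doc: dict) -> list[str]:
    """Turn {"key": value} into ["key_value", ...]; a list value yields one feature per element."""
    features = []
    for key, value in doc.items():
        values = value if isinstance(value, list) else [value]
        features.extend(f"{key}_{v}" for v in values)
    return features


a = {"brand": "Addi", "model": "Speedy", "colors": ["black", "white"], "category": "Shoes"}
print(to_features(a))
# ['brand_Addi', 'model_Speedy', 'colors_black', 'colors_white', 'category_Shoes']
```

Prefixing with the key keeps identical values under different keys apart, so a hypothetical "color_white" and "trim_white" would not count as a match.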

Finally, a variant of the Tanimoto coefficient is calculated:

nA = number of features in A
nB = number of features in B
nAB = number of intersecting features
score = nAB / (nA + nB - nAB)
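The formula can be sketched in Python as below. Two assumptions are not stated in the README: the feature lists are treated as sets (duplicates count once), and two empty lists score 0.0 instead of dividing by zero. For the example above, only colors_white is shared, so the score is 1 / (5 + 5 - 1) = 1/9.

```python
def tanimoto(features_a: list[str], features_b: list[str]) -> float:
    """score = nAB / (nA + nB - nAB), computed over sets of features."""
    set_a, set_b = set(features_a), set(features_b)
    n_a, n_b = len(set_a), len(set_b)
    n_ab = len(set_a & set_b)           # number of intersecting features
    denominator = n_a + n_b - n_ab      # size of the union
    return n_ab / denominator if denominator else 0.0


a = ["brand_Addi", "model_Speedy", "colors_black", "colors_white", "category_Shoes"]
b = ["brand_Prima", "model_Kazak", "colors_red", "colors_white", "category_Sweater"]
print(tanimoto(a, b))  # 0.1111111111111111
```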

Similarity index

The index is kept in a MongoDB collection with one document per item, holding that item's features. Each document also keeps track of its similarity scores against the other documents. Every time a new item is posted, its similarity to every other document is computed and stored, and each score is written to the other document as well. As a result, when similar items are requested for a document, the scores are already precomputed.
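The bookkeeping described above can be sketched in Python with an in-memory dictionary standing in for the MongoDB collection. Everything here is an assumption for illustration: the class SimilarityIndex, its add/delete/similar methods, and the choice to sort results by descending score are hypothetical and not the service's PHP API. The similarity function chains the three steps from the Concept section.

```python
from collections import defaultdict


def similarity(a: dict, b: dict) -> float:
    """Shared keys -> key-prefixed features -> Tanimoto score."""
    common = a.keys() & b.keys()

    def features(doc: dict) -> set[str]:
        out = set()
        for key in common:
            value = doc[key]
            for v in value if isinstance(value, list) else [value]:
                out.add(f"{key}_{v}")
        return out

    fa, fb = features(a), features(b)
    n_ab = len(fa & fb)
    denominator = len(fa) + len(fb) - n_ab
    return n_ab / denominator if denominator else 0.0


class SimilarityIndex:
    """Hypothetical in-memory stand-in: per item, its document and its scores."""

    def __init__(self) -> None:
        self.docs: dict[str, dict] = {}
        self.scores: dict[str, dict[str, float]] = defaultdict(dict)

    def add(self, item_id: str, doc: dict) -> None:
        # Score the new item against every stored item, mirroring each score.
        for other_id, other_doc in self.docs.items():
            if other_id == item_id:
                continue
            score = similarity(doc, other_doc)
            self.scores[item_id][other_id] = score
            self.scores[other_id][item_id] = score
        self.docs[item_id] = doc

    def delete(self, item_id: str) -> None:
        # Remove the item and its mirrored scores on the other items.
        self.docs.pop(item_id, None)
        for other_id in self.scores.pop(item_id, {}):
            self.scores[other_id].pop(item_id, None)

    def similar(self, item_id: str) -> list[tuple[str, float]]:
        # Reads only precomputed scores; nothing is recomputed here.
        return sorted(self.scores.get(item_id, {}).items(), key=lambda kv: kv[1], reverse=True)


idx = SimilarityIndex()
idx.add("1", {"brand": "Addi", "model": "Speedy", "colors": ["black", "white"], "category": "Shoes", "size": 42})
idx.add("2", {"brand": "Prima", "model": "Kazak", "colors": ["red", "white"], "category": "Sweater", "sleeves": "long"})
idx.add("3", {"brand": "Addi", "model": "Speedy", "colors": ["black"], "category": "Shoes"})
print(idx.similar("1"))  # [('3', 0.8), ('2', 0.1111111111111111)]
```

The trade-off is the one the README describes: each insert costs one comparison per stored item, while each read is a lookup.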

API

The index is managed by POST and DELETE requests. The score is fetched via GET.

The route prefix {index} allows maintaining more than one index within an instance.

POST /{index}: adds a document to the index and calculates its similarity scores against the documents already in it.

DELETE /{index}: deletes a document from the index.

GET /{index}?itemIds=1,2,3: returns items similar to the items listed in the itemIds query parameter.
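A client only needs to join the item IDs with commas to build the GET request. The Python sketch below uses the standard urllib.parse module; the helper name similar_items_url, the host and the index name "products" are hypothetical. The request and response bodies are not documented here, so the sketch stops at the URL.

```python
from collections.abc import Iterable
from urllib.parse import urlencode


def similar_items_url(base_url: str, index: str, item_ids: Iterable[object]) -> str:
    """Build GET /{index}?itemIds=1,2,3; commas are kept unescaped."""
    query = urlencode({"itemIds": ",".join(str(i) for i in item_ids)}, safe=",")
    return f"{base_url.rstrip('/')}/{index}?{query}"


print(similar_items_url("http://localhost:8080", "products", [1, 2, 3]))
# http://localhost:8080/products?itemIds=1,2,3
```

Because {index} is part of the path, the same helper addresses any of the indexes kept within one instance.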

Installation

$ git clone https://github.com/halk/item-similarity
$ cd item-similarity
$ cp config/config.php.dist config/config.php

Please see recowise-vagrant for provisioning details.

Tests

$ cp phpunit.xml.dist phpunit.xml
$ phpunit
