Python Recommender system

A simple recommender system in Python.

Implemented Algorithms

Implemented algorithms are:
- ItemKNN
- UserKNN
- ItemAverage
- UserAverage
- UserItemAverage
- GlobalAverage
Similarity meseaurs:
- Item based nearest neighbor
  - Cosine
- User based nearest neighbor
  - Pearson

Test

The dataset comes in 5 folds. So without chaning how it looks, we ues it to perform a 5-fold cross-validation.

To meseaure the quality of the predictions we use:
- Prediction Coverage
- Mean absolute error MAE

Results

To see a set of results on a run check here.

How to use it:

Download

Clone the project using:

git clone https://github.com/ravexina/python-recommender-system.git

Or downloading it from here:

wget https://github.com/ravexina/python-recommender-system/archive/master.zip

Dataset

I wrote this specifically to work with movielens 100k dataset. You can get it from here:
http://files.grouplens.org/datasets/movielens/ml-100k.zip

Here is another link from archive.org if the link above does not work for you or it is unavailable:
https://web.archive.org/web/*/http://files.grouplens.org/datasets/movielens/ml-100k.zip

Extract the content of ml-100k within the zip file into the ./dataset directory. Install the necessary dependencies and you are good to go.

Dependencies

The libraries I've used in this project are mostly embedded in Python. The only ones you have to install are:

Numpy
Pandas

Install Pandas and Numpy will be installed as a dependency of Pandas:

Using pip

pip install --user pandas

Using pipenv

pipenv install pandas

Run the project

Run the main.py and you should start getting the results.

python3 main.py

or

pipenv run python3 main.py

Improve the runtime

As you might know, similiraty matrices takes some to calculate. Once you run the project it stores calculated matrices in form of pickles in ./pickles directory.

In the file Algorithms/ItemKNN.py and Algorithms/UserKNN.py there are two lines, which you can set an argument named load_matrices to true so next time you run the project it does not tries to recalculate the similarity matrices and uses the old one.

# UserKNN
cosine = Cosine(self.ratings, load_matrices=False, save_matrices=True, fold_id=self.fold_id)

# ItemKNN.py
pearson = Pearson(self.ratings, load_matrices=False, save_matrices=True, fold_id=self.fold_id)

Download the pre-calculated pickles

Will be added soon.

Name		Name	Last commit message	Last commit date
Latest commit History 19 Commits
Algorithms		Algorithms
Lib		Lib
dataset		dataset
pickles		pickles
results		results
.gitignore		.gitignore
Dataset.py		Dataset.py
Experiments.py		Experiments.py
LICENSE		LICENSE
PredictionTrial.py		PredictionTrial.py
PredictorBase.py		PredictorBase.py
README.md		README.md
Ratings.py		Ratings.py
main.py		main.py

License

ravexina/python-recommender-system

Folders and files

Latest commit

History

Repository files navigation

Python Recommender system

Implemented Algorithms

Test

Results

How to use it:

Download

Dataset

Dependencies

Run the project

Improve the runtime

Download the pre-calculated pickles

About

Topics

Resources

License

Stars

Watchers

Forks

Languages