CollaborativeFiltering

rcurtin edited this page Dec 31, 2014 · 1 revision

Collaborative Filtering

Project Details

Student: Mudit Raj Gupta

E-mail: mudit.raaj.gupta@gmail.com

Project Overview: The project aims to develop a collaborative filtering (CF) framework for mlpack with a flexible API. This page is a collection of notes on the implementation and usage of the CF framework.

Introduction

Recommendation systems try to recommend items (movies, music, webpages, products, etc.) to potentially interested customers, based on the available information about users, products, and user preferences or ratings. Recommendation systems fall into two classes: content-based systems and collaborative filtering systems. Content-based approaches analyse the content (e.g., text, metadata, features) of the items to identify related items, while collaborative filtering uses the overall behaviour or taste of a large number of users to suggest relevant items to specific users.

Many algorithms for CF have been proposed. Well-known methods use alternating least squares (ALS), singular value decomposition (SVD), and stochastic gradient descent (SGD), among others. This project implements ALS-WR (ALS with weighted-λ-regularization) for collaborative filtering.

Implementation

The present implementation can be broken down into three components.

  • CF module
  • Optimizer
  • Neighbourhood Algorithm

The three components and their implementation are discussed below.

CF module

The collaborative filtering (cf) module is added in "src/methods". It takes a comma-separated values (csv) file of (user,item,rating) triples and cleans the data to generate a sparsely populated User-Item table in which each cell holds the rating that a user has given to an item. This matrix is fed to an optimizer, which decomposes it into two separate "User" and "Item" matrices. The decomposed matrices are used to approximate the missing values in the initial User-Item matrix. The module also remembers which items each user originally rated. Finally, the module uses a neighbourhood algorithm to estimate the taste for items within a user's neighbourhood and recommends items that the user has not rated.
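As a rough illustration of the first step, the sketch below builds a sparse User-Item table from (user,item,rating) triples using only the C++ standard library. It is not the module's actual code: the names Rating, RatingTable, BuildTable and HasRated are hypothetical, and the choice to let a later duplicate overwrite an earlier one is an assumption made for the example.

```cpp
#include <cstddef>
#include <map>
#include <utility>
#include <vector>

// One (user, item, rating) record, as read from one line of the csv input.
// (Hypothetical type, for illustration only.)
struct Rating
{
  std::size_t user;
  std::size_t item;
  double rating;
};

// Sparse User-Item table: only the cells that were actually rated are stored.
using RatingTable = std::map<std::pair<std::size_t, std::size_t>, double>;

// Build the table from the triples. Assumption: if the same (user, item)
// pair appears twice, the later rating overwrites the earlier one.
RatingTable BuildTable(const std::vector<Rating>& triples)
{
  RatingTable table;
  for (const Rating& r : triples)
    table[std::make_pair(r.user, r.item)] = r.rating;
  return table;
}

// True if the user rated the item in the original data. This is the
// information needed later to avoid recommending already-rated items.
bool HasRated(const RatingTable& table, std::size_t user, std::size_t item)
{
  return table.count(std::make_pair(user, item)) > 0;
}
```

Storing only the rated cells keeps memory proportional to the number of ratings rather than to users times items, which is the reason a sparse representation is used for this kind of data.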

Optimizer

ALS

An ALS optimizer (als) is added in "src/core/optimizers". It decomposes the sparsely populated "User-Item-Preference" matrix into "User" and "Item" matrices. The code is a refactored version of the "src/methods/nmf" module. The modifications include support for sparse matrices, new namespaces, changes that make it usable as an optimizer, and other minor changes to fit the use case.
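To make the alternating idea concrete, here is a minimal rank-1 sketch in plain C++. It is not the als optimizer described above: it uses a single factor per user and per item, omits the weighted regularization of ALS-WR, and the names Entry, Factors and RankOneALS are hypothetical. With the item factors fixed, each user factor has a closed-form least-squares solution over that user's observed ratings, and vice versa.

```cpp
#include <cstddef>
#include <vector>

// One observed cell of the User-Item matrix. (Hypothetical type.)
struct Entry
{
  std::size_t user;
  std::size_t item;
  double rating;
};

// Rank-1 factors: rating(u, i) is approximated by user[u] * item[i].
struct Factors
{
  std::vector<double> user;
  std::vector<double> item;
};

// Alternate between closed-form least-squares updates of the user factors
// and of the item factors, using only the observed entries.
Factors RankOneALS(const std::vector<Entry>& observed,
                   std::size_t numUsers,
                   std::size_t numItems,
                   std::size_t iterations)
{
  Factors f{std::vector<double>(numUsers, 1.0),
            std::vector<double>(numItems, 1.0)};

  for (std::size_t it = 0; it < iterations; ++it)
  {
    // Item factors fixed: user[u] = sum(r * item) / sum(item^2).
    std::vector<double> num(numUsers, 0.0), den(numUsers, 0.0);
    for (const Entry& e : observed)
    {
      num[e.user] += e.rating * f.item[e.item];
      den[e.user] += f.item[e.item] * f.item[e.item];
    }
    for (std::size_t u = 0; u < numUsers; ++u)
      if (den[u] > 0.0)
        f.user[u] = num[u] / den[u];

    // User factors fixed: item[i] = sum(r * user) / sum(user^2).
    num.assign(numItems, 0.0);
    den.assign(numItems, 0.0);
    for (const Entry& e : observed)
    {
      num[e.item] += e.rating * f.user[e.user];
      den[e.item] += f.user[e.user] * f.user[e.user];
    }
    for (std::size_t i = 0; i < numItems; ++i)
      if (den[i] > 0.0)
        f.item[i] = num[i] / den[i];
  }
  return f;
}
```

After fitting, a missing cell (u, i) is estimated as user[u] * item[i]. A real implementation would use a higher rank, solving a small linear system per user and per item instead of a scalar division, and would add the regularization term.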

QUIC SVD (To be added)

A second optimizer, planned alongside ALS, is QUIC-SVD, introduced in "QUIC-SVD: Fast SVD Using Cosine Trees" by Michael P. Holmes, Alexander G. Gray and Charles Lee Isbell, Jr. The singular value decomposition is a key operation in many machine learning methods. Its computational cost, however, makes it impractical for applications involving large datasets or real-time responsiveness. The authors present a fast approximation of the whole-matrix SVD based on a new sampling mechanism called the cosine tree.

Neighbourhood Algorithm

Presently, allknn is used as the neighbourhood algorithm.
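For readers unfamiliar with the neighbourhood step, the following brute-force sketch finds the k users whose rating vectors are closest to a query user. It only illustrates the idea and is not how allknn works internally: the function names are hypothetical, Euclidean distance is an assumption, and each user is represented as a dense row of (approximated) ratings.

```cpp
#include <algorithm>
#include <cstddef>
#include <utility>
#include <vector>

// Squared Euclidean distance between two rating vectors of equal length.
double SquaredDistance(const std::vector<double>& a,
                       const std::vector<double>& b)
{
  double sum = 0.0;
  for (std::size_t i = 0; i < a.size(); ++i)
    sum += (a[i] - b[i]) * (a[i] - b[i]);
  return sum;
}

// Indices of the k users closest to users[query], nearest first.
// The query user itself is excluded; k is clamped to the number of others.
std::vector<std::size_t> NearestUsers(
    const std::vector<std::vector<double>>& users,
    std::size_t query,
    std::size_t k)
{
  std::vector<std::pair<double, std::size_t>> candidates;
  for (std::size_t u = 0; u < users.size(); ++u)
    if (u != query)
      candidates.emplace_back(SquaredDistance(users[query], users[u]), u);

  k = std::min(k, candidates.size());
  std::partial_sort(candidates.begin(),
                    candidates.begin() + static_cast<std::ptrdiff_t>(k),
                    candidates.end());

  std::vector<std::size_t> neighbours;
  for (std::size_t i = 0; i < k; ++i)
    neighbours.push_back(candidates[i].second);
  return neighbours;
}
```

The ratings of the returned neighbours can then be combined (for example, averaged) to score the items the query user has not rated. The brute-force scan costs one distance computation per user, which is why a dedicated nearest-neighbour method is preferable on large datasets.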

Data

Some of the input files are optional, and some of the output files are generated only on request.

Input

  • Input Data - User,Item,Rating file. A dummy file (gen.csv) is present in the attachments.
  • User File - The list of users for which recommendations are to be generated. A dummy file (qer.csv) is present in the attachments. (Optional)

Output

  • Recommendations - A csv file containing recommendations for queried users.
  • Ratings File - A csv file containing the whole ratings matrix. (Optional)
  • Users File - A csv file containing the User matrix. (Optional)
  • Items File - A csv file containing the Item matrix. (Optional)

Command Line Interface

A command-line executable, cf, is provided as part of the project to allow easy execution of the cf algorithm on data. Complete documentation of the executable can be found by typing:
$ cf --help

Example

The following example demonstrates the use of the CLI for cf.

  • Ensure that you have mlpack with the modifications for the cf module, then build the mlpack code. A cf executable will be generated in "build/bin". This tutorial might be helpful.

  • Download 'gen.csv' and 'qer.csv' from the attachment section.

$ head -n 5 gen.csv
587,537,3
863,407,5
438,670,4
267,591,2
276,206,3
$ head qer.csv
1,2,3,4,5,6,7,8,9,10
  • You can check out the documentation using the following command:
$ cf --help
  • To generate recommendations, run the following command:
$ cf -i gen.csv -q qer.csv -v

-i specifies the input file, -q specifies the file of queried users, and -v enables verbose output.

  • This will generate the recommendations for the 10 users listed in qer.csv:
$ head -n 5 recommendations.csv
443,290,270,506,77
935,65,601,598,503
935,65,601,598,503
443,290,270,506,77
935,65,601,598,503

Each row holds the recommended items for one user; the first row corresponds to the first user in the qer.csv file.

Variations

  • Change the number of recommendations for each user. The following command generates 10 recommendations per user.
$ cf -i gen.csv -q qer.csv -r 10
  • Change the size of the neighbourhood. The following command changes the size of the neighbourhood to 7.
$ cf -i gen.csv -q qer.csv -n 7
  • Change the neighbourhood calculation algorithm. The following command selects knn.
$ cf -i gen.csv -s knn

C++ Interface

The CF<> class provides a simple way to run cf using mlpack in C++. With the default template parameters, CF<> initializes the queried users to all the users in the data. By default, ALS is the chosen algorithm and knn is used for the neighbourhood. The initial parameters of the algorithm are also fixed.

The simplest way to run it is to create a CF object with initial data as a parameter and use it to generate recommendations.

Example

The following is a simple code example which demonstrates the use of the CF class.

#include <mlpack/methods/cf/cf.hpp>

using namespace mlpack::cf;

// The user,item,preference dataset.
extern arma::mat dataset;

// Set of queried users.
extern arma::Col<size_t> users;

// Table to save recommendations.
arma::mat recommendations;

// Create a CF object.
CF<> c(dataset);

// Generate recommendations, using default parameters.
c.GetRecommendations(recommendations);

The recommendations are now present in the recommendations matrix.

This example can be extended to give finer control over the parameters.