- Install Python libraries: numpy, scikit-learn, pandas.
- Use data from the MovieLens 100K dataset: https://www.kaggle.com/datasets/prajitdatta/movielens-100k-dataset.
The data includes user information (age, gender, occupation), movie information (title, genre), and the users' ratings of movies. It is split into a training set (ua.base) and a test set (ua.test).
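As a minimal sketch of reading the rating files with pandas: in the MovieLens 100K distribution, `ua.base` and `ua.test` are tab-separated with no header row. The column names, the `load_ratings` helper, and the three sample lines below are illustrative assumptions, not taken from this project.

```python
import io

import pandas as pd

# Column layout of the MovieLens 100K rating files (tab-separated, no header).
# The names themselves are chosen here for illustration.
RATING_COLUMNS = ["user_id", "movie_id", "rating", "timestamp"]


def load_ratings(source):
    """Read a rating file such as ua.base or ua.test into a DataFrame."""
    return pd.read_csv(source, sep="\t", names=RATING_COLUMNS)


# Three made-up lines in the same layout, standing in for the real file.
sample = io.StringIO(
    "1\t1\t5\t874965758\n"
    "1\t2\t3\t876893171\n"
    "2\t1\t4\t888550871\n"
)
ratings = load_ratings(sample)
print(ratings)
```

With the dataset downloaded, the same helper would be called with a file path, for example `load_ratings("ua.base")`.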
- Content Filtering: Suggest items based on the user's profile or on the content/attributes of items similar to those the user has selected in the past.
- Collaborative Filtering: Suggest items based on the similarity between users and/or items. In other words, recommendations for a user are derived from users with similar behavior.
- Content Filtering:
  - I created a vector representation for each movie using TF-IDF (item profiles).
  - I trained a ridge regression model for each user to learn the weights (user profiles).
  - I used the item profiles and user profiles to predict ratings and recommend movies.
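The content filtering steps above can be sketched roughly as follows. This is an illustration under assumptions, not the project's actual code: the tiny genre matrix, the training triples, the `fit_user_profiles` helper, and the `alpha` value are all made up for the example.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfTransformer
from sklearn.linear_model import Ridge

# Hypothetical binary genre matrix: one row per movie, one column per genre.
genre_flags = np.array([
    [1, 0, 1],
    [1, 1, 0],
    [0, 1, 1],
    [0, 0, 1],
])

# Item profiles: TF-IDF weighting of the genre flags (rows are L2-normalized).
item_profiles = TfidfTransformer().fit_transform(genre_flags).toarray()

# Hypothetical training ratings as (user, movie, rating) rows, 0-indexed.
train = np.array([
    [0, 0, 5], [0, 1, 4], [0, 3, 2],
    [1, 1, 2], [1, 2, 4], [1, 3, 5],
])


def fit_user_profiles(train, item_profiles, alpha=0.01):
    """Fit one ridge regression per user on the movies that user rated."""
    models = {}
    for user in np.unique(train[:, 0]):
        rows = train[train[:, 0] == user]
        model = Ridge(alpha=alpha)  # alpha is an assumed regularization strength
        model.fit(item_profiles[rows[:, 1]], rows[:, 2])
        models[int(user)] = model
    return models


user_models = fit_user_profiles(train, item_profiles)

# Predict user 0's rating for movie 2, which is absent from that user's training rows.
predicted = user_models[0].predict(item_profiles[[2]])[0]
print(round(float(predicted), 2))
```

Each fitted model's `coef_` and `intercept_` play the role of the user profile: the learned weights over the TF-IDF features.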
- Collaborative Filtering:
  - I used two approaches: item-item and user-user.
  - I calculated the cosine similarity between items or between users.
  - I implemented a KNN model that selects the K most similar users/items to predict rating scores.
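A minimal sketch of the user-user variant, assuming a small dense rating matrix in which 0 means "not rated". Mean-centering each user's ratings before computing cosine similarity is an assumption added here (a common normalization), and the `predict_user_user` helper and the data are hypothetical.

```python
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical user x movie rating matrix; 0 marks "not rated".
ratings = np.array([
    [5, 4, 0, 1],
    [4, 5, 1, 0],
    [1, 0, 5, 4],
    [0, 1, 4, 5],
], dtype=float)


def predict_user_user(ratings, user, movie, k=2):
    """Predict one rating from the k most similar users who rated the movie."""
    rated = ratings > 0
    # Mean-center each user's observed ratings (assumed normalization step).
    counts = np.maximum(rated.sum(axis=1), 1)
    means = ratings.sum(axis=1) / counts
    centered = np.where(rated, ratings - means[:, None], 0.0)

    # Cosine similarity between the target user and every other user.
    similarity = cosine_similarity(centered)[user]

    # Only users (other than the target) who rated this movie can vote.
    candidates = np.flatnonzero(rated[:, movie])
    candidates = candidates[candidates != user]
    if candidates.size == 0:
        return float(means[user])

    # Keep the k candidates with the highest similarity.
    nearest = candidates[np.argsort(similarity[candidates])[::-1][:k]]
    weights = similarity[nearest]
    denom = np.abs(weights).sum()
    if denom == 0:
        return float(means[user])

    # Similarity-weighted average of the neighbors' centered ratings.
    return float(means[user] + weights @ centered[nearest, movie] / denom)


user_based = predict_user_user(ratings, user=0, movie=2)
# Item-item: transpose so rows are movies, then swap the two indices.
item_based = predict_user_user(ratings.T, user=2, movie=0)
print(round(user_based, 2), round(item_based, 2))
```

Reusing the same function on the transposed matrix is one simple way to get the item-item variant; in that case the centering is done per movie rather than per user.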
- Hybrid of collaborative filtering and content filtering:
  - After predicting the ratings in the test set, I combined the predicted ratings from the two algorithms.
  - I re-evaluated the combined predictions using the RMSE measure.
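The text does not say how the two sets of predictions were combined, so the sketch below assumes a simple weighted average and then computes RMSE on the result. The `blend` helper, its `weight` parameter, and all numbers are made up to show the mechanics only; they are not results from this project.

```python
import numpy as np


def rmse(y_true, y_pred):
    """Root mean squared error between actual and predicted ratings."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))


def blend(content_pred, collab_pred, weight=0.5):
    """Weighted average of two prediction vectors (assumed combination rule)."""
    content_pred = np.asarray(content_pred, dtype=float)
    collab_pred = np.asarray(collab_pred, dtype=float)
    return weight * content_pred + (1.0 - weight) * collab_pred


# Made-up test-set ratings and predictions, for illustration only.
actual = np.array([4.0, 2.0, 5.0])
content_pred = np.array([3.0, 3.0, 5.0])
collab_pred = np.array([5.0, 1.0, 4.0])

hybrid_pred = blend(content_pred, collab_pred)

print("content RMSE:", rmse(actual, content_pred))
print("collab RMSE: ", rmse(actual, collab_pred))
print("hybrid RMSE: ", rmse(actual, hybrid_pred))
```

The `weight` could be tuned on a held-out split of the training data rather than on the test set, so that the reported RMSE stays an honest estimate.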
- Programming Language: Python
- Main Libraries: NumPy, scikit-learn, pandas
- Models: Ridge Regression, TF-IDF Transformer, KNN User-User, KNN Item-Item
- Evaluation: Root Mean Squared Error (RMSE) is used to assess the accuracy of the models on the test set.