Anjali001/Information-Retrieval-Data-Mining

Information-Retrieval-Data-Mining

Coursework 1: With so much data everywhere, information retrieval systems are very common nowadays. Efforts are being made to develop more accurate retrieval systems that return relevant results for a given query. In this report, I describe how I approached the problem of building an information retrieval system that returns ranked results for a given query. For this, I've used cosine similarity for the vector space model and BM25 (a probabilistic model). Finally, I've built query-likelihood models using Laplace smoothing, Lidstone correction, and Dirichlet smoothing, and compared them.
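To illustrate how the three smoothing schemes differ, here is a minimal sketch of query-likelihood scoring. It is not the coursework code: the function names, the whitespace-tokenized toy documents, and the default values `eps=0.1` and `mu=2000.0` are all assumptions made for the example.

```python
import math
from collections import Counter


def lidstone_score(query, doc, vocab_size, eps=0.1):
    """Log query likelihood with Lidstone (add-eps) smoothing.

    `eps` is an assumed default; it would normally be tuned.
    """
    tf = Counter(doc)
    denom = len(doc) + eps * vocab_size
    return sum(math.log((tf[t] + eps) / denom) for t in query)


def laplace_score(query, doc, vocab_size):
    """Laplace smoothing is the special case of Lidstone with eps = 1."""
    return lidstone_score(query, doc, vocab_size, eps=1.0)


def dirichlet_score(query, doc, collection_tf, collection_len, mu=2000.0):
    """Log query likelihood with Dirichlet smoothing.

    The document model is interpolated with the collection model;
    `mu` is an assumed default that would normally be tuned.
    """
    tf = Counter(doc)
    score = 0.0
    for t in query:
        p_coll = collection_tf[t] / collection_len
        p = (tf[t] + mu * p_coll) / (len(doc) + mu)
        if p == 0.0:
            # Term never occurs in the collection: the query is impossible.
            return float("-inf")
        score += math.log(p)
    return score


# Toy collection (hypothetical data, for illustration only).
docs = [["cat", "sat", "mat"], ["dog", "barked"]]
vocab = {t for d in docs for t in d}
collection_tf = Counter(t for d in docs for t in d)
collection_len = sum(len(d) for d in docs)

query = ["cat"]
laplace = [laplace_score(query, d, len(vocab)) for d in docs]
dirichlet = [
    dirichlet_score(query, d, collection_tf, collection_len, mu=2.0)
    for d in docs
]
```

In this toy collection both schemes rank the first document above the second for the query `cat`. The difference is where the unseen-term probability mass comes from: Laplace and Lidstone spread it uniformly over the vocabulary, while Dirichlet takes it from the collection-wide term frequencies.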

Coursework 2: The basic ranking process in information retrieval is as follows. Documents are indexed and stored. The user query is used to retrieve the top-k documents. These k documents are passed to a ranking model that has been trained on similar data with a learning algorithm. After ranking, the results are displayed to the user on a results page in ranked order. In this report, I've used similarities based on query and document embeddings to rank documents for a given query. The rankings have been compared using Normalized Discounted Cumulative Gain (NDCG) and mean Average Precision (mAP). I've used cosine similarity, document length, and query length as features for logistic regression, LambdaMART, and neural network models.
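The two evaluation metrics can be sketched as follows. This is an illustrative version, not the coursework implementation: the function names are hypothetical, the DCG gain is assumed to be `2**rel - 1`, the ideal ranking is assumed to be a reordering of the retrieved list itself, and average precision is assumed to be normalized by the number of relevant documents that appear in the list.

```python
import math


def dcg(relevances, k):
    """Discounted cumulative gain over the top-k relevance labels."""
    return sum(
        (2 ** rel - 1) / math.log2(i + 2)
        for i, rel in enumerate(relevances[:k])
    )


def ndcg(relevances, k):
    """DCG divided by the DCG of the ideal (sorted) ordering."""
    ideal = dcg(sorted(relevances, reverse=True), k)
    return dcg(relevances, k) / ideal if ideal > 0 else 0.0


def average_precision(relevances):
    """Mean of precision@rank taken at each relevant position."""
    hits = 0
    total = 0.0
    for rank, rel in enumerate(relevances, start=1):
        if rel > 0:
            hits += 1
            total += hits / rank
    return total / hits if hits else 0.0


def mean_average_precision(rankings):
    """Average of per-query average precision."""
    return sum(average_precision(r) for r in rankings) / len(rankings)


# Each list holds relevance labels in ranked order for one query
# (hypothetical labels, for illustration only).
rankings = [[1, 0, 1], [0, 1]]
map_score = mean_average_precision(rankings)
ndcg_scores = [ndcg(r, k=3) for r in rankings]
```

Each inner list holds the relevance labels of the returned documents in the order the model ranked them, so a perfect ranking gives an NDCG of 1.0 and any relevant document placed lower is discounted logarithmically.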