This repository contains a Python script for calculating precision, recall, F-measure, and R-precision for information retrieval tasks. The script processes relevant and retrieved document data to compute these metrics for each query.
- Read Relevant and Retrieved Data: Parse files containing relevant and retrieved documents for each query.
- Compute Metrics:
  - Precision
  - Recall
  - F-Measure
  - R-Precision
- Customizable Parameter: Specify the cutoff point (`Z`) for evaluating the top documents retrieved.
- Output: Results are saved in a formatted table to a specified output file.
- `lab7.py`: Main Python script for processing input files and calculating metrics.
- `rlv-ass`: Input file containing relevant documents for each query.
- `NPL_tf_idf_rels.txt`: Input file containing retrieved documents and their scores.
- `metrics_results.txt`: Output file containing the computed metrics in tabular format.
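The layout of `rlv-ass` is not spelled out here. As a rough sketch only, the snippet below assumes the layout commonly used for NPL relevance files: a query id on its own line, one or more lines of document ids, and a lone `/` closing the block. The function name `parse_relevance_judgments` and the sample data are hypothetical and are not taken from `lab7.py`.

```python
def parse_relevance_judgments(text: str) -> dict[int, set[int]]:
    """Map each query id to its set of relevant document ids.

    Assumed block layout (an assumption, not confirmed by the script):
    query id, then lines of document ids, then a line containing only '/'.
    """
    relevant: dict[int, set[int]] = {}
    query_id = None
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line:
            continue  # ignore blank lines
        if line == "/":
            query_id = None  # end of the current query block
        elif query_id is None:
            query_id = int(line)  # first line of a block is the query id
            relevant[query_id] = set()
        else:
            relevant[query_id].update(int(tok) for tok in line.split())
    return relevant


# Hypothetical sample in the assumed layout.
sample = """1
 1239 1502 1557
 1609
   /
2
  359  856
   /
"""
judgments = parse_relevance_judgments(sample)
print(sorted(judgments[1]))
```

If the real file differs from this layout, only the branch logic inside the loop would need to change; the resulting query-to-set mapping is what the metric calculations rely on.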
- Load Relevant Documents: The script reads the `rlv-ass` file to create a mapping of relevant documents for each query.
- Load Retrieved Documents: The `NPL_tf_idf_rels.txt` file is parsed to retrieve the top `Z` documents for each query.
- Compute Metrics: For each query, the script calculates:
  - Precision: The fraction of the top `Z` retrieved documents that are relevant.
  - Recall: The fraction of all relevant documents that appear among the retrieved documents.
  - F-Measure: The harmonic mean of precision and recall.
  - R-Precision: Precision at rank `R`, where `R` is the number of relevant documents for the query.
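- Output Results: Results are written to `metrics_results.txt` in a structured table. The column headers are in Spanish (Consulta = query, Recuerdo = recall, Medida F = F-measure, Precision R = R-precision):

| Consulta | Precision | Recuerdo | Medida F | Precision R |
|----------|-----------|----------|----------|-------------|
| 1        | 0.7500    | 0.6000   | 0.6667   | 0.7500      |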
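The four metrics described above can be sketched for a single query as follows. This is a minimal illustration, not the code in `lab7.py`: the function name `evaluate_query`, the dictionary keys, and the sample document ids are hypothetical. It assumes `retrieved` is the full ranked list (best first), that precision is divided by the number of documents actually present in the top `Z`, and that R-precision uses the standard "precision at rank R" definition.

```python
def evaluate_query(retrieved: list[int], relevant: set[int], z: int) -> dict[str, float]:
    """Compute precision, recall, F-measure, and R-precision for one query."""
    top_z = retrieved[:z]
    hits = sum(1 for doc in top_z if doc in relevant)

    # Guard every division so empty rankings or empty judgments yield 0.0.
    precision = hits / len(top_z) if top_z else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    total = precision + recall
    f_measure = 2 * precision * recall / total if total else 0.0

    # R-precision: precision over the top R documents, R = number of relevant docs.
    r = len(relevant)
    r_hits = sum(1 for doc in retrieved[:r] if doc in relevant)
    r_precision = r_hits / r if r else 0.0

    return {
        "precision": precision,
        "recall": recall,
        "f_measure": f_measure,
        "r_precision": r_precision,
    }


# Hypothetical ranking and judgments for one query, with Z = 4.
ranking = [10, 11, 12, 13, 14, 15]
relevant_docs = {10, 12, 13, 20, 21}
scores = evaluate_query(ranking, relevant_docs, z=4)
print({name: round(value, 4) for name, value in scores.items()})
```

With this sample, three of the top four documents are relevant, so precision is 3/4, recall is 3/5, and R-precision (three relevant documents in the top five) is 3/5.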
- Clone the repository: `git clone https://github.com/KPlanisphere/metrics-calculation-precision-recall.git`, then `cd metrics-calculation-precision-recall`.
- Place the `rlv-ass` and `NPL_tf_idf_rels.txt` files in the directory, or update the file paths in `lab7.py`.
- Run the script with `python lab7.py`. During execution, you will be prompted to enter the value of `Z` (cutoff point).
- Check the output in `metrics_results.txt`.
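When checking `metrics_results.txt`, it can help to know how a table in that layout might be built. The sketch below reproduces the column headers and four-decimal formatting shown in this README using plain string formatting. The function name `format_results_table` and the per-query dictionary keys are hypothetical, and the actual script may build its table differently.

```python
def format_results_table(rows: dict[int, dict[str, float]]) -> str:
    """Render per-query metrics as a pipe-delimited table with 4-decimal values."""
    headers = ["Consulta", "Precision", "Recuerdo", "Medida F", "Precision R"]
    keys = ["precision", "recall", "f_measure", "r_precision"]  # hypothetical key names

    lines = ["| " + " | ".join(headers) + " |"]
    # Separator row: one dash per header character plus the two padding spaces.
    lines.append("|" + "|".join("-" * (len(h) + 2) for h in headers) + "|")

    for query_id in sorted(rows):
        cells = [str(query_id)] + [f"{rows[query_id][k]:.4f}" for k in keys]
        # Pad each cell to its header width so the columns line up.
        padded = [cell.ljust(len(h)) for cell, h in zip(cells, headers)]
        lines.append("| " + " | ".join(padded) + " |")
    return "\n".join(lines)


# The example row from this README.
example_rows = {
    1: {"precision": 0.75, "recall": 0.6, "f_measure": 2 / 3, "r_precision": 0.75},
}
table = format_results_table(example_rows)
print(table)
```

The returned string could then be saved with, for example, `pathlib.Path("metrics_results.txt").write_text(table, encoding="utf-8")`.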
- Python 3.10+
- Ensure that the input files are correctly formatted. Each query's relevant and retrieved documents must follow the expected structure.
- The script includes error handling for empty queries or missing data.