
On Survivorship Bias in MS MARCO

Online appendix for Prashansa Gupta and Sean MacAvaney. On Survivorship Bias in MS MARCO. SIGIR 2022.

Section 3: Analysis of Discarded Queries

Can discarded queries be answered by recent ranking models?

Annotations for this analysis are provided in annotations/discarded_rankings_by_model.tsv. This is a TSV file (with a header row) with columns for the scoring model, the query ID, the relevant document ID (or _ if none was found), and the rank of that document for the model.
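If you want to inspect the annotations programmatically, you can load them with pandas. A minimal sketch (it assumes the four columns appear in exactly the order described above):

import pandas as pd

# Load the discarded-query annotations (TSV with a header row)
df = pd.read_csv('annotations/discarded_rankings_by_model.tsv', sep='\t')

# Per the column order described above: model, query ID, relevant doc ID, rank
model_col, qid_col, docid_col, rank_col = df.columns

# Fraction of annotated queries per model for which a relevant document was found
print((df[docid_col] != '_').groupby(df[model_col]).mean())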

You can generate the results presented in Table 2 based on the annotation file using the following command:

bash table_2.sh

Distribution of unanswered queries by type.

You can generate the results presented in Table 3 using the following command:

python table_3.py

(Note that if you have not yet downloaded the MS MARCO QnA dataset with the ir_datasets package, it will take a bit of time to download and prepare.)
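If you would like to trigger that download ahead of time, you can load the dataset directly with ir_datasets. A minimal sketch (msmarco-qna/train is used here for illustration; the split the script actually uses may differ):

import ir_datasets

# Loading the dataset and fetching a query forces ir_datasets to download
# and prepare the MS MARCO QnA files
dataset = ir_datasets.load('msmarco-qna/train')
print(next(iter(dataset.queries_iter())))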

Section 4: Survivorship Bias in Evaluation

Last-Metre Reproduction.

If you want to double-check the values in Table 4, you can run the following command:

$ bash table_4.sh
runs/ance.10.run.gz qrels/dev-1.qrels
RR@10	0.240
NumQ	8192.000
runs/ance.10.run.gz qrels/dev-2.qrels
RR@10	0.244
NumQ	15532.000
...

This uses the same qrels and run files we used.
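If you would rather compute the values directly in Python, something like the following should reproduce one row of the output above (a minimal sketch using ir_measures; whether table_4.sh itself calls ir_measures is an assumption):

import gzip
import ir_measures
from ir_measures import RR, NumQ

# Evaluate one of the released runs against one of the qrels files
qrels = list(ir_measures.read_trec_qrels('qrels/dev-1.qrels'))
with gzip.open('runs/ance.10.run.gz', 'rt') as f:
    run = list(ir_measures.read_trec_run(f))

print(ir_measures.calc_aggregate([RR@10, NumQ], qrels, run))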

Generating Simulated Qrels Files.

If you want to generate the simulated qrels files at different annotation depths, you can use the following script:

python simulate_qrels.py

(Note that if you have not yet downloaded the MS MARCO QnA dataset with the ir_datasets package, it will take a bit of time to download and prepare.)

Section 5: Survivorship Bias in Training

Last-Metre Reproduction.

If you want to double-check the values in Figure 1, you can run the following command:

$ bash figure_1.sh

Last-Mile Reproduction.

We release the models trained on simulated versions of MS MARCO on Hugging Face, allowing our ranked lists to be reproduced.

These models are compatible with PyTerrier's MonoT5ReRanker:

from pyterrier_t5 import MonoT5ReRanker
reranker = MonoT5ReRanker(model='macavaney/monot5-base-msmarco-sim1')
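For example, the reranker can be dropped into a standard retrieve-then-rerank PyTerrier pipeline. A minimal sketch (the BM25 index variant, rank cutoff, and query below are illustrative, not the exact setup from the paper):

import pyterrier as pt
from pyterrier_t5 import MonoT5ReRanker

pt.init()
dataset = pt.get_dataset('msmarco_passage')

# First-stage BM25 retrieval over a pre-built MS MARCO passage index,
# followed by monoT5 reranking of the top 100 results
bm25 = pt.BatchRetrieve.from_dataset('msmarco_passage', 'terrier_stemmed', wmodel='BM25')
reranker = MonoT5ReRanker(model='macavaney/monot5-base-msmarco-sim1')
pipeline = (bm25 % 100) >> pt.text.get_text(dataset, 'text') >> reranker

results = pipeline.search('what is survivorship bias')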

To generate the ranked lists we use in the paper, use the following script:

$ bash figure_1_runs.sh

Appendix

The paper A Non-Factoid Question-Answering Taxonomy (Bolotova et al.) was also published at SIGIR 2022.

As an analogue to Table 3, which demonstrated that some question types are more likely than others to survive MS MARCO's annotation process, we explore whether the same holds for the question categories from the above taxonomy.

The following table can be reproduced by running the following command:

python app_nfcats.py

Category        Total   Discarded  % Discarded
FACTOID         55,216  24,883     45.1%
EVIDENCE-BASED  35,446  13,657     38.5%
INSTRUCTION     5,727   4,423      77.2%
REASON          2,487   1,429      57.5%
DEBATE          1,075   554        51.5%
NOT-A-QUESTION  478     223        46.7%
COMPARISON      430     215        50.0%
EXPERIENCE      234     131        56.0%

We find that INSTRUCTION questions are far more likely to be discarded (77%) than other question types and are therefore under-represented in MS MARCO. This might be because instruction lists are not well-captured by MS MARCO's passage-based annotations, or because individual passages do not capture a sufficient amount of the instructions. Meanwhile, EVIDENCE-BASED questions are far less likely to be discarded (39%) and are over-represented.
