SPLADE representations on BEIR dataset #49

CosimoRulli · 2024-01-04T14:21:26Z

Hi,
thank you for sharing and maintaining this repo! I would like to generate the SPLADE representations of both documents and queries for all the datasets in BEIR, similar to what the create_anserini script does for the MSMARCO dataset. I would like to do this for both splade-cocondenser-ensembledistil and efficient-splade-V-large.

I tried to run the following script,

export PYTHONPATH=$PYTHONPATH:$(pwd)
export SPLADE_CONFIG_NAME="config_splade++_cocondenser_ensembledistil"

for dataset in arguana fiqa nfcorpus quora scidocs scifact trec-covid webis-touche2020 climate-fever dbpedia-entity fever hotpotqa nq
do
    python3 -m splade.beir_eval \
        config.pretrained_no_yamlconfig=true \
        +beir.dataset=$dataset \
        +beir.dataset_path=data/beir \
        config.index_retrieve_batch_size=100
done

but I get NDCG=0.001 on the arguana dataset (I then stopped the script, since I guess something is wrong). What am I doing wrong? Also, does this script save the embeddings of each dataset? If not, how can I force it to save them?


thibault-formal · 2024-01-29T10:00:34Z

Hi @CosimoRulli

Sorry for the late reply! I think the issue is that the model checkpoint is not being loaded correctly. According to the README, if you only want to evaluate a model from an existing checkpoint, you should add the init_dict.model_type_or_dir line and run:

python3 -m splade.beir_eval \
       init_dict.model_type_or_dir=naver/splade-cocondenser-ensembledistil \
       config.pretrained_no_yamlconfig=true \
       +beir.dataset=$dataset \
       +beir.dataset_path=data/beir \
       config.index_retrieve_batch_size=100
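
Dropped into the loop from the original post, the command above might look like the sketch below. The `run_beir_eval` function and the `MODEL` and `DRY_RUN` variables are illustrative names and not part of the SPLADE repo. By default the sketch only prints each command so the overrides can be checked first; it assumes it is launched from the repo root with the BEIR data under data/beir, as in the original script.

```shell
#!/bin/sh
# Sketch only: the loop from the question plus the init_dict override from the reply.
# MODEL, DRY_RUN and run_beir_eval are hypothetical helpers, not SPLADE options.
set -eu

export PYTHONPATH="${PYTHONPATH:-}:$(pwd)"
export SPLADE_CONFIG_NAME="config_splade++_cocondenser_ensembledistil"

MODEL="naver/splade-cocondenser-ensembledistil"
DRY_RUN="${DRY_RUN:-1}"   # 1 = print the commands, 0 = actually run them

run_beir_eval() {
    # $1 is the BEIR dataset name; it is expanded before `set --` replaces the arguments.
    set -- python3 -m splade.beir_eval \
        "init_dict.model_type_or_dir=${MODEL}" \
        config.pretrained_no_yamlconfig=true \
        "+beir.dataset=$1" \
        +beir.dataset_path=data/beir \
        config.index_retrieve_batch_size=100
    if [ "${DRY_RUN}" = "1" ]; then
        echo "$@"
    else
        "$@"
    fi
}

for dataset in arguana fiqa nfcorpus quora scidocs scifact trec-covid webis-touche2020 climate-fever dbpedia-entity fever hotpotqa nq
do
    run_beir_eval "$dataset"
done
```

Running it with `DRY_RUN=0` launches the evaluations one dataset at a time. Because of `set -e`, the loop stops at the first dataset whose evaluation exits with an error.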

Let me know if that works!
Best
