SPLADE representations on BEIR dataset #49

Open
CosimoRulli opened this issue Jan 4, 2024 · 1 comment

Comments

@CosimoRulli

Hi,
thank you for sharing and maintaining this repo! I want to generate SPLADE representations for both the documents and the queries of every BEIR dataset, similar to what the create_anserini script does for the MSMARCO dataset. I would like to do this for both splade-cocondenser-ensembledistil and efficient-splade-V-large.

I tried running the following script:

export PYTHONPATH=$PYTHONPATH:$(pwd)
export SPLADE_CONFIG_NAME="config_splade++_cocondenser_ensembledistil"

for dataset in arguana fiqa nfcorpus quora scidocs scifact trec-covid webis-touche2020 climate-fever dbpedia-entity fever hotpotqa nq
do
    python3 -m splade.beir_eval \
        config.pretrained_no_yamlconfig=true \
        +beir.dataset=$dataset \
        +beir.dataset_path=data/beir \
        config.index_retrieve_batch_size=100
done

but I get NDCG = 0.001 on the arguana dataset, so I stopped the script there, since I assume something is wrong. What am I doing wrong? Also, does this script save the embeddings for each dataset? If not, how can I make it save them?

@thibault-formal
Contributor

Hi @CosimoRulli

Sorry for the late reply! I think the issue is that the model checkpoint is not being loaded correctly. As described in the README, if you only want to evaluate an existing checkpoint, you should add the init_dict.model_type_or_dir line and run:

python3 -m splade.beir_eval \
    init_dict.model_type_or_dir=naver/splade-cocondenser-ensembledistil \
    config.pretrained_no_yamlconfig=true \
    +beir.dataset=$dataset \
    +beir.dataset_path=data/beir \
    config.index_retrieve_batch_size=100

Let me know if that works!
Best
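Applied to the loop from the question, the suggested fix could look like the untested sketch below. It assumes the same repository layout, SPLADE_CONFIG_NAME, and Hydra overrides shown in this thread; MODEL, DATA_DIR, and the failed array are illustrative names introduced here, not part of the repo. It only covers splade-cocondenser-ensembledistil; efficient-splade-V-large may need a different configuration, so check the README for that model.

```shell
#!/usr/bin/env bash
# Sketch: run splade.beir_eval over every BEIR dataset from the question,
# loading the checkpoint explicitly via init_dict.model_type_or_dir.
# MODEL and DATA_DIR are illustrative variables, not part of the repo.
set -u

export PYTHONPATH="${PYTHONPATH:-}:$(pwd)"
export SPLADE_CONFIG_NAME="config_splade++_cocondenser_ensembledistil"

MODEL="${MODEL:-naver/splade-cocondenser-ensembledistil}"
DATA_DIR="${DATA_DIR:-data/beir}"

datasets=(arguana fiqa nfcorpus quora scidocs scifact trec-covid
          webis-touche2020 climate-fever dbpedia-entity fever hotpotqa nq)
failed=()

for dataset in "${datasets[@]}"; do
    # Keep going if one dataset fails, but remember which ones did.
    if ! python3 -m splade.beir_eval \
            init_dict.model_type_or_dir="$MODEL" \
            config.pretrained_no_yamlconfig=true \
            +beir.dataset="$dataset" \
            +beir.dataset_path="$DATA_DIR" \
            config.index_retrieve_batch_size=100; then
        failed+=("$dataset")
    fi
done

# Report any datasets whose evaluation exited with an error.
if (( ${#failed[@]} > 0 )); then
    echo "evaluation failed for: ${failed[*]}" >&2
fi
```

Run it from the repository root, as in the question. The script deliberately avoids set -e: a failure on one dataset is recorded and reported at the end instead of aborting the remaining runs.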
