
LAION RVS Fashion - Benchmark

Introduced in LRVS-Fashion: Extending Visual Search with Referring Instructions

Simon Lepage, Jérémie Mary, David Picard


GitHub release Apache License

Useful Links
Full Dataset | Test set | Leaderboard


  1. Install the lrvsf_benchmark library from this repository.
pip install git+
  2. Install any other library you might need (for example your favorite versions of transformers, torchvision, ...).


Dataset Preparation 📁

We provide a version of the test set (see the Test set link above). It contains:

  • products.parquet : query and target images for each product, along with categorical and textual conditioning. The images were downloaded with img2dataset and are stored in JPG format, with the shortest side resized to 256 and the original aspect ratio preserved.
  • distractors_urls.parquet : URLs of the 2M test distractors. We also provide a script using img2dataset to easily download the pictures.

The evaluation code expects this data to be structured as follows, with distractors/ containing as many parquet files as you see fit. We recommend 10k-100k distractors per parquet file. Avoid overly large files, since each file is loaded entirely into memory.

data/
└ products.parquet
└ distractors/
    └ *.parquet
    └ ...
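
As a rough illustration of the layout and shard-size advice above, the following sketch uses only the Python standard library. The helper names plan_shards and check_layout are hypothetical and are not part of lrvsf_benchmark; the default of 50,000 rows per file is simply one value inside the recommended 10k-100k range.

```python
from pathlib import Path


def plan_shards(n_rows, rows_per_file=50_000):
    """Return (start, stop) row ranges, one per distractor parquet file."""
    return [(start, min(start + rows_per_file, n_rows))
            for start in range(0, n_rows, rows_per_file)]


def check_layout(data_root):
    """List what is missing from the expected test-set layout."""
    root = Path(data_root)
    problems = []
    if not (root / "products.parquet").is_file():
        problems.append("missing products.parquet")
    distractors = root / "distractors"
    if not distractors.is_dir() or not any(distractors.glob("*.parquet")):
        problems.append("no parquet files in distractors/")
    return problems
```

With the 2M test distractors, plan_shards(2_000_000) gives 40 files of 50,000 rows each, and check_layout("data") returns an empty list once both products.parquet and at least one distractor file are in place.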

Model Evaluation ⚙️

1. Wrap your model

Models should implement the following interface, defining encode_queries, encode_targets and topk.

The encoding functions receive lists of items gathered from the parquet files (list of PIL.Image and str). It is your responsibility to define your own DataLoader to apply any preprocessing steps. These functions should return the embeddings as a torch.Tensor. You can use torch.nn.DataParallel to use multiple GPUs.

The topk function will be called with the embeddings produced previously. For each query embedding in q_embs, it must return the indices of the top-k target embeddings in t_embs, ranked by decreasing similarity. A simple implementation could use faiss or torch.topk(...).indices.

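import lrvsf_benchmark

class MyModel(lrvsf_benchmark.LRVSFModel):
    def __init__(self, *args, **kwargs):
        ...  # load your network and preprocessing here

    def encode_queries(self, imgs, conds):
        ...  # return the query embeddings as a torch.Tensor

    def encode_targets(self, imgs):
        ...  # return the target embeddings as a torch.Tensor

    def topk(self, q_embs, t_embs, topk):
        ...  # return, for each query, the indices of its top-k targets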
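
To make the method bodies more concrete, here is a minimal sketch of the two building blocks they typically need: batching a list of items through a DataLoader, and ranking targets with torch.topk. The names ListDataset, encode_in_batches and cosine_topk are hypothetical helpers, not part of lrvsf_benchmark, and cosine similarity is an assumption: use whatever similarity your model was trained with.

```python
import torch
from torch.utils.data import DataLoader, Dataset


class ListDataset(Dataset):
    """Wraps a plain list of items and applies a preprocessing function."""

    def __init__(self, items, preprocess):
        self.items = items
        self.preprocess = preprocess

    def __len__(self):
        return len(self.items)

    def __getitem__(self, idx):
        return self.preprocess(self.items[idx])


@torch.no_grad()
def encode_in_batches(items, preprocess, encoder, batch_size=64):
    """Encode items batch by batch and return one (N, D) tensor."""
    loader = DataLoader(ListDataset(items, preprocess),
                        batch_size=batch_size, shuffle=False)
    return torch.cat([encoder(batch) for batch in loader], dim=0)


def cosine_topk(q_embs, t_embs, k):
    """For each query, return the indices of its k most similar targets."""
    q = torch.nn.functional.normalize(q_embs, dim=-1)
    t = torch.nn.functional.normalize(t_embs, dim=-1)
    sims = q @ t.T  # (n_queries, n_targets) cosine similarities
    return torch.topk(sims, k=k, dim=-1).indices
```

Inside a model, encode_targets could call encode_in_batches with your image transform and network, and topk could simply return cosine_topk(q_embs, t_embs, topk). For very large target sets, a faiss index may be a better fit than a dense similarity matrix.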

2. Evaluate your model

Pass this model to the lrvsf_benchmark.LRVSF class. It generates the embeddings, then repeatedly calls topk on bootstrapped subsets to compute the results.

evaluator = LRVSF(...)

Mandatory arguments:

  • data_root : the root of the test set. data/ in the previous example.
  • conditioning : either category or text.

Optional arguments:

  • dist_img_col (str): the name of the column containing the images in your parquet files. Default: "jpg"
  • aggregate_batches (int): controls the number of samples to aggregate before calling the encoding methods. Useful if you have small parquet files, as spawning a DataLoader for few images is inefficient. Default: 100_000
  • fragments_readahead (int): Passed to pyarrow.dataset.dataset. Controls the number of parquet files loaded in memory at the same time. Use a small value for large parquet files and a larger value for small ones. Default: 2
  • dev_run (bool): If set to True, will run the evaluation on a subset of the distractors. Default: False
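
To give an intuition for what "repeatedly call topk on bootstrapped subsets" can look like, here is a toy sketch in pure Python. It is not the library's implementation: the sampling scheme (random distractor subsets drawn without replacement), the recall@1 metric, the convention that query i matches target i, and the names dot_topk and bootstrapped_recall_at_1 are all assumptions made for illustration.

```python
import random


def dot_topk(q_embs, t_embs, k):
    """Toy stand-in for a model's topk: rank targets by dot product."""
    ranked = []
    for q in q_embs:
        sims = [sum(a * b for a, b in zip(q, t)) for t in t_embs]
        order = sorted(range(len(sims)), key=lambda j: -sims[j])
        ranked.append(order[:k])
    return ranked


def bootstrapped_recall_at_1(topk_fn, q_embs, t_embs, d_embs,
                             n_distractors, n_rounds=10, seed=0):
    """Average recall@1 over random distractor subsets (assumed protocol)."""
    rng = random.Random(seed)
    scores = []
    for _ in range(n_rounds):
        subset = rng.sample(d_embs, n_distractors)
        gallery = t_embs + subset  # target i keeps index i in the gallery
        top1 = topk_fn(q_embs, gallery, 1)
        hits = sum(1 for i, row in enumerate(top1) if row[0] == i)
        scores.append(hits / len(q_embs))
    return sum(scores) / len(scores)
```

The point of the sketch is that embeddings are computed once, while the ranking step is repeated on each sampled gallery, which is why topk is exposed as a separate method of the model.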

Finally, run the evaluator to compute the metrics and export them as YAML:

…, "output_filename.yaml")

3. Update your model card

Copy the content of the produced YAML output to the header of your model card file. The metrics should then appear in the model card. Refresh the Leaderboard.
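
If you prefer to script this step, the sketch below shows one way to do it with the standard library only. It assumes the model card is a Markdown file whose header is a YAML block delimited by `---` lines; add_front_matter is a hypothetical helper, and it appends the metrics naively, so check for duplicate keys if your card already has a header.

```python
def add_front_matter(card_text: str, metrics_yaml: str) -> str:
    """Return card_text with metrics_yaml placed in its YAML header."""
    metrics_yaml = metrics_yaml.strip("\n")
    lines = card_text.split("\n")
    if lines and lines[0].strip() == "---":
        # Existing header: insert the metrics just before its closing line.
        for i in range(1, len(lines)):
            if lines[i].strip() == "---":
                return "\n".join(lines[:i] + [metrics_yaml] + lines[i:])
        raise ValueError("YAML header is opened but never closed")
    # No header yet: create one at the top of the file.
    return "\n".join(["---", metrics_yaml, "---", card_text])
```

Read the exported YAML file and your model card as strings, pass them to add_front_matter, and write the result back to the model card.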

Please refer to examples/ for an example evaluation script. You can also look at the model card of our CondViT-B16 for a YAML formatting reference.


To cite our work, please use the following BibTeX entry:

  title={LRVS-Fashion: Extending Visual Search with Referring Instructions},
  author={Lepage, Simon and Mary, Jérémie and Picard, David},

