Introduced in LRVS-Fashion: Extending Visual Search with Referring Instructions
Simon Lepage — Jérémie Mary — David Picard
## Useful Links

- Full Dataset
- Test set
- Leaderboard
- Install the `lrvsf_benchmark` library from this repository.

  ```bash
  pip install git+https://github.com/Simon-Lepage/LRVSF-Benchmark.git
  ```

- Install any other library you might need (for example your favorite versions of `transformers`, `torchvision`, ...).
We provide a version of the test set on zenodo.org. It contains:

- `products.parquet`: query and target images for each product, along with categorical and textual conditioning. The images were downloaded with img2dataset and are stored in JPG format, with the shortest side resized to 256 pixels while keeping the original aspect ratio.
- `distractors_urls.parquet`: URLs of the 2M test distractors. We also provide a script using img2dataset to easily download the pictures.
The evaluation code expects this data to be structured as follows, with `distractors/` containing as many parquet files as you see fit. We recommend 10k-100k distractors per parquet file; avoid very large files, as each file is loaded entirely into memory.
```
data/
└ products.parquet
└ distractors/
    └ *.parquet
    └ ...
```
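If you want to fetch the distractors yourself, img2dataset's Python API can produce the layout above. The snippet below is a hypothetical invocation, not the script shipped with the dataset: the URL column name (`url_col="url"`), shard size, and process count are assumptions to adapt to the actual file.

```python
from img2dataset import download

download(
    url_list="distractors_urls.parquet",
    input_format="parquet",
    url_col="url",                    # assumption: check the real column name
    output_folder="data/distractors",
    output_format="parquet",          # image bytes land in a "jpg" column
    image_size=256,
    resize_mode="keep_ratio",         # shortest side resized to 256
    number_sample_per_shard=50_000,   # within the 10k-100k recommendation
    processes_count=8,
)
```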
Models should implement the following interface, defining `encode_queries`, `encode_targets` and `topk`.
The encoding functions receive lists of items gathered from the parquet files (lists of `PIL.Image` and `str`). It is your responsibility to define your own DataLoader to apply any preprocessing steps. These functions should return the embeddings as a `torch.Tensor`. You can use `torch.nn.DataParallel` to use multiple GPUs.
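For example, the lists can be wrapped in a small `Dataset` so that preprocessing runs in DataLoader workers. This is a hypothetical helper, not part of the benchmark API; the `preprocess` transform, worker count, and batch size are placeholders.

```python
import torch
from torch.utils.data import DataLoader, Dataset
from torchvision import transforms

# Placeholder transform; use whatever your model expects.
preprocess = transforms.Compose([
    transforms.Resize(224),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
])

class ImageListDataset(Dataset):
    """Wraps the list of PIL images handed over by the benchmark."""
    def __init__(self, imgs):
        self.imgs = imgs

    def __len__(self):
        return len(self.imgs)

    def __getitem__(self, i):
        return preprocess(self.imgs[i].convert("RGB"))

@torch.no_grad()
def encode_in_batches(imgs, encoder, batch_size=256):
    # `encoder` is any callable mapping a (B, C, H, W) tensor to embeddings.
    loader = DataLoader(ImageListDataset(imgs), batch_size=batch_size, num_workers=4)
    return torch.cat([encoder(batch) for batch in loader])
```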
The `topk` function will be called with the embeddings produced previously. For each query embedding in `q_embs`, it must return the indices of the top-k target embeddings in `t_embs`, sorted by similarity. A simple implementation could use `faiss` or `torch.topk(...).indices`.
```python
class MyModel(lrvsf_benchmark.LRVSFModel):
    def __init__(self, ...):
        super().__init__()

    def encode_queries(self, imgs, conds):
        # lists of PIL.Image and str -> torch.Tensor of embeddings
        pass

    def encode_targets(self, imgs):
        # list of PIL.Image -> torch.Tensor of embeddings
        pass

    def topk(self, q_embs, t_embs, topk):
        # top-k target indices for each query embedding
        pass
```
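For concreteness, here is a hypothetical baseline built on an off-the-shelf CLIP from `transformers`. The class name, checkpoint, and the naive conditioning scheme (averaging normalized image and text embeddings) are all illustrative assumptions, not the paper's CondViT method:

```python
import torch
from transformers import CLIPModel, CLIPProcessor

import lrvsf_benchmark


class ClipBaseline(lrvsf_benchmark.LRVSFModel):
    """Sketch of a baseline, NOT the official model."""

    def __init__(self, device="cuda", batch_size=256):
        super().__init__()
        self.device = device
        self.batch_size = batch_size
        self.model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32").eval().to(device)
        self.processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

    @staticmethod
    def _norm(x):
        return x / x.norm(dim=-1, keepdim=True)

    @torch.no_grad()
    def encode_queries(self, imgs, conds):
        embs = []
        for i in range(0, len(imgs), self.batch_size):  # the lists can hold ~100k items
            img_in = self.processor(images=imgs[i:i + self.batch_size],
                                    return_tensors="pt").to(self.device)
            txt_in = self.processor(text=conds[i:i + self.batch_size], return_tensors="pt",
                                    padding=True, truncation=True).to(self.device)
            # Naive conditioning: average the normalized image and text embeddings.
            emb = self._norm(self.model.get_image_features(**img_in)) \
                + self._norm(self.model.get_text_features(**txt_in))
            embs.append(self._norm(emb).cpu())
        return torch.cat(embs)

    @torch.no_grad()
    def encode_targets(self, imgs):
        embs = []
        for i in range(0, len(imgs), self.batch_size):
            inputs = self.processor(images=imgs[i:i + self.batch_size],
                                    return_tensors="pt").to(self.device)
            embs.append(self._norm(self.model.get_image_features(**inputs)).cpu())
        return torch.cat(embs)

    def topk(self, q_embs, t_embs, topk):
        # Embeddings are L2-normalized, so a dot product is cosine similarity.
        # With 2M distractors, a faiss index would scale better than this matmul.
        sims = q_embs @ t_embs.T
        return torch.topk(sims, k=topk, dim=-1).indices
```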
Pass this model to the `lrvsf_benchmark.LRVSF` class. It will generate the embeddings, then repeatedly call `topk` on bootstrapped subsets to compute the results.
```python
evaluator = LRVSF(...)
```
Mandatory arguments:

- `data_root`: the root of the test set (`data/` in the previous example).
- `conditioning`: either `category` or `text`.
Optional arguments:

- `dist_img_col` (str): the name of the column containing the images in your parquet files. Default: `"jpg"`.
- `aggregate_batches` (int): the number of samples to aggregate before calling the encoding methods. Useful if you have small parquet files, as spawning a DataLoader for a few images is inefficient. Default: `100_000`.
- `fragments_readahead` (int): passed to `pyarrow.dataset.dataset`; controls the number of parquet files loaded in memory at the same time. Use a small value for large parquet files and a larger one for small files. Default: `2`.
- `dev_run` (bool): if set to `True`, runs the evaluation on a subset of the distractors. Default: `False`.
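For instance, assuming `LRVSF` has been imported from `lrvsf_benchmark` (the paths and settings here are illustrative):

```python
evaluator = LRVSF(
    data_root="data/",       # contains products.parquet and distractors/
    conditioning="text",     # or "category"
    dist_img_col="jpg",
    dev_run=True,            # quick sanity check on a distractor subset
)
```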
Finally, run the evaluator to compute the metrics and export them as YAML:
```python
evaluator.run(my_model, "output_filename.yaml")
```
Copy the content of the produced YAML output to the header of your `README.md` file. You should see the metrics appear in the model card. Refresh the Leaderboard.
Please refer to `examples/condvit_b16_cat.py` for an example evaluation script. You can also look at the `README.md` file of our CondViT-B16 for a YAML formatting reference.
To cite our work, please use the following BibTeX entry:
```bibtex
@article{lepage2023lrvsf,
    title={LRVS-Fashion: Extending Visual Search with Referring Instructions},
    author={Lepage, Simon and Mary, Jérémie and Picard, David},
    journal={arXiv:2306.02928},
    year={2023}
}
```