
[QST] How to get candidate features after using pre-trained embedding tables? #1239

Open
hkristof03 opened this issue May 7, 2024 · 0 comments


❓ Questions & Help

Details

I am following the tutorial here to include pre-computed embeddings when training a Two Tower retrieval model. Specifically, I am using this method, which keeps the embedding table out of the model:

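```python
import merlin.models.tf as mm
from merlin.dataloader.ops.embeddings import EmbeddingOperator

# `train` is the training dataset and `pretrained_movie_embs` is the
# pre-computed embedding table, both prepared as in the tutorial.
loader = mm.Loader(
    train,
    batch_size=1024,
    transforms=[
        # Look up each batch's movieId in the pretrained table and add the
        # vectors as a new feature, so the table is not part of the model.
        EmbeddingOperator(
            pretrained_movie_embs,
            lookup_key="movieId",
            embedding_name="pretrained_movie_embeddings",
        ),
    ],
)
```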
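Conceptually, a transform like this enriches each batch: it uses the `lookup_key` column (`movieId`) to index into the pretrained table and adds the result under `embedding_name`. Below is a minimal NumPy sketch of that idea, assuming the table is a 2-D array whose row *i* holds the vector for `movieId == i`. `add_pretrained_embeddings` is a hypothetical helper for illustration, not Merlin's implementation.

```python
import numpy as np


def add_pretrained_embeddings(batch, table, lookup_key, embedding_name):
    """Return a copy of `batch` with an extra column of looked-up vectors.

    Assumes table[i] is the pretrained vector for id i (hypothetical layout).
    """
    ids = np.asarray(batch[lookup_key])
    enriched = dict(batch)  # leave the input batch untouched
    enriched[embedding_name] = table[ids]  # fancy indexing: one row per id
    return enriched


# Toy data: 4 movies with 3-dimensional embeddings
pretrained_movie_embs = np.arange(12, dtype=np.float32).reshape(4, 3)
batch = {"userId": np.array([7, 8]), "movieId": np.array([2, 0])}
out = add_pretrained_embeddings(
    batch, pretrained_movie_embs, "movieId", "pretrained_movie_embeddings"
)
print(sorted(out))  # ['movieId', 'pretrained_movie_embeddings', 'userId']
```

The enriched batch has one more column than the input, which mirrors why the loader's output schema differs from the schema of the dataset it wraps: the vectors only exist after the transform runs.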

I am trying to combine this approach with the Retrieval Model tutorial here:

```python
from merlin.models.utils.dataset import unique_rows_by_features
from merlin.schema import Tags

# Top-K evaluation
candidate_features = unique_rows_by_features(train, Tags.ITEM, Tags.ITEM_ID)
candidate_features.head()

topk = 20
topk_model = model.to_top_k_encoder(candidate_features, k=topk, batch_size=128)

# We can set the `metrics` param in `compile()` if we want
topk_model.compile(run_eagerly=False)
```
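The top-k step above boils down to scoring a query embedding against the embeddings of all candidates and keeping the `k` highest, which is why every candidate row has to pass through the candidate tower with the same features it was trained on. As a rough illustration only, here is a brute-force NumPy sketch assuming dot-product scoring; `top_k_dot` is a hypothetical helper, not Merlin's implementation of `to_top_k_encoder`.

```python
import numpy as np


def top_k_dot(query, candidates, k):
    """Return (indices, scores) of the k candidates with the highest dot product.

    query: (d,) vector; candidates: (n, d) matrix of candidate-tower outputs.
    """
    scores = candidates @ query
    k = min(k, len(scores))
    # argpartition puts the k best (lowest negated scores) first in O(n) ...
    idx = np.argpartition(-scores, k - 1)[:k]
    # ... then only those k are sorted, best first
    idx = idx[np.argsort(-scores[idx])]
    return idx, scores[idx]


candidates = np.array([[1.0, 0.0], [0.0, 1.0], [0.7, 0.7], [-1.0, 0.0]])
query = np.array([1.0, 0.2])
idx, scores = top_k_dot(query, candidates, k=2)
print(idx)  # [0 2]
```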

The problem is that `loader.output_schema` differs from `loader.dataset.schema`. The utility function `unique_rows_by_features` requires a dataset as its first argument, but passing `loader.dataset` doesn't work because that dataset doesn't contain the embedding vectors yet.
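One way to picture the mismatch: deduplicating the raw dataset by item id yields one row per movie, but without a `pretrained_movie_embeddings` column, because that column is only created by the loader transform. The sketch below illustrates a possible workaround in plain NumPy, assuming the embedding table is a 2-D array whose row *i* is the vector for `movieId == i`: deduplicate item rows first, then apply the same id-based lookup. `unique_item_rows` and `build_candidate_features` are hypothetical names, not Merlin APIs.

```python
import numpy as np


def unique_item_rows(columns, item_id_col, item_cols):
    """Keep the first row for each distinct item id, restricted to item_cols.

    item_cols must include item_id_col.
    """
    ids = np.asarray(columns[item_id_col])
    _, first = np.unique(ids, return_index=True)  # first occurrence of each id
    first = np.sort(first)  # preserve original row order
    return {c: np.asarray(columns[c])[first] for c in item_cols}


def build_candidate_features(columns, table, item_id_col, item_cols, embedding_name):
    """Deduplicate items, then attach pretrained vectors looked up by item id."""
    items = unique_item_rows(columns, item_id_col, item_cols)
    items[embedding_name] = table[items[item_id_col]]  # assumes table[i] is id i
    return items


train = {
    "userId": np.array([1, 1, 2, 3]),
    "movieId": np.array([2, 0, 2, 1]),
    "genre": np.array([5, 3, 5, 4]),
}
pretrained_movie_embs = np.arange(9, dtype=np.float32).reshape(3, 3)
candidates = build_candidate_features(
    train, pretrained_movie_embs, "movieId", ["movieId", "genre"],
    "pretrained_movie_embeddings",
)
print(candidates["movieId"])  # [2 0 1]
```

Whether `to_top_k_encoder` accepts candidate features enriched this way is an assumption this sketch does not settle.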

My question is: using the method for including pre-trained embeddings described above, how should one obtain the `candidate_features` required by the candidate tower from the loader?

Thank you in advance for taking the time to answer!

@hkristof03 changed the title from "[QST]" to "[QST] How to get candidate features after using pre-trained embedding tables?" on May 7, 2024