
The length of text that the text encoder can handle #40

Open
song-wensong opened this issue Feb 22, 2024 · 1 comment

Comments

song-wensong commented Feb 22, 2024

import torch
from languagebind import LanguageBindVideo, LanguageBindVideoTokenizer, LanguageBindVideoProcessor

pretrained_ckpt = 'LanguageBind/LanguageBind_Video_FT'  # also 'LanguageBind/LanguageBind_Video'
model = LanguageBindVideo.from_pretrained(pretrained_ckpt, cache_dir='./cache_dir')
tokenizer = LanguageBindVideoTokenizer.from_pretrained(pretrained_ckpt, cache_dir='./cache_dir')
video_process = LanguageBindVideoProcessor(model.config, tokenizer)

model.eval()
# Process one video and one text prompt into model-ready tensors.
data = video_process(["your/video.mp4"], ['your text.'], return_tensors='pt')
with torch.no_grad():
    out = model(**data)

# Similarity score between the text embedding and the video embedding.
print(out.text_embeds @ out.image_embeds.T)

In this code, what is the maximum text length the text encoder can handle? If the text exceeds 77 tokens, is it truncated directly?

LinB203 (Member) commented Feb 23, 2024

The maximum number of text tokens is 77. If the text exceeds 77 tokens, it is truncated directly. We simply follow CLIP here.
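
If you want to verify this yourself, here is a minimal sketch that tokenizes an over-long string and checks that the output is capped at 77 tokens. It assumes LanguageBindVideoTokenizer exposes the standard Hugging Face tokenizer call signature (max_length, padding, truncation, return_tensors); the exact arguments the processor passes internally may differ.

from languagebind import LanguageBindVideoTokenizer

tokenizer = LanguageBindVideoTokenizer.from_pretrained(
    'LanguageBind/LanguageBind_Video_FT', cache_dir='./cache_dir')

# A deliberately over-long input, far more than 77 tokens.
long_text = 'a photo of a cat ' * 50

# CLIP-style tokenization: fixed 77-token context; longer inputs are truncated.
encoding = tokenizer(long_text, max_length=77, padding='max_length',
                     truncation=True, return_tensors='pt')
print(encoding['input_ids'].shape)  # expected: torch.Size([1, 77])

Note that, as in CLIP, the 77-token budget includes the start- and end-of-text tokens, so the usable text is slightly shorter than 77 tokens.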
