
The length of text that the text encoder can handle #40

Open
song-wensong opened this issue Feb 22, 2024 · 1 comment

Comments

song-wensong commented Feb 22, 2024

import torch
from languagebind import LanguageBindVideo, LanguageBindVideoTokenizer, LanguageBindVideoProcessor

pretrained_ckpt = 'LanguageBind/LanguageBind_Video_FT'  # also 'LanguageBind/LanguageBind_Video'
model = LanguageBindVideo.from_pretrained(pretrained_ckpt, cache_dir='./cache_dir')
tokenizer = LanguageBindVideoTokenizer.from_pretrained(pretrained_ckpt, cache_dir='./cache_dir')
video_process = LanguageBindVideoProcessor(model.config, tokenizer)

model.eval()
# Process one video and one text prompt into model-ready tensors.
data = video_process(["your/video.mp4"], ['your text.'], return_tensors='pt')
with torch.no_grad():
    out = model(**data)

# Similarity score between the text embedding and the video embedding.
print(out.text_embeds @ out.image_embeds.T)

In this code, what is the maximum text length the text encoder can handle? If the text exceeds 77 tokens, is it truncated directly?

LinB203 (Member) commented Feb 23, 2024

The maximum number of text tokens is 77. If the text exceeds 77 tokens, it is truncated directly. We simply follow CLIP here.
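
If you want to verify this yourself, here is a minimal sketch that tokenizes an over-long string and checks that the output is capped at 77 tokens. It assumes LanguageBindVideoTokenizer exposes the standard Hugging Face tokenizer call signature (max_length, padding, truncation, return_tensors); the exact arguments the processor passes internally may differ.

from languagebind import LanguageBindVideoTokenizer

tokenizer = LanguageBindVideoTokenizer.from_pretrained(
    'LanguageBind/LanguageBind_Video_FT', cache_dir='./cache_dir')

# A deliberately over-long input, far more than 77 tokens.
long_text = 'a photo of a cat ' * 50

# CLIP-style tokenization: fixed 77-token context; longer inputs are truncated.
encoding = tokenizer(long_text, max_length=77, padding='max_length',
                     truncation=True, return_tensors='pt')
print(encoding['input_ids'].shape)  # expected: torch.Size([1, 77])

Note that, as in CLIP, the 77-token budget includes the start- and end-of-text tokens, so the usable text is slightly shorter than 77 tokens.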
