
Question about data processing in pretrain data. #434

Open

JJJYmmm opened this issue Mar 10, 2024 · 1 comment

JJJYmmm commented Mar 10, 2024

Why were the following actions taken? Is there anything special about cc12m that I missed?

```python
if type == 'caption' and dataset_name == 'cc12m':
    target_item[:2] = self.src_dict.pad()
    target_item[-1] = self.eos_item
```
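For context on what the snippet above does in practice: fairseq's cross-entropy criteria typically ignore positions equal to the padding index, so writing `self.src_dict.pad()` into the first two target positions effectively excludes those tokens from the loss. A minimal pure-Python sketch of that masking effect (the pad value `1` and the token ids are hypothetical, chosen only for illustration):

```python
# Hypothetical token ids; 1 plays the role of self.src_dict.pad(), 2 of eos.
PAD = 1
target = [37, 42, 99, 17, 2]

# Mimic target_item[:2] = pad: the first two caption tokens get no loss.
target[:2] = [PAD, PAD]

# Positions that still contribute to the loss (pad is ignored, as in
# fairseq's label-smoothed cross entropy).
loss_positions = [i for i, t in enumerate(target) if t != PAD]
print(loss_positions)  # [2, 3, 4]
```

Under that reading, the cc12m branch would be a way of dropping the loss on the first two target tokens for that dataset specifically, which is presumably what the question is asking the motivation for.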

Looking forward to your reply.


JJJYmmm commented Mar 10, 2024

Another question: when computing the loss in AdjustLabelSmoothedCrossEntropyCriterion, sample_patch_num is added to the model input (sample[0], which I think corresponds to sample_v1, the vision-language data):

```python
if self.sample_patch_num > 0:
    sample[0]['net_input']['sample_patch_num'] = self.sample_patch_num
```

It seems that sample_patch_num selects a fixed number of image features. So why is it used only for the VL data?

```python
if sample_patch_num is not None:
    patch_orders = [
        random.sample(range(image_num_patches), k=sample_patch_num)
        for _ in range(patch_images.size(0))
    ]
    patch_orders = torch.LongTensor(patch_orders).to(device)
    image_embed = image_embed.gather(
        1, patch_orders.unsqueeze(2).expand(-1, -1, image_embed.size(2)))
    image_num_patches = sample_patch_num
    image_padding_mask = image_padding_mask.gather(1, patch_orders)
    image_position_ids = image_position_ids.gather(1, patch_orders)
```
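To make the subsampling above concrete: for each example it draws `sample_patch_num` distinct patch indices and keeps only those patch embeddings, shrinking the image sequence length. A pure-Python sketch of the same logic (the shapes `2 × 16 × 8` and the fake embeddings are hypothetical; the real code does the index selection with `torch.gather`):

```python
import random

# Hypothetical small shapes for illustration.
batch_size, image_num_patches, embed_dim = 2, 16, 8
sample_patch_num = 4

# Fake "image_embed": per-example list of per-patch feature vectors.
image_embed = [
    [[float(b * 1000 + p * 10 + d) for d in range(embed_dim)]
     for p in range(image_num_patches)]
    for b in range(batch_size)
]

# Same sampling as the snippet above: k distinct patch indices per example.
patch_orders = [
    random.sample(range(image_num_patches), k=sample_patch_num)
    for _ in range(batch_size)
]

# Pure-Python equivalent of image_embed.gather(1, ...): keep only the
# sampled patches, in sampled order, for each example.
sub_embed = [
    [image_embed[b][p] for p in patch_orders[b]]
    for b in range(batch_size)
]

print(len(sub_embed), len(sub_embed[0]))  # 2 4
```

So the operation itself is dataset-agnostic; it just caps the number of visual tokens fed to the model, which makes the question of why it is applied only to the VL sample a fair one.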
