[Question] `image_features` not matched to input text #11

sibosutd · 2024-04-08T11:31:25Z

LLaVA-UHD/llava_uhd/train/llava-uhd/adapt_llava.py

Lines 169 to 173 in 69e75d0

    if i < num_images:
        for j in range(5):
            cur_image_features = image_features[cur_image_idx+j*16]
            cur_new_input_embeds.append(cur_image_features)
            cur_new_labels.append(torch.full((cur_image_features.shape[0],), IGNORE_INDEX, device=cur_labels.device, dtype=cur_labels.dtype))

In the code snippet above, I noticed that the value of `cur_image_idx` never changes within a single batch. This means the loop selects the same `cur_image_features` for every image in the batch, which seems unusual. Could you confirm whether this is the intended behavior?
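
To make the concern concrete, here is a minimal sketch in plain Python that reproduces only the index arithmetic of the quoted loop (no tensors). The helper names `indices_for_image` and `indices_per_image` are hypothetical, and the `advance=True` case assumes a per-image `cur_image_idx += 1`, which does not appear in the snippet above.

```python
# Hypothetical stand-in for the quoted loop: only the index arithmetic is
# reproduced, not the real model tensors.
NUM_SLICES = 5  # from `for j in range(5)`
STRIDE = 16     # from `j*16`


def indices_for_image(cur_image_idx):
    """Rows of image_features the j-loop reads for one image."""
    return [cur_image_idx + j * STRIDE for j in range(NUM_SLICES)]


def indices_per_image(num_images, advance):
    """Indices read for each image in the batch.

    advance=False mimics what is described above (cur_image_idx never
    changes); advance=True assumes a hypothetical `cur_image_idx += 1`
    after each image.
    """
    cur_image_idx = 0
    result = []
    for _ in range(num_images):
        result.append(indices_for_image(cur_image_idx))
        if advance:
            cur_image_idx += 1
    return result


print(indices_per_image(3, advance=False))
# [[0, 16, 32, 48, 64], [0, 16, 32, 48, 64], [0, 16, 32, 48, 64]]
print(indices_per_image(3, advance=True))
# [[0, 16, 32, 48, 64], [1, 17, 33, 49, 65], [2, 18, 34, 50, 66]]
```

Without the increment, every image reads exactly the same five rows, which is the behavior being asked about.
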
Another point of confusion is the line `for j in range(5):` and the expression `j*16`. Based on the settings used in the Resampler, I would expect `image_features` to have shape `[batch_size*8, 64, 5120]`. Could you explain why the image features are selected with `for j in range(5):` and `j*16`?
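
Assuming the `[batch_size*8, 64, 5120]` shape above, the mismatch can be checked with simple arithmetic: the j-loop reads rows up to `cur_image_idx + 4*16 = cur_image_idx + 64`, so the first dimension must exceed that. The sketch below uses hypothetical helper names (`max_index_read`, `fits`, `min_batch_size`) and assumes 8 rows per sample.

```python
# Hypothetical check: does the quoted loop stay inside a first dimension of
# batch_size * 8, the shape expected from the Resampler settings?
NUM_SLICES = 5       # from `for j in range(5)`
STRIDE = 16          # from `j*16`
ROWS_PER_SAMPLE = 8  # assumed from the expected shape [batch_size*8, 64, 5120]


def max_index_read(cur_image_idx):
    """Largest row of image_features touched by one pass of the j-loop."""
    return cur_image_idx + (NUM_SLICES - 1) * STRIDE


def fits(batch_size, cur_image_idx=0):
    """True if every row read is < batch_size * ROWS_PER_SAMPLE."""
    return max_index_read(cur_image_idx) < batch_size * ROWS_PER_SAMPLE


def min_batch_size(cur_image_idx=0):
    """Smallest batch_size for which the loop would not index out of range."""
    batch_size = 1
    while not fits(batch_size, cur_image_idx):
        batch_size += 1
    return batch_size


print(max_index_read(0))  # 64
print(fits(8))            # False: 8 * 8 = 64 rows, valid indices are 0..63
print(min_batch_size())   # 9
```

Under that assumption, any batch smaller than 9 samples would index past the end of `image_features`, which suggests either that its first dimension is laid out differently from `[batch_size*8, ...]` or that the stride of 16 does not correspond to this shape.
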
