Speed up Dino with DALI #5431

Open

jpfeil opened this issue Apr 16, 2024 · 3 comments

Labels: question (Further information is requested)
jpfeil commented Apr 16, 2024

Describe the question.

Hello,

I'm trying to speed up the image augmentations in DINO with DALI. As a starting point, I went with a naive solution that simply calls the existing torchvision augmentations from inside a DALI pipeline; I'll optimize it once it runs. However, I can't get the pipeline to build. I'm getting this error:

TypeError: Illegal pipeline output type. The output 0 contains a nested `DataNode`. Missing list/tuple expansion (*) is the likely cause.

Can you please help me find the issue with this code?

from torchvision import transforms
import nvidia.dali.plugin.pytorch as dalitorch
from nvidia.dali import pipeline_def
import nvidia.dali.fn as fn
import nvidia.dali.types as types
import matplotlib.gridspec as gridspec
import matplotlib.pyplot as plt
from nvidia.dali.backend import TensorListGPU
import numpy as np
from PIL import Image
import utils  # GaussianBlur / Solarization helpers from the DINO repository


flip_and_color_jitter = transforms.Compose([
            transforms.RandomHorizontalFlip(p=0.5),
            transforms.RandomApply(
                [transforms.ColorJitter(brightness=0.4, contrast=0.4, saturation=0.2, hue=0.1)],
                p=0.8
            ),
            transforms.RandomGrayscale(p=0.2),
        ])



normalize = transforms.Compose([
            transforms.ToTensor(),
            transforms.Normalize((0.485, 0.456, 0.406), (0.229, 0.224, 0.225)),
        ])

def global_transform1(images):
    # RandomResizedCrop expects a (min, max) scale tuple; (0.4, 1.0) is DINO's default
    global_crops_scale = (0.4, 1.0)
    func = transforms.Compose([
            transforms.RandomResizedCrop(224, scale=global_crops_scale, interpolation=Image.BICUBIC),
            flip_and_color_jitter,
            utils.GaussianBlur(1.0),
            normalize,
        ])
    return func(images)


def global_transform2(images):
    global_crops_scale = (0.4, 1.0)  # (min, max) scale tuple, as above
    func = transforms.Compose([
            transforms.RandomResizedCrop(224, scale=global_crops_scale, interpolation=Image.BICUBIC),
            flip_and_color_jitter,
            utils.GaussianBlur(0.1),
            utils.Solarization(0.2),
            normalize,
        ])
    return func(images)


def local_transform(images):
    local_crops_scale = (0.05, 0.4)  # (min, max) scale tuple; DINO's default local scale
    func = transforms.Compose([
            transforms.RandomResizedCrop(96, scale=local_crops_scale, interpolation=Image.BICUBIC),
            flip_and_color_jitter,
            utils.GaussianBlur(p=0.5),
            normalize,
        ])
    return func(images)


seed = 12  # pipeline seed; must be defined before the decorator is evaluated

@pipeline_def(seed=seed, enable_conditionals=True)
def dino_dali_pipeline(image_dir, local_crops_number=8, device="mixed"):
    
    jpegs, labels = fn.readers.file(file_root=image_dir)

    decoded_jpegs = fn.decoders.image(jpegs, device=device, output_type=types.RGB)

    crops = []

    crops.append(dalitorch.fn.torch_python_function(decoded_jpegs, function=global_transform1, num_outputs=1))

    crops.append(dalitorch.fn.torch_python_function(decoded_jpegs, function=global_transform2, num_outputs=1))

    for _ in range(local_crops_number):
        crops.append(dalitorch.fn.torch_python_function(decoded_jpegs, function=local_transform, num_outputs=1))

    return crops, labels

Check for duplicates

  • I have searched the open bugs/issues and have found no duplicates for this bug report
JanuszL (Contributor) commented Apr 17, 2024

Hi @jpfeil,

Thank you for reaching out.
In your code, crops is a list, while DALI expects each pipeline output to be a plain DataNode. So in your case you need to either unpack the list:
return *crops, labels
or combine all the crops into one tensor with the cat or stack operator (as long as the tensors have uniform shapes samplewise).
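
For example, with the rest of the pipeline above unchanged, the fix is a one-line change to the return statement (a sketch; the fn.stack alternative only applies when the stacked crops share a shape samplewise):

    # unpack the list so each crop becomes a separate, plain DataNode output
    return *crops, labels

    # alternative: stack same-shaped crops into a single tensor output
    # return fn.stack(*crops), labels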

jpfeil (Author) commented Apr 17, 2024

Thank you, @JanuszL!

I've implemented the pipeline with your suggestions, but now I'm having issues with the iterator.

class Solarize:
    """Invert pixels at or above the threshold, using DataNode-friendly arithmetic."""

    def __init__(self, threshold: int = 128) -> None:
        self._threshold = threshold

    def __call__(self, img):
        inverted_img = 255 - img
        mask = img >= self._threshold
        return mask * inverted_img + (True ^ mask) * img

solarize = Solarize()
            
@pipeline_def(seed=seed, batch_size=1, enable_conditionals=True)
def dino_dali_pipeline(image_dir, local_crops_number=8, device="mixed"):
    
    jpegs, _ = fn.readers.file(file_root=image_dir, random_shuffle=True)

    decoded_jpegs = fn.decoders.image(jpegs, device=device)

    cropped_jpegs = fn.crop(decoded_jpegs, crop=(16384, 16384))
    
    #
    # Global Transform 1
    #
    gt1 = fn.random_resized_crop(cropped_jpegs, size=224, random_aspect_ratio=(1.0, 1.0))

    ## Random Horizontal Flip
    coin = fn.random.coin_flip()
    if coin:
        gt1 = fn.flip(gt1, horizontal=1)

    ## Color Jitter
    coin = fn.random.coin_flip(probability=0.8)
    if coin:
        gt1 = fn.color_twist(gt1, brightness=0.4, contrast=0.4, saturation=0.2, hue=0.1)

    ## Random Grayscale
    coin = fn.random.coin_flip(probability=0.2)
    if coin:
        gt1 = fn.color_space_conversion(gt1, image_type=types.RGB, output_type=types.GRAY)

    ## Gaussian Blur
    gt1 = fn.gaussian_blur(gt1, sigma=(0.1, 2.0))

    ## Normalize 
    gt1 = fn.normalize(gt1)

    #
    # Global Transform 2
    #
    gt2 = fn.random_resized_crop(cropped_jpegs, size=224, random_aspect_ratio=(1.0, 1.0))

    ## Random Horizontal Flip
    coin = fn.random.coin_flip()
    if coin:
        gt2 = fn.flip(gt2, horizontal=1)

    ## Color Jitter
    coin = fn.random.coin_flip(probability=0.8)
    if coin:
        gt2 = fn.color_twist(gt2, brightness=0.4, contrast=0.4, saturation=0.2, hue=0.1)

    ## Random Grayscale
    coin = fn.random.coin_flip(probability=0.2)
    if coin:
        gt2 = fn.color_space_conversion(gt2, image_type=types.RGB, output_type=types.GRAY)

    ## Gaussian Blur
    coin = fn.random.coin_flip(probability=0.1)
    if coin:
        gt2 = fn.gaussian_blur(gt2, sigma=(0.1, 2.0))

    ## Solarize    
    coin = fn.random.coin_flip(probability=0.1)
    if coin:
        gt2 = fn.cast(solarize(gt2), dtype=types.UINT8)

    gt2 = fn.normalize(gt2)


    #
    # Local Transformations
    #
    
    crops = [gt1, gt2]

    for _ in range(local_crops_number):
        lt = fn.random_resized_crop(cropped_jpegs, size=96, random_aspect_ratio=(0.5, 0.5))

        ## Random Horizontal Flip
        coin = fn.random.coin_flip()
        if coin:
            lt = fn.flip(lt, horizontal=1)
    
        ## Color Jitter
        coin = fn.random.coin_flip(probability=0.8)
        if coin:
            lt = fn.color_twist(lt, brightness=0.4, contrast=0.4, saturation=0.2, hue=0.1)
    
        ## Random Grayscale
        coin = fn.random.coin_flip(probability=0.2)
        if coin:
            lt = fn.color_space_conversion(lt, image_type=types.RGB, output_type=types.GRAY)
    
        ## Gaussian Blur
        coin = fn.random.coin_flip(probability=0.5)
        if coin:
            lt = fn.gaussian_blur(lt, sigma=(0.1, 2.0))
    
        ## Normalize
        lt = fn.normalize(lt)

        crops.append(lt)  # collect each local crop; otherwise only gt1 and gt2 are returned

    return *crops,

This works, and I can get augmented images out of it. The only issue is that I'm used to the PyTorch representation, which uses floats, whereas DALI usually represents images as uint8, but hopefully that doesn't influence training much.

The problem I have now is passing the pipeline to the iterator:

from nvidia.dali.plugin.pytorch import DALIGenericIterator

pipe = dino_dali_pipeline(image_dir, batch_size=4, num_threads=4, device_id=0)
pipe.build()

iterator = DALIGenericIterator(
    pipelines=pipe,
    output_map=["gt1", "gt2", "lt1", "lt2", "lt3", "lt4", "lt5", "lt6", "lt7", "lt8"],
)

for i, (batch,) in enumerate(iterator):
    print(batch)
    break

This will run for a little while and then it throws this error:

RuntimeError: [/opt/dali/dali/pipeline/data/tensor_list.cc:1012] Assert on "IsDenseTensor()" failed: The batch must be representable as a tensor - it must have uniform shape and be allocated in contiguous memory.
Stacktrace (88 entries):

This seems to be related to the global and local crops being different sizes. Is there a way to support this kind of data in DALI?

Thanks!

JanuszL (Contributor) commented Apr 18, 2024

Hi @jpfeil,

I'm used to the PyTorch representation, which uses floats, whereas DALI usually represents images as uint8

You can use the crop_mirror_normalize operator, pass float as the output type, and use 255-scaled mean/std values to go from uint8 to a normalized 0-1 float.
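
For example, a minimal sketch replacing fn.normalize for one of the crops, using the ImageNet mean/std from your torchvision normalize scaled to the 0-255 uint8 input range:

gt1 = fn.crop_mirror_normalize(
    gt1,
    dtype=types.FLOAT,          # float output instead of uint8
    output_layout="CHW",        # channel-first, as PyTorch expects
    mean=[0.485 * 255, 0.456 * 255, 0.406 * 255],
    std=[0.229 * 255, 0.224 * 255, 0.225 * 255],
)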

This seems to be related to the global and local crops being different sizes. Is there a way to support this kind of data in DALI?

Yes, the iterator expects that the batch of samples can be represented as a tensor where one of the dimensions is the batch size. In this case, you can either pad the samples to make them uniform or try the PyTorch DALIRaggedIterator, as sketched below.
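
For example, a sketch with DALIRaggedIterator, using the output names from your snippet (note that outputs whose samples can differ in shape within a batch, e.g. when the conditional grayscale branch changes the number of channels for only some samples, need the sparse list tag):

from nvidia.dali.plugin.pytorch import DALIRaggedIterator

output_map = ["gt1", "gt2"] + ["lt%d" % i for i in range(1, 9)]

iterator = DALIRaggedIterator(
    pipelines=pipe,
    output_map=output_map,
    # mark every output as potentially non-uniform; shape-uniform outputs
    # could use DALIRaggedIterator.DENSE_TAG instead
    output_types=[DALIRaggedIterator.SPARSE_LIST_TAG] * len(output_map),
)

for data in iterator:
    batch = data[0]        # dict keyed by the names in output_map
    gt1 = batch["gt1"]     # sparse outputs come back as a list of tensors
    break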
