Extract motion vectors #5363

Open
1 task done
rvandeghen opened this issue Mar 8, 2024 · 7 comments
@rvandeghen

Describe the question.

Hello,

I was wondering if there is a way to obtain the motion vectors you compute when you decode a video on GPU?

Thanks

Check for duplicates

  • I have searched the open bugs/issues and have found no duplicates for this bug report
@rvandeghen rvandeghen added the question Further information is requested label Mar 8, 2024
@JanuszL
Contributor

JanuszL commented Mar 8, 2024

Hi @rvandeghen,

Thank you for reaching out. As far as I understand, NVDEC doesn't expose this information, so it cannot be obtained in DALI. What you can do instead is use the optical flow operator (fn.optical_flow).

@rvandeghen
Author

@JanuszL thanks for the reply.

Do you know how to return both the list of frames and the list of optical flow (OF) fields? I get an error, which I guess comes from the fact that len(frames) = len(OF) + 1, so the shapes mismatch.

The code I use is the following:

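import nvidia.dali.fn as fn
import nvidia.dali.types as types
from nvidia.dali import pipeline_def
from nvidia.dali.plugin import pytorch

@pipeline_def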
def create_video_reader_pipeline(sequence_length, files, crop_size, stride=1, shard_id=0, num_shards=1, seed=0):
    images = fn.readers.video(device="gpu",
                              filenames=files,
                              sequence_length=sequence_length,
                              normalized=False,
                              random_shuffle=False,
                              image_type=types.RGB,
                              dtype=types.UINT8,
                              initial_fill=16,
                              prefetch_queue_depth=2,
                              pad_last_batch=True,
                              name="Reader",
                              stride=stride,
                              enable_frame_num=False,
                              shard_id=shard_id,
                              num_shards=num_shards,
                              seed=seed,
                             )
    
    of = fn.optical_flow(images, output_grid=1)
    

    images = fn.crop_mirror_normalize(images,
                                      dtype=types.FLOAT,
                                      output_layout="FCHW",
                                      mean=[0.279*255, 0.452*255, 0.378*255],
                                      std=[0.188*255, 0.188*255, 0.171*255],
                                      mirror=False,#fn.random.coin_flip(),
                                      seed=seed
                                     )

    return images, of

class VideoDataset(pytorch.DALIGenericIterator):
    def __init__(self, *kargs, **kvargs):
        super().__init__(*kargs, **kvargs)

    def __next__(self):
        out, of = super().__next__()
        # DDP is used so only one pipeline per process
        # also we need to transform dict returned by DALIClassificationIterator to iterable
        # and squeeze the labels
        out = out[0]["data"]
        of = of[0]["data"]

        B, F, C, H, W = out.size()
        out = out.view(B*F, C, H, W)
        return out, of

device_id = 0
shard_id = 0
num_shards = 1
batch_size = 1
sequence_length = 10


crop_size=(224, 224)
stride=5

pipeline = create_video_reader_pipeline(batch_size=batch_size,
                                        sequence_length=sequence_length,
                                        num_threads=10,
                                        device_id=device_id,
                                        shard_id=shard_id,
                                        num_shards=num_shards,
                                        files=container_files,
                                        crop_size=crop_size,
                                        stride=stride,
                                        )

train_loader = VideoDataset(pipeline,
                            ["data"],
                            reader_name="Reader",
                            auto_reset=True,
                            last_batch_policy=pytorch.LastBatchPolicy.FILL
                            )

Error:

IndexError                                Traceback (most recent call last)
Cell In[40], line 22
      9 stride=5
     11 pipeline = create_video_reader_pipeline(batch_size=batch_size,
     12                                         sequence_length=sequence_length,
     13                                         num_threads=10,
   (...)
     19                                         stride=stride,
     20                                         )
---> 22 train_loader = VideoDataset(pipeline,
     23                             ["data"],
     24                             reader_name="Reader",
     25                             auto_reset=True,
     26                             last_batch_policy=pytorch.LastBatchPolicy.FILL
     27                             )

Cell In[39], line 37, in VideoDataset.__init__(self, *kargs, **kvargs)
     36 def __init__(self, *kargs, **kvargs):
---> 37     super().__init__(*kargs, **kvargs)

File ~/micromamba/envs/sn_mae/lib/python3.10/site-packages/nvidia/dali/plugin/pytorch.py:194, in DALIGenericIterator.__init__(self, pipelines, output_map, size, reader_name, auto_reset, fill_last_batch, dynamic_shape, last_batch_padded, last_batch_policy, prepare_first_batch)
    192 if self._prepare_first_batch:
    193     try:
--> 194         self._first_batch = DALIGenericIterator.__next__(self)
    195         # call to `next` sets _ever_consumed to True but if we are just calling it from
    196         # here we should set if to False again
    197         self._ever_consumed = False

File ~/micromamba/envs/sn_mae/lib/python3.10/site-packages/nvidia/dali/plugin/pytorch.py:220, in DALIGenericIterator.__next__(self)
    218 # segregate outputs into categories
    219 for j, out in enumerate(outputs[i]):
--> 220     category_outputs[self.output_map[j]] = out
    222 # Change DALI TensorLists into Tensors
    223 category_tensors = dict()

IndexError: list index out of range
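
On the len(frames) = len(OF) + 1 observation: if the flow is computed between consecutive frames, a sequence of N frames yields N - 1 flow fields, so the two outputs cannot share one shape and have to be aligned explicitly. A minimal plain-Python sketch under that assumption follows. The helper `pair_frames_with_flow` and the string placeholders are hypothetical, and pairing each flow field with the later frame is only one possible convention.

```python
def pair_frames_with_flow(frames, flows):
    """Pair each flow field with the later of the two frames it connects.

    Assumes flows[i] was computed between frames[i] and frames[i + 1].
    """
    if len(frames) != len(flows) + 1:
        raise ValueError("expected exactly one more frame than flow fields")
    # Drop the first frame so both lists have the same length.
    return list(zip(frames[1:], flows))


sequence_length = 10
# String placeholders stand in for the real frame and flow tensors.
frames = [f"frame_{i}" for i in range(sequence_length)]
flows = [f"flow_{i}_to_{i + 1}" for i in range(sequence_length - 1)]

pairs = pair_frames_with_flow(frames, flows)
print(len(frames), len(flows), len(pairs))
```

Pairing with `frames[:-1]` instead would attach each flow field to the earlier frame; which one is appropriate depends on how the flow is consumed downstream.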

@JanuszL
Contributor

JanuszL commented Mar 8, 2024

Hi @rvandeghen,

I think your pipeline returns more than the iterator consumes. Can you try:

train_loader = VideoDataset(pipeline,
                            ["images", "of"],
                            reader_name="Reader",
                            auto_reset=True,
                            last_batch_policy=pytorch.LastBatchPolicy.FILL
                            )
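
The traceback above shows the iterator assigning `self.output_map[j]` for every pipeline output, so a pipeline that returns two outputs needs two names. A minimal plain-Python sketch of that loop reproduces both the error and the fix. The `segregate` helper and the placeholder strings are illustrative only and are not DALI code.

```python
def segregate(outputs, output_map):
    """Mimic the loop from the traceback: one name per pipeline output."""
    category_outputs = {}
    for j, out in enumerate(outputs):
        category_outputs[output_map[j]] = out
    return category_outputs


# Placeholders for the two values returned by the pipeline (images, of).
pipeline_outputs = ["<frames tensor>", "<optical flow tensor>"]

try:
    segregate(pipeline_outputs, ["data"])  # one name, two outputs
    error = None
except IndexError as exc:
    error = str(exc)

fixed = segregate(pipeline_outputs, ["images", "of"])  # one name per output
print(error)
print(sorted(fixed))
```

With two names in `output_map`, each batch should presumably carry both an `images` and an `of` entry, so `__next__` would read `batch[0]["images"]` and `batch[0]["of"]` rather than unpacking two values from `super().__next__()`. That is worth confirming against the DALI iterator documentation.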

@rvandeghen
Author

Hi @JanuszL,

Indeed, the optical flow gives good results at almost no extra cost. However, I found the following information in the blog post, and I would like to know whether DALI exposes this buffer:

The Optical Flow API returns a buffer consisting of confidence levels (called cost) for each of the flow vectors to deal with these situations. The application can use this cost buffer to selectively accept or discard regions of the flow vector map.

Renaud
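
To make the quoted idea concrete, here is a minimal sketch of how an application could use such a cost buffer once it has one. Nothing in this thread establishes that DALI exposes the buffer. The helper `mask_flow_by_cost`, the nested-list layout, the threshold, and the convention that a higher cost means lower confidence are all assumptions made for illustration.

```python
def mask_flow_by_cost(flow, cost, max_cost):
    """Keep flow vectors whose cost is at most max_cost; mark the rest as None.

    Assumes `flow` and `cost` are nested lists with the same grid shape and
    that a higher cost means a less reliable vector.
    """
    return [
        [vec if c <= max_cost else None for vec, c in zip(flow_row, cost_row)]
        for flow_row, cost_row in zip(flow, cost)
    ]


# A made-up 2x2 flow grid of (dx, dy) vectors and its cost grid.
flow = [[(1.0, 0.0), (0.5, -0.5)],
        [(8.0, 3.0), (0.0, 0.25)]]
cost = [[2, 5],
        [30, 1]]

masked = mask_flow_by_cost(flow, cost, max_cost=10)
print(masked)
```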

@JanuszL
Contributor

JanuszL commented Mar 12, 2024

@rvandeghen
Author

Hi @JanuszL,

Do you know whether I should expect large or small changes in the output depending on the value I set for hint_grid?

I did some comparisons between the NVIDIA OF and RAFT (torchvision version: https://pytorch.org/vision/main/models/raft.html) and the output was much smoother with RAFT.

I also found that changing the value of hint_grid from 1 to 8 does not change anything in the output values.

FYI, I'm using an A100 and my OF is defined as:

of = fn.optical_flow(images,
                     hint_grid=1, # change from 1 to 8
                     output_grid=1,
                    )

Is this the correct behavior? I know that you are not directly involved with NVOF, so if you know someone who could answer my questions, please point me to them.
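
One way to quantify "does not change anything" or "much smoother" is the mean endpoint error between two flow fields, that is, the average Euclidean distance between corresponding vectors. The sketch below is plain Python with made-up values. The function name, the nested-list layout, and the sample grids are assumptions for illustration and not part of DALI or torchvision.

```python
import math


def mean_endpoint_error(flow_a, flow_b):
    """Average Euclidean distance between corresponding (dx, dy) vectors."""
    total = 0.0
    count = 0
    for row_a, row_b in zip(flow_a, flow_b):
        for (ax, ay), (bx, by) in zip(row_a, row_b):
            total += math.hypot(ax - bx, ay - by)
            count += 1
    return total / count


# Made-up 1x2 grids standing in for real outputs.
flow_hint_1 = [[(1.0, 0.0), (0.0, 2.0)]]
flow_hint_8 = [[(1.0, 0.0), (0.0, 2.0)]]  # identical to flow_hint_1
flow_other = [[(4.0, 4.0), (0.0, 2.0)]]   # e.g. output of another method

same = mean_endpoint_error(flow_hint_1, flow_hint_8)
diff = mean_endpoint_error(flow_hint_1, flow_other)
print(same, diff)
```

An error of exactly zero between the hint_grid=1 and hint_grid=8 outputs would suggest the parameter is being ignored rather than having a small effect. One possible explanation, to be verified in the DALI documentation, is that hint_grid only matters when external hints are supplied to the operator.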

@JanuszL
Contributor

JanuszL commented Apr 22, 2024

Hi @rvandeghen,

To my knowledge, the behavior of NVIDIA OF depends on the driver version and the GPU available. It is probably best to ask on the NVIDIA forum.
Also, DALI doesn't use the latest OF API (upgrading it is on our to-do list but has low priority for now). You may check the relevant OpenCV interface and compare the results.
