source_info tensor not guaranteed to contain correct data #5377

Open
1 task done
Tomsen1410 opened this issue Mar 15, 2024 · 1 comment
Labels
bug Something isn't working

Comments

@Tomsen1410

Version

1.35

Describe the bug.

I am using a video reader pipeline as follows:

import nvidia.dali.fn as fn
from nvidia.dali import pipeline_def

@pipeline_def
def read_decode_pipe(filenames, device='cpu'):
    video = fn.readers.video(
        sequence_length=384,
        filenames=filenames,
        pad_sequences=True,
        device=device
    )
    source_info = fn.get_property(video, key="source_info")
    return video, source_info

And I retrieve data from it using a DALIRaggedIterator:

from nvidia.dali.plugin.pytorch import DALIRaggedIterator, LastBatchPolicy

pipe = read_decode_pipe(
    files,
    batch_size=batch_size,
    device=device,
    device_id=device_id,
    num_threads=n_threads,
)
pipe.build()
it = DALIRaggedIterator(
    pipe,
    output_map=['snippets', 'paths'],
    output_types=[DALIRaggedIterator.SPARSE_LIST_TAG, DALIRaggedIterator.SPARSE_LIST_TAG],
    auto_reset=False,
    last_batch_policy=LastBatchPolicy.PARTIAL
)

for data in it:
    snippets = data[0]['snippets']
    bytes_paths = data[0]['paths']  # <--- might not yet be filled with data
    str_paths = [path.cpu().numpy().tobytes().decode() for path in bytes_paths]

Occasionally the encoded paths hold no value at the time of decoding: the tensors are filled with zeros, so the decoded path string is useless. Interestingly, when I set a breakpoint at that location and run the exact same decoding operation in the debug console, the strings decode correctly, probably because enough time has passed for the tensors to be filled with the actual data. This suggests that the source_info tensors are filled asynchronously. That is unexpected: the pipeline should wait until the data is ready before handing it to the for loop.
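
To make the symptom easier to catch, the decoding step could flag all-zero buffers instead of silently returning a useless string. The following is only a diagnostic sketch, not a fix for the underlying behavior. `decode_source_info` is a hypothetical helper name, and it assumes each path tensor has already been converted to bytes with `path.cpu().numpy().tobytes()`, as in the loop above.

```python
from typing import Optional


def decode_source_info(raw: bytes) -> Optional[str]:
    """Decode a source_info byte buffer into a path string.

    Returns None when the buffer is empty or consists only of zero bytes,
    which is the symptom described in this report.
    """
    # Stripping NUL bytes from both ends leaves nothing if the buffer is all zeros.
    if not raw.strip(b"\x00"):
        return None
    return raw.decode("utf-8")
```

In the loop, this would replace the bare `.decode()` call, for example `decode_source_info(path.cpu().numpy().tobytes())`, and any `None` result marks a batch where the paths had not been filled in yet.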

Minimum reproducible example

No response

Relevant log output

No response

Other/Misc.

No response

Check for duplicates

  • I have searched the open bugs/issues and have found no duplicates for this bug report

Tomsen1410 added the bug label Mar 15, 2024
@JanuszL
Contributor

JanuszL commented Mar 15, 2024

Hi @Tomsen1410,

Can you provide a standalone, self-contained repro? Something like this works for me:

docker run --rm -ti --gpus 'all,"capabilities=compute,utility,video"' ubuntu:22.04

apt update && apt install -y vim wget python3-pip
pip install --extra-index-url https://pypi.nvidia.com/ --upgrade nvidia-dali-cuda120 torch numpy
wget https://github.com/NVIDIA/DALI_extra/raw/main/db/video/sintel/sintel_trailer-720p.mp4
python3 test.py

# test.py
import nvidia.dali.fn as fn
from nvidia.dali import pipeline_def
from nvidia.dali.plugin.pytorch import DALIRaggedIterator, LastBatchPolicy

files = ["sintel_trailer-720p.mp4"]
batch_size = 3
device = "gpu"
device_id = 0
n_threads = 4

@pipeline_def
def read_decode_pipe(filenames, device="cpu"):
    video = fn.readers.video(
        sequence_length=3, filenames=filenames, pad_sequences=True, device=device
    )
    source_info = fn.get_property(video, key="source_info")
    return video, source_info


pipe = read_decode_pipe(
    files,
    batch_size=batch_size,
    device=device,
    device_id=device_id,
    num_threads=n_threads,
)
pipe.build()
it = DALIRaggedIterator(
    pipe,
    output_map=["snippets", "paths"],
    output_types=[DALIRaggedIterator.SPARSE_LIST_TAG, DALIRaggedIterator.SPARSE_LIST_TAG],
    auto_reset=False,
    last_batch_policy=LastBatchPolicy.PARTIAL,
)

for data in it:
    snippets = data[0]["snippets"]
    bytes_paths = data[0]["paths"]  # <--- might not yet be filled with data
    str_paths = [path.cpu().numpy().tobytes().decode() for path in bytes_paths]
    print(str_paths)
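
Since the problem is reported as intermittent, a repro like the one above may be easier to evaluate if it checks the decoded paths rather than only printing them. A minimal sketch, assuming `source_info` decodes to the same path string that was passed in `filenames` (not confirmed in this thread); `unexpected_paths` is a hypothetical helper name:

```python
from typing import Iterable, List


def unexpected_paths(decoded: Iterable[str], known_files: Iterable[str]) -> List[str]:
    """Return every decoded path that does not match one of the input files.

    Trailing NUL characters are ignored, so a zero-filled buffer decodes to
    an empty name and is reported as unexpected.
    """
    known = set(known_files)
    return [p for p in decoded if p.rstrip("\x00") not in known]
```

Calling `unexpected_paths(str_paths, files)` inside the loop and stopping on a non-empty result would pinpoint the first batch in which the paths were wrong.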

jantonguirao assigned JanuszL and unassigned jantonguirao Mar 15, 2024