Add pyav audio support #962

Open
Azleur wants to merge 6 commits into master

Conversation

@Azleur Azleur commented Mar 11, 2023

See issue #940

@FirefoxMetzger FirefoxMetzger left a comment

Nice work @Azleur 🚀

I think this is pretty much complete and only needs some minor tweaking to make the API consistent with the rest of ImageIO.

There is one feature I was thinking about while reviewing that I think makes a lot of sense, so I wonder what you think about it: In my mind, the default use-case for reading audio is to read it alongside some video frames/stream and to then pass both the frame and the corresponding audio into some ML pipeline or deep learning model. Currently, there is no easy way for a user to figure out which segment of audio corresponds to a video frame stored at a given index. Do you think it would be useful to add code that allows reading/iterating over the audio in batches that correspond to the start and duration at which individual frames are shown?
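For illustration, a rough sketch of what that could look like from the user's side. It leans on the read_audio method from this PR; the "fps" metadata key and the hard-coded sample rate are assumptions for the example, not existing API guarantees.

import imageio.v3 as iio

with iio.imopen("video.mp4", "r", plugin="pyav") as file:
    fps = file.metadata()["fps"]  # video frame rate (assuming the pyav metadata exposes it here)
    sample_rate = 44100           # audio sample rate; would really come from the audio stream

    frame_index = 5
    start = int(frame_index / fps * sample_rate)
    stop = int((frame_index + 1) / fps * sample_rate)

    img = file.read(index=frame_index)
    audio = file.read_audio()[..., start:stop]  # samples played while frame 5 is on screen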

*,
stream_index: int = 0,
) -> np.ndarray:
"""Yield audio frames from the video's audio stream.

Suggested change
- """Yield audio frames from the video's audio stream.
+ """Yield audio frames from an audio stream in the ImageResource.

Not 100% happy with this phrasing yet. The phrase "from an audio stream in the ImageResource" sounds a bit generic and doesn't add much value. Should we just cut it and go with "yield audio frames"?

stream_index: int = 0,
frame_index: int = None,
) -> np.ndarray:
"""Read audio data from the video.

Suggested change
- """Read audio data from the video.
+ """Read audio frames from an audio stream in the ImageResource.

Same as with yield. Should we just cut the clutter?

Comment on lines +1108 to +1110
frame_index : int
Select which audio frame to read. If None (default), read all audio
frames. Each audio frame is a batch of contiguous samples.

Can you make this one the first argument and rename it to index? This would mirror the way images are read and allow users to do

with iio.imopen("...", "r") as file:
    file.read_audio(index=6)

Also, the value for index that instructs ImageIO to read and batch all images in a file is Ellipsis or ... for short. It would make sense to match this syntax when reading audio. Could you switch the default value to ... (while keeping the behavior of batch-reading by default)?

We use index=None on plugins that support multiple container formats and need to specify different default behavior depending on the format being read. For example, for JPEG pillow reads the first (and only) image in the file (index=0), but for GIF it reads all frames and returns a batch (index=...).
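To make the convention concrete, this is how the image side already behaves, with the audio call written the way this comment proposes (read_audio is the method added in this PR, so its index behavior here is the suggestion, not existing API):

import imageio.v3 as iio

with iio.imopen("clip.mp4", "r", plugin="pyav") as file:
    first_frame = file.read(index=0)        # a single frame
    all_frames = file.read(index=...)       # every frame, stacked along a leading batch axis
    all_audio = file.read_audio(index=...)  # proposed default: all audio frames as one batch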

Comment on lines +1115 to +1116
A numpy array with shape ``(channels, samples)`` containing the
requested audio data.

Suggested change
- A numpy array with shape ``(channels, samples)`` containing the
- requested audio data.
+ A numpy array containing the requested audio data.

When reading a batch we also get a batch dimension in the result. Either we document this here or we just don't make any claim on the returned shape (since it depends on the input arguments).

Also, what happens if the data is stored interleaved rather than planar? In that case we would get a (samples,)-shaped array back, right?
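To spell out why a fixed shape claim is fragile, a small sketch of what the caller might see under the two layouts (the 1-D case mirrors my guess above and is worth verifying; frame_index is the parameter name currently in this PR):

import imageio.v3 as iio

with iio.imopen("clip.mp4", "r", plugin="pyav") as file:
    audio = file.read_audio(frame_index=0)

    if audio.ndim == 2:
        channels, samples = audio.shape  # planar layout, e.g. "fltp": (channels, samples)
    else:
        (samples,) = audio.shape         # packed/interleaved data might collapse to 1-D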

requested audio data.
"""

self._container.seek(0)

I'm a bit worried that this will mess with reading video data because seeking on the container moves all streams to the desired pts. If I do

with iio.imopen("...", "r") as file:
    img = file.read(index=5)
    audio = file.read_audio(index=95)
    img = file.read(index=6)

will I (a) still get the right frame, and (b) still benefit from the recent performance optimizations that avoid seeking and spooling the file if we read consecutive video frames? (For large videos this gives a 10x speedup)

I think we probably want a unit test that reads both video frames and audio from the same file. I'd imagine this to be the default use case, and it would be good to have certainty that video reading and audio reading work well when interleaved.
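Something along these lines, as a rough sketch (the indices and the assertion are placeholders; read_audio is the method introduced in this PR):

import numpy as np
import imageio.v3 as iio

def test_interleaved_video_and_audio_reads(test_images):
    path = str(test_images / "realshort.mp4")

    # Reference frame, read without touching the audio stream.
    with iio.imopen(path, "r", plugin="pyav") as file:
        expected = file.read(index=6)

    # Same frame, but with an audio read between two consecutive video reads.
    with iio.imopen(path, "r", plugin="pyav") as file:
        file.read(index=5)
        file.read_audio()
        actual = file.read(index=6)

    assert np.allclose(expected, actual)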


def test_write_audio(test_images: Path):
    in_path = str(test_images / "realshort.mp4")
    out_path = str(test_images / "realshort.mp3")

We treat the test_images dir as read-only to keep tests independent of each other. (They share the directory to avoid having to copy a lot of data around when setting up a test).

assert np.allclose(expected_frame, actual_frame)


def test_write_audio(test_images: Path):

Suggested change
- def test_write_audio(test_images: Path):
+ def test_write_audio(test_images: Path, tmp_path: Path):

tmp_path is a pytest fixture (test-level) that gives you a unique, temporary folder for a unit test. With it you can get a directory that you can mess around in without influencing other tests. Pytest will take care of the cleanup.


def test_write_audio(test_images: Path):
    in_path = str(test_images / "realshort.mp4")
    out_path = str(test_images / "realshort.mp3")

Suggested change
- out_path = str(test_images / "realshort.mp3")
+ out_path = str(tmp_path / "realshort.mp3")

Comment on lines +622 to +629
    with iio.imopen(in_path, "r", plugin="pyav") as in_file:
        in_frames = [frame for frame in in_file.iter_audio()]

    with iio.imopen(out_path, "w", plugin="pyav") as out_file:
        out_file.init_audio_stream("mp3", format="fltp", layout="mono")
        for frame in in_frames:
            # This would fail if the frame was float64.
            out_file.write_audio_frame(frame)

We can combine these using a contextlib.ExitStack and avoid keeping the full uncompressed audio in memory.
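Roughly what I mean, as a sketch (in_path and out_path are the paths defined in the test above; the audio methods are the ones from this PR):

from contextlib import ExitStack

import imageio.v3 as iio

with ExitStack() as stack:
    in_file = stack.enter_context(iio.imopen(in_path, "r", plugin="pyav"))
    out_file = stack.enter_context(iio.imopen(out_path, "w", plugin="pyav"))

    out_file.init_audio_stream("mp3", format="fltp", layout="mono")
    for frame in in_file.iter_audio():
        # Each frame goes straight from the reader to the writer, so the
        # uncompressed audio never has to sit in memory as a whole.
        out_file.write_audio_frame(frame)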

out_data = np.concatenate(out_frames, axis=1)

assert in_data.shape == out_data.shape
assert np.allclose(in_data, out_data, atol=0.005) # Lossy conversion.

Suggested change
- assert np.allclose(in_data, out_data, atol=0.005) # Lossy conversion.
+ assert np.allclose(in_data, out_data, atol=0.005) # Lossy compression.
