Add pyav audio support #962
base: master
Conversation
…lude a stereo video. WARNING: video not committed, unclear how to handle it.
Nice work @Azleur 🚀
I think this is pretty much complete and only needs some minor tweaking to make the API consistent with the rest of ImageIO.
There is one feature I was thinking about while reviewing that I think makes a lot of sense, so I wonder what you think about it: In my mind, the default use-case for reading audio is to read it alongside some video frames/stream and to then pass both the frame and the corresponding audio into some ML pipeline or deep learning model. Currently, there is no easy way for a user to figure out which segment of audio corresponds to a video frame stored at a given index. Do you think it would be useful to add code that allows reading/iterating over the audio in batches that correspond to the start and duration at which individual frames are shown?
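To make the idea concrete, here is a minimal sketch of the mapping I have in mind, assuming a constant frame rate and sample rate (`audio_span_for_frame` is a hypothetical helper, not part of this PR):

```python
# Hypothetical helper: which audio samples play while frame `frame_index`
# is on screen, assuming constant fps and sample_rate.
def audio_span_for_frame(frame_index: int, fps: float, sample_rate: int):
    """Return the half-open sample range [start, stop) for one video frame."""
    start = round(frame_index * sample_rate / fps)
    stop = round((frame_index + 1) * sample_rate / fps)
    return start, stop
```

E.g. at 25 fps and 48 kHz every frame covers 1920 samples, so a batch-iterating variant of `iter_audio` could yield slices of that size.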
    *,
    stream_index: int = 0,
) -> np.ndarray:
    """Yield audio frames from the video's audio stream.
- """Yield audio frames from the video's audio stream.
+ """Yield audio frames from an audio stream in the ImageResource.
Not 100% happy with this phrasing yet. The "from an audio stream in the ImageResource" sounds a bit generic and doesn't add much value. Should we just cut it and go with "yield audio frames"?
    stream_index: int = 0,
    frame_index: int = None,
) -> np.ndarray:
    """Read audio data from the video.
- """Read audio data from the video.
+ """Read audio frames from an audio stream in the ImageResource.
Same as with yield. Should we just cut the clutter?
    frame_index : int
        Select which audio frame to read. If None (default), read all audio
        frames. Each audio frame is a batch of contiguous samples.
Can you make this one the first argument and rename it to `index`? This would mirror the way images are read and allow users to do

with iio.imopen("...", "r") as file:
    file.read_audio(index=6)

Also, the value for `index` that instructs ImageIO to read and batch all images in a file is `Ellipsis`, or `...` for short. It would make sense to match this syntax when reading audio. Could you switch the default value to `...` (while keeping the behavior of batch-reading by default)?

We use `index=None` on plugins that support multiple container formats and need to specify different default behavior depending on the format being read. For example, for JPEG pillow reads the first (and only) image in the file (`index=0`), but for GIF it reads all frames and returns a batch (`index=...`).
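A toy stand-in (not the real plugin code) to illustrate the convention: `...` batches everything, while an int selects a single audio frame.

```python
def read_audio(index=..., *, stream_index: int = 0):
    # Pretend these are the decoded audio frames of the stream.
    frames = [[0.0], [0.25], [0.5]]
    if index is Ellipsis:
        # Default: batch-read the whole stream.
        return [sample for frame in frames for sample in frame]
    return frames[index]
```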
        A numpy array with shape ``(channels, samples)`` containing the
        requested audio data.
- A numpy array with shape ``(channels, samples)`` containing the
-     requested audio data.
+ A numpy array containing the requested audio data.
When reading a batch we also get a batch dimension in the result. Either we document this here or we just don't make any claim about the returned shape (since it depends on the input arguments).

Also, what happens if data is stored interleaved rather than planar? In that case we would get a `(samples,)`-shaped array back, right?
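For reference, a small numpy illustration of the two layouts (hypothetical 2-channel, 4-sample clip):

```python
import numpy as np

# Planar: one row per channel -> shape (channels, samples).
planar = np.array([[0, 2, 4, 6],   # left channel
                   [1, 3, 5, 7]])  # right channel
# Interleaved: L R L R ... in one flat array -> shape (channels * samples,).
interleaved = planar.T.reshape(-1)
```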
        requested audio data.
    """
self._container.seek(0)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm a bit worried that this will mess with reading video data because seeking on the container moves all streams to the desired pts. If I do
with iio.imopen("...", "r") as file:
    img = file.read(index=5)
    audio = file.read_audio(index=95)
    img = file.read(index=6)
will I (a) still get the right frame, and (b) still benefit from the recent performance optimizations that avoid seeking and spooling the file if we read consecutive video frames? (For large videos this gives a 10x speedup)
I think we probably want a unit test that interacts with both images and video at the same time. I'd imagine this to be the default use case and it will be good to have certainty that video reading and audio reading works well when interleaved.
def test_write_audio(test_images: Path):
    in_path = str(test_images / "realshort.mp4")
    out_path = str(test_images / "realshort.mp3")
We treat the `test_images` dir as read-only to keep tests independent of each other. (They share the directory to avoid having to copy a lot of data around when setting up a test.)
    assert np.allclose(expected_frame, actual_frame)

def test_write_audio(test_images: Path): |
- def test_write_audio(test_images: Path):
+ def test_write_audio(test_images: Path, tmp_path: Path):
`tmp_path` is a test-level pytest fixture that gives you a unique temporary folder for each unit test. With it you get a directory you can mess around in without influencing other tests, and pytest takes care of the cleanup.
def test_write_audio(test_images: Path):
    in_path = str(test_images / "realshort.mp4")
    out_path = str(test_images / "realshort.mp3")
- out_path = str(test_images / "realshort.mp3")
+ out_path = str(tmp_path / "realshort.mp3")
    with iio.imopen(in_path, "r", plugin="pyav") as in_file:
        in_frames = [frame for frame in in_file.iter_audio()]

    with iio.imopen(out_path, "w", plugin="pyav") as out_file:
        out_file.init_audio_stream("mp3", format="fltp", layout="mono")
        for frame in in_frames:
            # This would fail if the frame was float64.
            out_file.write_audio_frame(frame)
We can combine these using a `contextlib.ExitStack` and avoid keeping the full uncompressed audio in memory.
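For illustration, a runnable toy version of the `ExitStack` pattern (the `Recorder` class is a stand-in for the two `iio.imopen` handles):

```python
from contextlib import ExitStack

class Recorder:
    """Toy context manager standing in for an imopen() handle."""
    def __init__(self, log, name):
        self.log, self.name = log, name
    def __enter__(self):
        self.log.append(f"open {self.name}")
        return self
    def __exit__(self, *exc):
        self.log.append(f"close {self.name}")
        return False

def transcode(log):
    # Both handles stay open for the whole loop; frames stream through
    # one at a time instead of being collected into a list first.
    with ExitStack() as stack:
        src = stack.enter_context(Recorder(log, "in"))
        dst = stack.enter_context(Recorder(log, "out"))
        for frame in ("f0", "f1"):                # stand-in for iter_audio()
            log.append(f"{dst.name} <- {frame}")  # stand-in for write_audio_frame()
```

Note that `ExitStack` unwinds in reverse order, so the writer is closed (and flushed) before the reader, even if an exception is raised mid-loop.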
    out_data = np.concatenate(out_frames, axis=1)

    assert in_data.shape == out_data.shape
    assert np.allclose(in_data, out_data, atol=0.005)  # Lossy conversion.
- assert np.allclose(in_data, out_data, atol=0.005)  # Lossy conversion.
+ assert np.allclose(in_data, out_data, atol=0.005)  # Lossy compression.
See issue #940