Getting the exact number of frames from a stream #62

Open
mre-ableton opened this issue Sep 11, 2023 · 1 comment

mre-ableton commented Sep 11, 2023

Hi there!

We're investigating how to handle loading wav files whose size, as declared in the RIFF header, is incorrect and much larger than the data actually contained in the file.

In such cases, we would like to crop the sound to the actual data and, ideally, determine the exact size without having to read the whole file up front.

As expected, ifstream::num_frames will return the value read in the header.

However, the documentation states:

num_frames may differ from the actual number of frames in the stream
as this information relies on the codec. The only way to obtain the exact
number of frames is by seeking to the end of stream and retrieving the frame position.

So we're trying to use ifstream::frame_tellg, but it seems to return the same value as num_frames.

Here's some example code that evaluates the number of frames in three ways: using num_frames, using frame_seekg followed by frame_tellg, and by reading the data.

  // Build an ifstream
  const auto filePath = makeTestFilePath("BB3_100_drum_break_paprika.wav");
  auto stream = audio::ifstream{filePath.string()};

  // Get the number of frames reported
  const auto reportedFrameNum = stream.info().num_frames();

  // Position the stream at the end
  stream.frame_seekg(0, std::ios_base::end);
  const auto seekedNumFrame = size_t(stream.frame_tellg());

  // Read the data until exhaustion to get the actual data contained in the file
  const auto dataFrameNum = [&]() {

    stream.frame_seekg(0, std::ios_base::beg);
    size_t frameCount{0};

    constexpr std::size_t kNiMediaReadChunkSize = 4096 / sizeof(float);
    const auto samplesPerChunk =
      std::min<std::size_t>(kNiMediaReadChunkSize, reportedFrameNum);
    std::vector<float> data(samplesPerChunk * stream.info().num_channels(), 0.0f);

    while (stream.read((char*)data.data(), std::streamsize(samplesPerChunk)))
    {
      frameCount += stream.frame_gcount();
    }

    return frameCount;
  }();

  std::cout << "Number of frames returning by num_frames " << reportedFrameNum
            << std::endl;
  std::cout << "Number of frames in the file " << dataFrameNum << std::endl;
  std::cout << "Number of frames from seeking " << seekedNumFrame << std::endl;

  CHECK(seekedNumFrame == dataFrameNum);

The output is:

Number of frames returning by num_frames 423360
Number of frames in the file 98090
Number of frames from seeking 423360

Our expectation was that seeking would also return 98090.

Is this the API we're supposed to use? Is there another way?

Cheers.

PS: Here's the file this test was run with: BB3_100_drum_break_paprika.zip

@wro-ableton
Contributor

I looked into this. It seems that using seekg and tellg indeed doesn't provide any information different from num_frames.

The reason is that the subview_device, which wav_source and other sources use to create a stream view on the data portion of the file, trusts that the provided view end position is accurate. It doesn't check whether that position is located past the end of the file. To my understanding, at least on Windows, the internal call to SetFilePointer will also allow moving the pointer past the end of the file, and the subsequent tellg will simply return the pointer offset.
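To illustrate the kind of bounds check described as missing, here is a minimal sketch. It assumes the actual file size is known when the view is created. `byte_view`, `clamp_to_file` and `frames_in` are hypothetical names for illustration only; they are not part of ni-media's subview_device.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Hypothetical description of a byte range inside a file (not an ni-media type).
struct byte_view
{
    std::uint64_t begin; // offset of the first data byte
    std::uint64_t end;   // one past the last data byte, as declared by the header
};

// Clamp a view taken from a header so that it never extends past the file.
inline byte_view clamp_to_file(byte_view declared, std::uint64_t file_size)
{
    assert(declared.begin <= declared.end);
    const auto begin = std::min(declared.begin, file_size);
    const auto end   = std::min(declared.end, file_size);
    return {begin, end};
}

// Number of whole frames in a view, given the size of one frame in bytes.
inline std::uint64_t frames_in(byte_view view, std::uint64_t bytes_per_frame)
{
    assert(bytes_per_frame > 0);
    return (view.end - view.begin) / bytes_per_frame;
}
```

With a view clamped this way, seeking to the end and calling tellg would report the end of the data that is really present rather than the end position declared in the header.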

I think it's interesting that ni-media allows loading wav files that claim to be longer than they actually are. I would expect some file size check to ensure the RIFF and data chunk lengths are valid, and an exception if they are not.
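A file size check of this kind can be done without reading the audio data, using only the chunk headers and the size reported by the filesystem. The following is a rough sketch in standard C++17, independent of ni-media. `wav_frame_counts`, `read_le` and `inspect_wav` are made-up names, and the code assumes a plain RIFF/WAVE file (no RF64) whose fmt chunk precedes the data chunk.

```cpp
#include <algorithm>
#include <cstdint>
#include <cstring>
#include <filesystem>
#include <fstream>
#include <istream>
#include <stdexcept>

struct wav_frame_counts
{
    std::uint64_t declared;  // frames according to the data chunk header
    std::uint64_t available; // frames actually backed by bytes in the file
};

// Read an N-byte little-endian unsigned integer from the stream.
template <std::size_t N>
std::uint32_t read_le(std::istream& in)
{
    unsigned char b[N];
    if (!in.read(reinterpret_cast<char*>(b), N))
        throw std::runtime_error("unexpected end of file in header");
    std::uint32_t v = 0;
    for (std::size_t i = 0; i < N; ++i)
        v |= std::uint32_t(b[i]) << (8 * i);
    return v;
}

// Walk the RIFF chunks and compare the declared data size with the file size.
inline wav_frame_counts inspect_wav(const std::filesystem::path& path)
{
    const std::uint64_t file_size = std::filesystem::file_size(path);
    std::ifstream in(path, std::ios::binary);
    if (!in) throw std::runtime_error("cannot open file");

    char tag[4];
    if (!in.read(tag, 4) || std::memcmp(tag, "RIFF", 4) != 0)
        throw std::runtime_error("not a RIFF file");
    read_le<4>(in); // declared RIFF size, deliberately not trusted
    if (!in.read(tag, 4) || std::memcmp(tag, "WAVE", 4) != 0)
        throw std::runtime_error("not a WAVE file");

    std::uint32_t block_align = 0; // bytes per frame, from the fmt chunk
    std::uint64_t pos = 12;        // offset of the first chunk header
    while (pos + 8 <= file_size)
    {
        in.seekg(std::streamoff(pos));
        if (!in.read(tag, 4)) break;
        const std::uint64_t size = read_le<4>(in);
        const std::uint64_t body = pos + 8;
        if (std::memcmp(tag, "fmt ", 4) == 0)
        {
            // Skip format tag, channel count, sample rate and byte rate.
            in.seekg(std::streamoff(body + 12));
            block_align = read_le<2>(in);
        }
        else if (std::memcmp(tag, "data", 4) == 0)
        {
            if (block_align == 0)
                throw std::runtime_error("data chunk without a valid fmt chunk");
            const std::uint64_t present = std::min(size, file_size - body);
            return {size / block_align, present / block_align};
        }
        pos = body + size + (size & 1); // chunks are padded to an even size
    }
    throw std::runtime_error("no data chunk found");
}
```

A loader that wants to crop could use `available` as the frame count, while a stricter one could throw whenever `available < declared`, which matches the exception suggested above.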
