Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Mirrored memory ring buffer #671

Open
dsego opened this issue May 17, 2023 · 7 comments
Open

Mirrored memory ring buffer #671

dsego opened this issue May 17, 2023 · 7 comments

Comments

@dsego
Copy link

dsego commented May 17, 2023

From what I understand, reading the ring buffer currently requires a loop to make sure all the required frames are read. This is stated in the docs:

If the read or write pointer is positioned such that the number of frames requested will require a loop, it will be clamped to the end of the buffer. Therefore, the number of frames you're given may be less than the number you requested.

However, I have found a mirroring method described online which uses a copy of the data to create a contiguous block and avoids wrapping around when reading the buffer.

This pointer can be passed directly to a read call or anything else that wants to write into a single block of memory.

https://mikeash.com/pyblog/friday-qa-2012-02-03-ring-buffers-and-mirrored-memory-part-i.html

Would something like this be possible to implement?

@orcmid
Copy link

orcmid commented May 17, 2023

@dsego

Would something like this be possible to implement?

Are you asking for it to be done in miniaudio or are you askiing if this is something feasible for you to do?

Can you say more about your use case -- what is the situation where you consider a ring buffer to be helpful?

Off-the-top-of-my-head:

The solution you linked to involves operating-system and hardware level messing with the mapping of virtual memory. I don't know what that would do for threading issues. I shudder to even think about it.

Generally, the reading and writing in a ring buffer is one fixed-size element at a time. It is perfect for that.

It does mean there has to be wrap-around checking on each read or write, and there also has to be collision detection -- i.e., the buffer is full and can't take more yet, or it is empty, and can't be read more yet.

There are ways to accelerate this by working in larger chunks. The computations
become trickier for wrapping properly and also avoiding collisions.

A compromise might be by increasing the size of elements, doing reading and writing in bigger fixed-size chunks in effect.

Yet another alternative might be having fixed-size elements in the buffer but each element has a separately-allocated chunk that hangs off of it. This gets complicated too.

I will stop speculating. It is important to understand the use case, rather than start with ring buffer as a solution.

@dsego
Copy link
Author

dsego commented May 17, 2023

Asking for it to be done in miniaudio. I think the use-case would be to simplify the ring buffer interface.

For example, the way I understand it, there might be 500 samples to read in the buffer, but the first time I call pcm_rb_acquire_read it only gives me 300, because we hit the end of the buffer. Than I need to handle it and call pcm_rb_acquire_read again to have it wrap-around and give me the 200 at the beginning.

With a mirrored buffer as long as I want to read or write less than current capacity, it should work, no need to think about the internal state of the buffer. In this case I would get those 500 samples in one contiguous memory pointer.

It seems that libsoundio does something like this.
https://github.com/andrewrk/libsoundio/blob/master/src/ring_buffer.c#L85-L94

Portaudio if I'm not mistaken, doesn't mirror but instead keeps two regions and has some internal logic hidden inside user-facing methods.
https://github.com/PortAudio/portaudio/blob/master/src/common/pa_ringbuffer.c#L104

Feel free to close the request if it's not something that can be entertained at this point.

@orcmid
Copy link

orcmid commented May 17, 2023

@dsego You are correct. The ma_pcm_rb_acquire_read() gives you a pointer and a count that will not be past the end of the ring buffer vector.

You also have to do ma_pcm_rb_commit_read() to let the ring buffer scheme know that you are done with what you acquired, so the space becomes available for reuse by the writer.

The key factor in all of this is that when you acquire_read() you are returned a point to an array of a returned size that you then need to consume yourself. This is storage in the ring buffer itself and it is effectively locked against the writer until your code reports that the data is no longer needed there and the array cells are available for the writer to use.

PONDERING FURTHER

I'm unclear why this is a bother. It is really no different than if you caught up with the writer or the writer was working so rapidly that it had wrapped around and was hot on the reader's heals, if not breathing down its neck. If you always want to grab everything that the writer has buffered when you do a read, expecting to find that in a single block that you can index through is simply contrary to how ring buffers work.

The alternative is to quickly copy available material from the ring buffer into an array of contiguous elements doing at most two copies. Then the writer is free to keep going while your program digests the sequential buffer you created for yourself. At this point, I think there is more appeal to a buffering scheme that does not use a (conventional) ring buffer.

@mackron
Copy link
Owner

mackron commented May 17, 2023

@dsego Thanks for this. You're correct in miniaudio's ring buffer. When I wrote the ring buffer, I just wrote it off the top of my head without a reference, but that mirrored technique is rather clever. I don't think I would ever have thought of that. It looks like the only disadvantage is that it takes up double the memory and will require two writes instead of one. I wonder if I could do a ma_mirrored_rb or something. Certainly I wouldn't change the existing ring buffer because that might not be an acceptable change to some users. I can certainly see the advantages in this though.

I like this idea and will have a look at it when I get the chance, but it might be a while before I get to it. Let's leave this ticket open.

@orcmid
Copy link

orcmid commented May 18, 2023

@mackron That is clever. So, once anything in the upper copy is committed, the read pointer comes back down to a lower next-read point in the buffer it never leaves. That looks like there is no thread difficulty. Of course, the writer is always writing in two places just to keep readers happy. Some other calculations are a little tricky, I suppose, and fun to work out.

The cost is doubling the space and doubling the number of writes so that reader can use read_acquire to process consecutive entries right in the buffer rather than quickly freeing buffer space ahead of the writer by grabbing copies. And the writer still has to wrap as usual.

A mystery.

@orcmid
Copy link

orcmid commented May 20, 2023

@mackron I thought about this some more and realized it is far better to transfer to user-supplied blocks of contiguous memory. Now there's memcpy speeds and freeing of space in the ring buffer very quickly. This is done in at most two chunks into consecutive locations. This gets rid of the acquire_read()and commit_read() altogether. The user still has to be prepared for getting less data than provided for in the user-supplied buffer.

Another nice feature is this can be done with a get-type function that uses the existing aquire()/commit() mechanism under the covers. So instead of ma_morrored_rb, just add ma_rb_read(). Hmm, so a standalone prototype of the function can be used without having to add anything to miniaudio first.

PS: I should have thought of this sooner. I did something like this in an API for a minicom;puter terminal-interface service long ago.

@orcmid
Copy link

orcmid commented Jun 24, 2023

@mackron

https://lo.calho.st/posts/black-magic-buffer/
I just ran across that article when looking through an interesting profile on Mastodon, (https://mastodon.acm.org/@phf). There is some interesting performance data.

https://nullprogram.com/blog/2016/04/10/
is apparently a good look at the virtual-memory ping-pong mapping case.

This is not going to work well on an OS (or processor) that does not support this though.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants