Add frames to vec example #17

Open
polarathene opened this issue Apr 11, 2018 · 4 comments

Comments

@polarathene

polarathene commented Apr 11, 2018

I've noticed that while a frame capture in my loop takes roughly 1ms on its own, once I add more code to the loop (SDL texture.update() + canvas.copy() + canvas.present(), vulkano code to create/upload a texture and update the frame, or even a thread sleep of around 16ms), the screen capture can take as long as 4ms instead.

Presumably it has something to do with the CPU cache being used well when the loop only involves the screen capture, but I'm not sure. I tried capturing a few frames into a vec first and then iterating through those in my loop. That reduced the times a little, but for my real-time case it wasn't worthwhile after all.

Anyway, capturing frames in a loop didn't play well with lifetimes: for some reason I couldn't move the frame data to store it elsewhere. Here's a solution I came across:

This managed 4 captures per loop iteration in 13ms, and 8 in about 23ms. data_vec is a Vec<Vec<u8>> declared before the outer loop that wraps the code block below begins.

// On the first pass data_vec is empty, so the 1st loop does nothing and the
// 2nd loop fills data_vec with 8 frames. On later passes the 1st loop
// consumes all of n, refreshing the existing data_vec elements in place and
// avoiding the new allocations that to_vec() causes.
let mut n = 0..8;

for (v, _) in data_vec.iter_mut().zip(&mut n) {
    v.clear();
    v.extend_from_slice(&scrap.frame().unwrap());
}

for _ in n {
    data_vec.push(scrap.frame().unwrap().to_vec());
}

I found copy_from_slice() faster than to_vec(), probably because it avoids the allocation: with no iterations or any other code, to_vec() took the 1ms operation to 4ms, while copy_from_slice() managed 2ms. I didn't compare it with extend_from_slice(). Pushing the frame/&[u8] data from scrap.frame() to move it into a vec would be nice, but as stated that doesn't seem to be an option; it requires a copy/allocation such as to_vec() :(
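As a rough illustration of why reusing a buffer can beat to_vec(), here is a minimal sketch. The helper store_frame is hypothetical and not part of scrap; the frame is assumed to be a plain &[u8]. It uses copy_from_slice() when the destination already has the right length and falls back to clear() + extend_from_slice() otherwise.

```rust
/// Hypothetical helper: copies `frame` into `slot`, reusing the slot's
/// existing allocation whenever it is already large enough.
fn store_frame(slot: &mut Vec<u8>, frame: &[u8]) {
    if slot.len() == frame.len() {
        // Same size as the previous frame: a plain memcpy, no allocation.
        // (copy_from_slice panics on a length mismatch, hence the check.)
        slot.copy_from_slice(frame);
    } else {
        // First use, or the frame size changed: resize, then copy.
        // clear() keeps the capacity, so this only allocates when growing.
        slot.clear();
        slot.extend_from_slice(frame);
    }
}
```

Calling store_frame twice on the same Vec with frames of equal length leaves the Vec's heap pointer unchanged, whereas to_vec() would allocate a fresh buffer each time.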

Any chance an API update could better handle that? Or is the above the best way to go about it?

@quadrupleslap
Owner

If I'm understanding you correctly, lifetimes stop you from moving frames because that buffer is the same one that gets used for the next frame. Also, being able to directly push to an &mut [u8] would be nice, but the OS only allows sharing specially allocated memory, so I'm not sure how that would work.

Maybe allowing the user to specify the number of buffers allocated would be a good idea - it's currently 3 on macOS, 2 on Linux, and I forgot how the Windows API worked.
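The lifetime point can be seen with a small mock. This is a sketch only: MockCapturer and capture_owned are hypothetical stand-ins, not scrap's types, and the mock simply assumes that a frame is a slice borrowed from one internal buffer that is overwritten on every capture.

```rust
/// Hypothetical stand-in for a capturer; not scrap's real type.
struct MockCapturer {
    buffer: Vec<u8>, // single internal buffer, overwritten on every capture
    counter: u8,
}

impl MockCapturer {
    fn new(len: usize) -> Self {
        MockCapturer { buffer: vec![0; len], counter: 0 }
    }

    /// The returned slice borrows `self` mutably, so it must be dropped
    /// before `frame` can be called again.
    fn frame(&mut self) -> &[u8] {
        self.counter = self.counter.wrapping_add(1);
        let value = self.counter;
        self.buffer.fill(value); // stand-in for the OS writing new pixels
        &self.buffer
    }
}

/// Captures `count` frames and returns owned copies of them.
fn capture_owned(capturer: &mut MockCapturer, count: usize) -> Vec<Vec<u8>> {
    let mut frames = Vec::with_capacity(count);
    for _ in 0..count {
        // Collecting the borrowed slices into a Vec<&[u8]> would not compile:
        // each slice keeps `capturer` mutably borrowed while the next call
        // to frame() needs it. Copying with to_vec() ends the borrow.
        frames.push(capturer.frame().to_vec());
    }
    frames
}
```

With this mock, capture_owned(&mut capturer, 3) returns three distinct owned frames, at the cost of one allocation and one copy per frame.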

@polarathene
Author

I don't personally need this anymore, as I don't think it helps my use case, but I thought I should raise an issue about how scrap could accommodate moving/storing frames into a user-provided buffer with minimal overhead.

I think doing the 2nd loop with a push would be a common first approach, but without to_vec() the compiler won't be happy.

@quadrupleslap
Owner

Sorry, I don't understand what you mean by "moving/storing frames into a buffer" - when would I need to store multiple frames? Also, is this a circular buffer with a fixed number of frames, or an expanding buffer?

Because of shared memory, I don't know of a way to minimize the overhead (the "extra" memcpy, os -> scrap -> user's buffer) besides one of the following:

  • letting the user allocate their own shared buffers.
  • rewriting the entire thing to use the macOS-style event-driven approach where you get an event whenever a new frame is available, and letting the user just take/return ownership of the frame.
  • letting the user choose how many frames get buffered, and maintaining a ring of frames that the user can index.

I think anything else would just be equivalent to memcpy (copy_from_slice). Is there anything else that "minimizing overhead" could involve?
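To make the third option concrete, here is a minimal sketch of a user-indexable ring of frames. FrameRing is a hypothetical type, not an existing or planned scrap API, and it still performs one copy per frame into an owned slot; it only avoids repeated allocations.

```rust
/// Hypothetical fixed-size ring of owned frame buffers.
struct FrameRing {
    slots: Vec<Vec<u8>>,
    next: usize,   // slot that the next push will overwrite
    filled: usize, // number of slots currently holding a valid frame
}

impl FrameRing {
    fn new(capacity: usize) -> Self {
        assert!(capacity > 0, "ring needs at least one slot");
        FrameRing { slots: vec![Vec::new(); capacity], next: 0, filled: 0 }
    }

    /// Copies `frame` into the oldest slot, reusing that slot's allocation.
    fn push(&mut self, frame: &[u8]) {
        let slot = &mut self.slots[self.next];
        slot.clear();
        slot.extend_from_slice(frame);
        self.next = (self.next + 1) % self.slots.len();
        self.filled = (self.filled + 1).min(self.slots.len());
    }

    /// Returns the frame pushed `age` pushes ago (0 = newest), if still held.
    fn get(&self, age: usize) -> Option<&[u8]> {
        if age >= self.filled {
            return None;
        }
        let len = self.slots.len();
        let index = (self.next + len - 1 - age) % len;
        Some(self.slots[index].as_slice())
    }
}
```

With a capacity of 2, pushing three frames leaves only the last two reachable: get(0) is the newest, get(1) the one before it, and get(2) returns None.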

@polarathene
Author

> when would I need to store multiple frames?

Well, as I said earlier, I had a loop that was passing the frames off to a display (SDL2 or Vulkano based) and noticed that when I added that extra logic to the loop, my 1ms captures increased to as much as 4ms (that's just the capturing code; the extra logic itself took about 2-3ms more).

So I thought that if I captured a few frames into a buffer first, I could reduce that increase in capture time (I assume cache misses or something equally low level was causing it). That didn't work too well at first, since the compiler wasn't happy about using scrap more than once per main loop iteration to hold multiple frames, so to_vec() seemed to be required. Someone then pointed out that extend_from_slice() could be used instead, and that seems to have helped.

I'm not sure whether scrap could handle that better internally, so that the user has less extra work to do themselves (where a naive loop might work with a push or a method call?).

> macOS-style event-driven approach where you get an event whenever a new frame is available

Reminds me of Reactive Extensions (Rx); it would be useful for that kind of functional approach :)
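A minimal sketch of what "take/return ownership of the frame" could look like, using only std channels and a thread. Everything here is an assumption for illustration: run_handoff is a hypothetical function, the producer thread merely fills buffers with a counter instead of capturing the screen, and none of this reflects how scrap or the macOS API is implemented.

```rust
use std::sync::mpsc;
use std::thread;

/// Hypothetical handoff: a producer "captures" `count` frames into buffers
/// that circulate between two threads. Returns the first byte of each frame
/// in the order the consumer received them.
fn run_handoff(count: u8, frame_len: usize) -> Vec<u8> {
    assert!(frame_len > 0, "frames must not be empty");
    let (frame_tx, frame_rx) = mpsc::channel::<Vec<u8>>(); // producer -> consumer
    let (return_tx, return_rx) = mpsc::channel::<Vec<u8>>(); // consumer -> producer

    // Pre-allocate two buffers; no further allocation happens per frame.
    for _ in 0..2 {
        return_tx.send(vec![0; frame_len]).unwrap();
    }

    let producer = thread::spawn(move || {
        for i in 1..=count {
            // Wait for a free buffer; stop if the consumer has gone away.
            let mut buf = match return_rx.recv() {
                Ok(buf) => buf,
                Err(_) => break,
            };
            buf.fill(i); // stand-in for writing a captured frame
            if frame_tx.send(buf).is_err() {
                break;
            }
        }
        // frame_tx is dropped here, which ends the consumer's loop.
    });

    let mut seen = Vec::new();
    for frame in frame_rx {
        // The consumer owns `frame` for as long as it likes.
        seen.push(frame[0]);
        // Hand the buffer back; ignore the error if the producer finished.
        let _ = return_tx.send(frame);
    }
    producer.join().unwrap();
    seen
}
```

The consumer receives each frame by value, so it can store or move it freely, and returning the buffer lets the producer reuse the allocation for a later frame.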

> letting the user choose how many frames get buffered, and maintaining a ring of frames that the user can index.

Sounds most appropriate?


I could be mistaken, but it seemed like my approach involved an additional copy after scrap had captured the frame. I was under the impression that when dealing with just a single frame, the data from scrap could be moved or taken ownership of, avoiding that extra copy?

If there isn't much interest from other users in such a feature, there's no need to spend time on it :) My attempt didn't bring the capture time back down to what I was getting before any extra code was involved (even a thread sleep of 5ms or more slowed down the capture call in the loop).
