WIP: Computer vision streaming demo #392
base: main
Conversation
Codecov Report
@@            Coverage Diff            @@
##              main      #392   +/-   ##
=========================================
  Coverage   100.00%   100.00%
=========================================
  Files           31        31
  Lines         5492      5492
=========================================
  Hits          5492      5492

Continue to review the full report at Codecov.
So, I want to have a low-bandwidth solution *and* guaranteed synchronicity between my frames and the inference results. (I know, it won't be low latency, as it'll be delayed by at least the model runtime, but hey.) The first constraint means video ... and when you display video the modern way, it's very hard to know which frame is currently being displayed. That is, there's no frame number attribute, which would otherwise make this easy: you could just say "OK, frame 123 arrived, so let's draw the model results on frame 123". I see two potential approaches, and I'd love it if someone implemented one of them (or another you know of):
1. Use the WebRTC stats which, as you've seen, include the number of frames received. This is effectively the frame number, right? My main concern here is getting it in sync (i.e. knowing exactly which frame is the first that gets displayed and increments the count to one), and how a poor connection affects things (e.g. if some frames are dropped and the count isn't incremented, then everything will be out). And, I guess, how it might vary from browser to browser, etc.
2. Do some steganography! Basically, encode the frame number in the pixels of the image, and then decode it on the frontend (there's a rough sketch after this list). The challenges on this front:
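As a rough illustration of the steganography idea, here's a minimal numpy-only sketch (all names here are hypothetical, and surviving lossy compression is exactly one of those challenges, hence stamping each bit as a large saturated block rather than a single pixel):

```python
import numpy as np

BLOCK = 8   # pixels per bit; bigger blocks survive lossy encoding better
BITS = 16   # frame counter wraps at 2**16

def encode_frame_number(img: np.ndarray, n: int) -> np.ndarray:
    """Stamp frame number n into the top-left corner as BITS black/white blocks."""
    out = img.copy()
    for i in range(BITS):
        bit = (n >> i) & 1
        out[0:BLOCK, i * BLOCK:(i + 1) * BLOCK] = 255 if bit else 0
    return out

def decode_frame_number(img: np.ndarray) -> int:
    """Recover n by thresholding the mean value of each block."""
    n = 0
    for i in range(BITS):
        block = img[0:BLOCK, i * BLOCK:(i + 1) * BLOCK]
        if block.mean() > 127:
            n |= 1 << i
    return n
```

On the frontend the decoder would do the same thresholding on pixels read back from a canvas; chroma subsampling, scaling, and codec artifacts can all flip bits, which is why the block size matters.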
Is it possible to interleave metadata or other non-image data (side data) in the encoded video stream? (I'm rusty on this, but IIRC it should be possible.)
Edit: Something like this - apparently ffmpeg/pyav supports frame side_data, though I've not worked with it. Also, I'm not sure how a browser client would access it.
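For reference, a minimal sketch of reading per-frame side data with PyAV, using FFmpeg's motion-vector export as the example (the codec option and side-data key come from FFmpeg; I haven't verified this end-to-end, and writing *custom* side data, let alone reading it in a browser, is still the open question):

```python
import av

with av.open("input.mp4") as container:
    stream = container.streams.video[0]
    # FFmpeg codec option asking the decoder to attach motion vectors
    # to every decoded frame as side data.
    stream.codec_context.options = {"flags2": "+export_mvs"}
    for frame in container.decode(stream):
        # frame.side_data is a dict-like container of per-frame side data
        mvs = frame.side_data.get("MOTION_VECTORS")
        if mvs is not None:
            print(frame.pts, mvs)
```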
Great question! In my experience, I haven't seen anything that works easily. The trick is, like you say, having access to it when you write, and being able to get it out when you render it in the browser. I think I've seen side_data before but never looked at it in detail, so it definitely could work. (Likewise, I've seen some things done with subtitle tracks.) Anyway - I would be very happy if you could figure it out - that'd be awesome!
Oooh, a colleague just found this: https://github.com/w3c/webrtc-insertable-streams/blob/master/explainer.md which might be the trick for this last bit. I'll update now.
Do you plan to complete this PR, or should I close it for now?
Hmmm, I think it was actually finished, IIRC. I'm probably not going to put any more time into it; up to you whether it's useful as-is for an example (even if not polished).
@jlaine can we close this PR?
I won't write too much here as there's a bunch in the readme.md, and you should be able to run the demo code easily to see what it's talking about. But I think it's pretty cool (naturally) and will be useful to a bunch of people.
Status: