WIP: Computer vision streaming demo #392
base: main
Conversation
Codecov Report
@@            Coverage Diff            @@
##              main      #392   +/-   ##
=========================================
  Coverage   100.00%   100.00%
=========================================
  Files           31        31
  Lines         5492      5492
=========================================
  Hits          5492      5492

Continue to review the full report at Codecov.
So, I want to have a low-bandwidth solution *and* guaranteed synchronicity between my frames and the inference results. (I know, it won't be low latency, as it'll be delayed by at least the model runtime, but hey.) The first constraint means video ... and when you display video the modern way, it's very hard to know which frame is currently being displayed. That is, there's no frame number attribute, which would otherwise make this easy: you could just say "OK, frame 123 arrived, so let's draw the model results on frame 123". I see two potential approaches, and I'd love it if someone implemented one of them (or another you know of):
1. Use the WebRTC stats which, as you've seen, include the number of frames received. This is effectively the frame number, right? My main concern here is getting it in sync (i.e. knowing exactly which frame is the first that gets displayed and increments the count to one), and how a poor connection affects things (e.g. if some frames are dropped and the count isn't incremented, then everything will be out). And, I guess, how it might vary from browser to browser, etc.
2. Do some steganography! Basically, encode the frame number in the pixels of the image, and then decode it on the frontend (there's a rough sketch after this list). The challenges on this front:
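As a rough illustration of the steganography idea, here's a minimal numpy-only sketch (all names here are hypothetical, and surviving lossy compression is exactly one of those challenges, hence stamping each bit as a large saturated block rather than a single pixel):

```python
import numpy as np

BLOCK = 8   # pixels per bit; bigger blocks survive lossy encoding better
BITS = 16   # frame counter wraps at 2**16

def encode_frame_number(img: np.ndarray, n: int) -> np.ndarray:
    """Stamp frame number n into the top-left corner as BITS black/white blocks."""
    out = img.copy()
    for i in range(BITS):
        bit = (n >> i) & 1
        out[0:BLOCK, i * BLOCK:(i + 1) * BLOCK] = 255 if bit else 0
    return out

def decode_frame_number(img: np.ndarray) -> int:
    """Recover n by thresholding the mean value of each block."""
    n = 0
    for i in range(BITS):
        block = img[0:BLOCK, i * BLOCK:(i + 1) * BLOCK]
        if block.mean() > 127:
            n |= 1 << i
    return n
```

On the frontend the decoder would do the same thresholding on pixels read back from a canvas; chroma subsampling, scaling, and codec artifacts can all flip bits, which is why the block size matters.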
Is it possible to interleave metadata or other non-image data (side data) in the encoded video stream? (I'm rusty on this, but IIRC it should be possible.)
Edit: Something like this - apparently ffmpeg/pyav supports frame side_data, though I've not worked with it. Also, I'm not sure how a browser client would access it.
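For reference, a minimal sketch of reading per-frame side data with PyAV, using FFmpeg's motion-vector export as the example (the codec option and side-data key come from FFmpeg; I haven't verified this end-to-end, and writing *custom* side data, let alone reading it in a browser, is still the open question):

```python
import av

with av.open("input.mp4") as container:
    stream = container.streams.video[0]
    # FFmpeg codec option asking the decoder to attach motion vectors
    # to every decoded frame as side data.
    stream.codec_context.options = {"flags2": "+export_mvs"}
    for frame in container.decode(stream):
        # frame.side_data is a dict-like container of per-frame side data
        mvs = frame.side_data.get("MOTION_VECTORS")
        if mvs is not None:
            print(frame.pts, mvs)
```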
Great question! In my experience, I haven't seen anything that works easily. The trick is, like you say, having access to it when you write, and being able to get it out when you render it in the browser. I think I've seen side_data before but never looked at it in detail, so it definitely could work. (Likewise, I've seen some things done with subtitle tracks.) Anyway - I would be very happy if you could figure it out - that'd be awesome!
Oooh, a colleague just found this: https://github.com/w3c/webrtc-insertable-streams/blob/master/explainer.md which might be the trick for this last bit. I'll update now.
Do you plan to complete this PR, or should I close it for now?
Hmmm, I think it was actually finished, IIRC. I'm probably not going to put any more time into it; up to you whether it's useful as-is for an example (even if not polished).
@jlaine can we close this PR?
I won't write too much here as there's a bunch in the readme.md, and you should be able to run the demo code easily to see what it's talking about. But I think it's pretty cool (naturally) and will be useful to a bunch of people.
Status: