WIP: Computer vision streaming demo #392

Draft: wants to merge 2 commits into main.

kodonnell

I won't write too much here, as there's a lot in the readme.md, and you should be able to run the demo code easily to see what it's talking about. But I think it's pretty cool (naturally) and will be useful to a bunch of people.

Status:

  • Code is working aside from some heisenbugs (as documented in the readme). I'd appreciate someone who knows WebRTC/aiortc running their eyes over it to see whether I'm doing it the "right" way, or whether there are better approaches. Possibly @jlaine (partly because I'm also hoping it'll tickle your fancy).
  • Docs need to be updated, e.g. with some screenshots. I've written them from my perspective (e.g. "I'm a novice, so don't trust me"), which we might want to change if this is going in here (e.g. to "less novice people have reviewed this and OK'd it").

codecov bot commented Jul 14, 2020

Codecov Report

Merging #392 into main will not change coverage.
The diff coverage is n/a.

@@            Coverage Diff            @@
##              main      #392   +/-   ##
=========================================
  Coverage   100.00%   100.00%           
=========================================
  Files           31        31           
  Lines         5492      5492           
=========================================
  Hits          5492      5492           

Last update c0504b6...8daa20e.

So, I want to be able to have a low-bandwidth solution *and* guaranteed synchronization between my frames and the inference results. (I know it won't be low latency, as it'll be delayed by at least the model runtime, but hey.) The first constraint means video, and when you display video the modern way, it's very hard to know exactly which frame is currently being displayed: there's no frame-number attribute. If there were, this would be easy, as you could say "OK, frame 123 arrived, so let's draw the model results on frame 123". I see two potential approaches, and I'd love it if someone implemented one (or another you know of):

1. Use the WebRTC stats, which, as you've seen, include the number of frames received. This is effectively the frame number, right? My main concern here is getting it in sync (i.e. knowing exactly which frame is the first to be displayed and increments the count to one), and how a poor connection affects things (e.g. if some frames are dropped and the count isn't incremented, then everything will be out of sync). And, I guess, how it might vary from browser to browser, etc.
2. Do some steganography! Basically, encode the frame number in the pixels of the image, and then decode it on the frontend. The challenges on this front:
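As a rough illustration of approach 2, here is a minimal Python sketch (not the demo's code) that paints a 32-bit frame counter as a strip of black and white 8x8 blocks along the top edge of a grayscale frame, then reads it back by averaging each block. The block size, bit count, mid-grey threshold and function names are all hypothetical choices for illustration. Big, high-contrast blocks are used because the video codec is lossy, so exact pixel values won't survive; the browser side would need to run the same averaging on pixels read back from the rendered video (e.g. via a canvas), which isn't shown here.

```python
# Hypothetical sketch: hide a frame counter in the pixels themselves.
# Assumes an 8-bit grayscale frame stored row-major in a bytearray
# (e.g. the luma plane of a decoded frame), at least 256 px wide and 8 px tall.
BITS = 32   # width of the frame counter
BLOCK = 8   # each bit becomes a BLOCK x BLOCK square; larger survives lossy coding better


def encode_frame_number(pixels: bytearray, width: int, frame_number: int) -> None:
    """Paint frame_number as a strip of black/white blocks along the top edge."""
    if not 0 <= frame_number < 2**BITS:
        raise ValueError("frame number does not fit in BITS bits")
    if width < BITS * BLOCK or len(pixels) < BLOCK * width:
        raise ValueError("frame too small for the marker strip")
    for bit in range(BITS):
        # Most significant bit first: 1 -> white (255), 0 -> black (0).
        value = 255 if (frame_number >> (BITS - 1 - bit)) & 1 else 0
        x0 = bit * BLOCK
        for y in range(BLOCK):
            start = y * width + x0
            pixels[start : start + BLOCK] = bytes([value]) * BLOCK


def decode_frame_number(pixels: bytes, width: int) -> int:
    """Read the strip back by averaging each block and thresholding at mid-grey."""
    frame_number = 0
    for bit in range(BITS):
        x0 = bit * BLOCK
        total = 0
        for y in range(BLOCK):
            start = y * width + x0
            total += sum(pixels[start : start + BLOCK])
        mean = total / (BLOCK * BLOCK)
        frame_number = (frame_number << 1) | (1 if mean >= 128 else 0)
    return frame_number


if __name__ == "__main__":
    width, height = 320, 240
    frame = bytearray(width * height)  # all-black test frame
    encode_frame_number(frame, width, 123)
    # Crude stand-in for compression noise: push every pixel +/-20.
    noisy = bytes(
        min(255, max(0, p + (20 if i % 2 else -20))) for i, p in enumerate(frame)
    )
    print(decode_frame_number(noisy, width))  # prints 123
```

The obvious costs are a visible stripe in the corner of the video and fragility: a very low bitrate, or a stream that gets rescaled before the pixels are read back, could still corrupt the marker, so the block size would need tuning against real streams.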

Is it possible to interleave metadata or other non-image data (side data) in the encoded video stream? (I'm rusty on this, but IIRC it should be possible.)

Edit: Something like this. Apparently ffmpeg/PyAV supports frame side_data, though I've not worked with it. Also, I'm not sure how a browser client would access it.

kodonnell (Author)

Great question! In my experience, I haven't seen anything that works easily. The trick is, as you say, having access to it when you write the stream and being able to get it out when you render it in the browser. I think I've seen side_data before but never looked at it in detail, so it could well work. (Likewise, I've seen some things done with subtitles.) Anyway, I'd be very happy if you could figure it out. That'd be awesome!

kodonnell (Author)

Oooh, a colleague just found this: https://github.com/w3c/webrtc-insertable-streams/blob/master/explainer.md

It might be the trick for this last bit. I'll update now.
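The explainer describes letting application code transform each encoded frame on its way into and out of the WebRTC pipeline. Applied to this thread, one option would be to append the frame number to each encoded frame when sending and strip it off again before decoding. The browser-side transforms would be JavaScript; the sketch below shows only the framing logic, in Python, under these assumptions: a hypothetical 4-byte `FNUM` marker, a 4-byte big-endian counter, and some hook on the sending (aiortc) side to call it from, which this sketch does not provide.

```python
import struct
from typing import Optional, Tuple

# Hypothetical trailer appended after each encoded frame's payload:
# a 4-byte magic marker plus a 4-byte big-endian frame counter.
MAGIC = b"FNUM"
TRAILER = struct.Struct(">4sI")  # 8 bytes in total


def append_frame_number(payload: bytes, frame_number: int) -> bytes:
    """Sender-side transform: tack the frame counter onto an encoded frame."""
    return payload + TRAILER.pack(MAGIC, frame_number)


def strip_frame_number(data: bytes) -> Tuple[bytes, Optional[int]]:
    """Receiver-side transform: remove the trailer before the data reaches the decoder.

    Returns (data, None) unchanged when no trailer is found.
    """
    if len(data) < TRAILER.size:
        return data, None
    magic, frame_number = TRAILER.unpack_from(data, len(data) - TRAILER.size)
    if magic != MAGIC:
        return data, None
    return data[: -TRAILER.size], frame_number


if __name__ == "__main__":
    wire = append_frame_number(b"\x00\x01\x02", 123)
    print(strip_frame_number(wire))  # prints (b'\x00\x01\x02', 123)
```

The magic check only makes a false match unlikely, not impossible, and both ends have to agree to run the transform, since a receiver that doesn't strip the trailer would hand the decoder bytes it doesn't expect.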

jlaine (Collaborator) commented Mar 2, 2021

Do you plan to complete this PR or should I close it for now?

kodonnell (Author)

Hmmm, I think it was actually finished, IIRC. I'm probably not going to put any more time into it; it's up to you whether it's useful as-is as an example (even if not polished).

rprata (Contributor) commented Dec 1, 2023

@jlaine can we close this PR?


4 participants