Hi, I'm Martin from the BIIGLE annotation tool, which also includes a basic feature for video annotation. I've recently come across an issue with how we identify annotated video frames and wanted to ask you guys if you have any experience with that. The issue boils down to the fact that apparently you can't use currentTime of a video to identify the video frame that is displayed. Often, setting video.currentTime = video.currentTime even changes the displayed video frame.
Now, Tator is quite advanced in this regard and from what I can gather from your code, you use a custom video playback implementation based on frame numbers (which can't be done directly with the provided web APIs). In your experience, is your implementation more robust regarding the issue I described above? Also, have you found time/fps (plus some magic) to be reliable for calculating the correct frame number in the browser? I'd be happy if you'd be willing to share some of your experiences!
We found "adding the magic" to the currentTime reliably allowed the video element to seek to the desired frame accurately (specifically after a canPlay event). The relevant code for the conversion is here:
We transcode our videos to a fixed-size GOP. This helps with seekability and potentially affects the constant factor we add to land at a time between frames.
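To make the "time between frames" idea concrete, here is a minimal sketch of such a conversion. It is not the Tator code: the function names and the default offset of half a frame are assumptions chosen for illustration, and the sketch assumes a constant frame rate.

```typescript
// Hypothetical helper: map a frame index to a seek time in seconds.
// offsetInFrames = 0.5 is an assumed value that targets the middle of the
// frame's display interval; it is not the constant Tator actually uses.
function frameToSeekTime(frame: number, fps: number, offsetInFrames = 0.5): number {
  if (!Number.isFinite(fps) || fps <= 0) {
    throw new RangeError("fps must be a positive number");
  }
  if (!Number.isInteger(frame) || frame < 0) {
    throw new RangeError("frame must be a non-negative integer");
  }
  return (frame + offsetInFrames) / fps;
}

// Hypothetical inverse: map a time in seconds back to the frame whose
// display interval contains it, assuming a constant frame rate.
function seekTimeToFrame(timeSeconds: number, fps: number): number {
  return Math.floor(timeSeconds * fps);
}
```

For example, at 1.25 fps frame 16 starts at 12.8 s and frame 17 at 13.6 s, so frameToSeekTime(16, 1.25) gives 13.2 s, halfway between the two.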
I believe it works reliably because if the presentation time of two neighboring frames (milliseconds) is 12800 and 13600 and a request for the frame at 13200 comes in, the decode stack always 'rounds down'. This is my suspicion because when asking for the frame exactly at 12800 you sometimes get the preceding frame.
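The suspected "round down" behavior can be modeled as a lookup for the last frame whose presentation time is at or before the requested time. This is only a sketch of the suspicion described above, not documented browser behavior; the function name and the timestamp list are hypothetical.

```typescript
// Hypothetical model of the suspected decoder behavior: return the index of
// the last frame whose presentation time (ms) is <= the requested time,
// or -1 if the request precedes the first frame.
// ptsMs must be sorted in ascending order.
function displayedFrameIndex(ptsMs: readonly number[], requestMs: number): number {
  let lo = 0;
  let hi = ptsMs.length - 1;
  let found = -1;
  while (lo <= hi) {
    const mid = (lo + hi) >> 1;
    if (ptsMs[mid] <= requestMs) {
      found = mid; // candidate; keep looking for a later frame
      lo = mid + 1;
    } else {
      hi = mid - 1;
    }
  }
  return found;
}

// Presentation times from the example above, plus one earlier frame.
const pts = [12000, 12800, 13600];
const midpointHit = displayedFrameIndex(pts, 13200); // index of the 12800 frame
```

In this idealized model a request at exactly 12800 also returns the 12800 frame. The observation above is that real decoders sometimes return the preceding frame in that boundary case, which is why a request in the middle of the interval is the safer target.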
Other pitfalls we ran into using the video tag and Media Source Extensions relate to some sparsely documented limits in the browser implementations. Specifically, on Chrome each video tag can only buffer 150 MB of data; on Safari I believe it is 90 MB. Each browser also recently added limits on the number of video tags available. Both of these work against implementing the type of "video editor" that feels natural in annotation tools.
Thanks a lot for sharing your insight, Brian! So you basically run your own timer to track the video progress and don't use the values returned by currentTime at all?
Correct. For frame availability we rely on the canPlay event that the underlying video tag emits after the seek completes. We set currentTime but never read the value back.
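A rough sketch of that pattern: the caller keeps track of the frame number itself, writes currentTime, and waits for the event instead of reading currentTime back. The VideoLike interface and the seekToFrame name are hypothetical and only describe the parts of a video element this sketch touches; the half-frame offset is again an assumed value.

```typescript
// Minimal shape of the video element used here (hypothetical interface,
// so the sketch does not depend on DOM typings).
interface VideoLike {
  currentTime: number;
  addEventListener(
    type: "canplay",
    listener: () => void,
    options?: { once?: boolean }
  ): void;
}

// Seek to a frame and resolve with the *requested* frame number once the
// element reports "canplay". currentTime is written but never read.
function seekToFrame(
  video: VideoLike,
  frame: number,
  fps: number,
  offsetInFrames = 0.5 // assumed offset, not a value taken from Tator
): Promise<number> {
  return new Promise((resolve) => {
    // Register the listener before seeking so the event cannot be missed.
    video.addEventListener("canplay", () => resolve(frame), { once: true });
    video.currentTime = (frame + offsetInFrames) / fps;
  });
}
```

The application would then treat the resolved frame number as the source of truth for annotations, regardless of what currentTime reports afterwards.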
Playback is a bit more involved than just a frame-accurate seek and display, due to the real-time deadlines involved in non-jittery playback. We use the same strategy for playback as for seeking, but need a couple of different timers to manage it. This allows us to show every frame, rather than dropping frames to keep up with the playback rate (as e.g. YouTube does), if things go awry.
Since we developed this leg of code, there is a new video method that appears to be right up the alley for tools like these, but I haven't reworked the code to make use of it: https://web.dev/requestvideoframecallback-rvfc/. That method would make "happy path" onFrame events a lot less code than we put together.
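For reference, a "happy path" frame callback built on that method might look roughly like the sketch below. The interfaces are hypothetical stand-ins for the parts of the API used (the callback receives a timestamp and a metadata object that includes mediaTime), and deriving the frame number as round(mediaTime * fps) assumes a constant frame rate.

```typescript
// Hypothetical minimal typings for the parts of the API this sketch uses.
interface FrameMetadataLike {
  mediaTime: number; // presentation time of the presented frame, in seconds
}
interface RvfcVideoLike {
  requestVideoFrameCallback(
    callback: (now: number, metadata: FrameMetadataLike) => void
  ): number;
}

// Call onFrame with a frame number every time a new frame is presented.
// The callback is one-shot, so it re-registers itself after each frame.
function onEachFrame(
  video: RvfcVideoLike,
  fps: number,
  onFrame: (frame: number) => void
): void {
  const step = (_now: number, metadata: FrameMetadataLike): void => {
    // Assumes constant fps; rounding absorbs small timestamp inaccuracies.
    onFrame(Math.round(metadata.mediaTime * fps));
    video.requestVideoFrameCallback(step);
  };
  video.requestVideoFrameCallback(step);
}
```

Browser support should be checked at runtime before relying on this, for example by testing whether the method exists on the video element.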
Thanks again, that was incredibly helpful! I also came across requestVideoFrameCallback but it seems to be non-standard and browser support isn't that great yet.