
Fail gracefully when encountering seeking issues during inference #1711

Open
roomrys opened this issue Mar 15, 2024 · 1 comment
Labels
bug (Something isn't working), fixed in future release (Fix or feature is merged into develop and will be available in future release.)

Comments

@roomrys
Collaborator

roomrys commented Mar 15, 2024

The random access seeking issue has been longstanding and a major pain point.

We often tell our users to re-encode their videos, but this is a pain: it increases the disk footprint, requires an extra processing step, etc. It's also buried deep in the docs, so most people don't find it. Finally, it's a terrible user experience to run inference on an entire video (which may take hours!) only to have it crash on the very last frame...

In some cases, the same video file can be seeked on one platform but not on another due to OS, ffmpeg, and other layers of platform-dependent implementation differences.

See #932 and #945 for an in-depth analysis of the root problem.

Since there doesn't seem to be a good universal solution, one thing we could do is add a try/except in the inference block (something like what we do in this gist).
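For concreteness, here is a minimal sketch of that try/except idea, not the gist's actual code: `video`, `predict_frame`, and the exception types are hypothetical stand-ins for however the inference loop reads frames.

```python
# Minimal sketch: wrap the per-frame read so a seeking failure (e.g., a KeyError
# near the end of the video) stops inference gracefully instead of crashing and
# discarding everything predicted so far.
import logging

logger = logging.getLogger(__name__)

def run_inference(video, predict_frame, n_frames):
    predictions = []
    for frame_idx in range(n_frames):
        try:
            frame = video[frame_idx]  # random-access read; may fail to seek
        except (KeyError, IndexError, OSError) as e:
            logger.warning(
                "Failed to read frame %d (%s); stopping early and keeping the "
                "%d frames predicted so far.", frame_idx, e, len(predictions)
            )
            break
        predictions.append(predict_frame(frame))
    return predictions
```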

(@roomrys: This is a good dataset that, when git cloned, seems to always throw the KeyError. -- @talmo: I can't reproduce on my end :()


Other relevant issues/discussions:

@roomrys roomrys added the bug label Mar 15, 2024
@talmo
Collaborator

talmo commented Mar 17, 2024

#1712 implements the try-except version of this solution.

There are still some problems that we might need to address moving forward:

  • Are there other places in the code that this affects?
  • Do we get seeking issues earlier in videos? If so, predictions would be truncated as soon as the error happens.
  • Do we get misaligned poses and video frames? This PR might mask the underlying problem in these cases (see the diagnostic sketch after this list).
  • Why does this not happen during training or from the GUI when seeking to the same frame?
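On the misaligned-poses question above, one way to probe it is to compare seeked reads against a sequential pass over the same video. This is just an illustrative sketch using OpenCV directly, not anything in SLEAP:

```python
# Diagnostic sketch: read the video sequentially to establish reference frames,
# then seek to the same indices and compare pixel content to detect misalignment.
import cv2
import numpy as np

def check_seek_alignment(path, frame_inds):
    # Sequential pass: collect reference frames at the requested indices.
    cap = cv2.VideoCapture(path)
    refs, idx = {}, 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if idx in frame_inds:
            refs[idx] = frame
        idx += 1
    cap.release()

    # Seeking pass: jump to each index and compare against the reference frame.
    cap = cv2.VideoCapture(path)
    mismatches = []
    for i in frame_inds:
        cap.set(cv2.CAP_PROP_POS_FRAMES, i)
        ok, frame = cap.read()
        if not ok or i not in refs or not np.array_equal(frame, refs[i]):
            mismatches.append(i)
    cap.release()
    return mismatches
```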

The last point was something we were trying to address in order to get at the root cause. The main suspect is tf.data.Dataset and how it wraps the VideoReader provider, which makes the calls to the actual sleap.Video and backends (e.g., OpenCV).

Here's a little exploration Colab comparing different ways to access videos with and without tf.data.Dataset.
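To make that comparison concrete, a harness along these lines (illustrative only, not the Colab's actual code) reads the same frame indices directly with OpenCV and through a tf.data.Dataset-wrapped generator; the output_signature assumes BGR uint8 frames:

```python
# Compare direct OpenCV seeking vs. the same reads wrapped in tf.data.Dataset,
# to see whether the wrapping changes seeking behavior.
import cv2
import tensorflow as tf

def read_direct(path, frame_inds):
    cap = cv2.VideoCapture(path)
    frames = []
    for idx in frame_inds:
        cap.set(cv2.CAP_PROP_POS_FRAMES, idx)  # explicit seek
        ok, frame = cap.read()
        frames.append(frame if ok else None)
    cap.release()
    return frames

def read_via_tf_data(path, frame_inds):
    def gen():
        cap = cv2.VideoCapture(path)
        for idx in frame_inds:
            cap.set(cv2.CAP_PROP_POS_FRAMES, idx)
            ok, frame = cap.read()
            if ok:
                yield frame
        cap.release()

    ds = tf.data.Dataset.from_generator(
        gen,
        output_signature=tf.TensorSpec(shape=(None, None, 3), dtype=tf.uint8),
    )
    return [f.numpy() for f in ds]
```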

And here's a Gist implementing a standalone sequential inference script. It uses threading to read frames asynchronously from the inference thread, but in general it works very similarly to the sleap-track CLI (it's intended to be a nearly drop-in replacement); a sketch of the threaded-reader pattern follows the observations below. A couple of interesting observations from this experiment:

  • It seems we still get the seeking error in some cases, even without tf.data.Dataset.
  • Using multiprocessing instead of threading results in weird errors with OpenCV -- maybe the out-of-process stuff + OpenCV is the root culprit?
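The threaded-reader pattern looks roughly like the following sketch (names like `predict_frame` are placeholders, and this is not the Gist's exact code): a producer thread reads frames sequentially, avoiding random seeks, and hands them to the inference loop through a queue.

```python
# Sketch of a threaded sequential reader feeding an inference loop via a queue.
import queue
import threading
import cv2

def reader(path, frame_queue, stop_event):
    cap = cv2.VideoCapture(path)
    idx = 0
    while not stop_event.is_set():
        ok, frame = cap.read()  # sequential read; no explicit seeking
        if not ok:
            break
        frame_queue.put((idx, frame))
        idx += 1
    cap.release()
    frame_queue.put(None)  # sentinel: no more frames

def run(path, predict_frame):
    frame_queue = queue.Queue(maxsize=64)
    stop_event = threading.Event()
    t = threading.Thread(target=reader, args=(path, frame_queue, stop_event), daemon=True)
    t.start()
    results = {}
    while True:
        item = frame_queue.get()
        if item is None:
            break
        idx, frame = item
        results[idx] = predict_frame(frame)
    t.join()
    return results
```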

In any case, the fix in #1712 should move us forward and we can revisit to address the above concerns as they come up, or punt it to sleap-io.

@talmo talmo closed this as completed Mar 17, 2024
@talmo talmo added the fixed in future release label Mar 17, 2024
@talmo talmo reopened this Mar 17, 2024