
Fail gracefully when encountering seeking issues during inference #1711

Open
roomrys opened this issue Mar 15, 2024 · 1 comment
Labels
bug (Something isn't working), fixed in future release (Fix or feature is merged into develop and will be available in future release.)

Comments

@roomrys
Collaborator

roomrys commented Mar 15, 2024

The random access seeking issue has been longstanding and a major pain point.

We often tell our users to re-encode their videos, but this is a pain: it increases the disk footprint, requires an extra processing step, etc. It's also buried deep in the docs, so most people don't find it. Finally, it's a terrible user experience to run inference on an entire video (which may take hours!) only to have it crash on the very last frame...

In some cases, the same video file can be seeked on one platform but not on another due to OS, ffmpeg, and other layers of platform-dependent implementation differences.

See #932 and #945 for an in-depth analysis of the root problem.

Since there doesn't seem to be a good universal solution, one thing we could do is add a try/except in the inference block (something like what we do in this gist).
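For concreteness, here is a minimal sketch of that try/except idea, not the gist's actual code: `video`, `predict_frame`, and the exception types are hypothetical stand-ins for however the inference loop reads frames.

```python
# Minimal sketch: wrap the per-frame read so a seeking failure (e.g., a KeyError
# near the end of the video) stops inference gracefully instead of crashing and
# discarding everything predicted so far.
import logging

logger = logging.getLogger(__name__)

def run_inference(video, predict_frame, n_frames):
    predictions = []
    for frame_idx in range(n_frames):
        try:
            frame = video[frame_idx]  # random-access read; may fail to seek
        except (KeyError, IndexError, OSError) as e:
            logger.warning(
                "Failed to read frame %d (%s); stopping early and keeping the "
                "%d frames predicted so far.", frame_idx, e, len(predictions)
            )
            break
        predictions.append(predict_frame(frame))
    return predictions
```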

(@roomrys: This is a good dataset that, when git cloned, seems to always throw the KeyError. -- @talmo: I can't reproduce on my end :()


Other relevant issues/discussions:

@roomrys roomrys added the bug label Mar 15, 2024
@talmo
Collaborator

talmo commented Mar 17, 2024

#1712 implements the try-except version of this solution.

There are still some problems that we might need to address moving forward:

  • Are there other places in the code that this affects?
  • Do we get seeking issues earlier in videos? If so, predictions would be truncated as soon as the error happens.
  • Do we get misaligned poses and video frames? This PR might mask the underlying problem in these cases (see the diagnostic sketch after this list).
  • Why does this not happen during training or from the GUI when seeking to the same frame?
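On the misaligned-poses question above, one way to probe it is to compare seeked reads against a sequential pass over the same video. This is just an illustrative sketch using OpenCV directly, not anything in SLEAP:

```python
# Diagnostic sketch: read the video sequentially to establish reference frames,
# then seek to the same indices and compare pixel content to detect misalignment.
import cv2
import numpy as np

def check_seek_alignment(path, frame_inds):
    # Sequential pass: collect reference frames at the requested indices.
    cap = cv2.VideoCapture(path)
    refs, idx = {}, 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if idx in frame_inds:
            refs[idx] = frame
        idx += 1
    cap.release()

    # Seeking pass: jump to each index and compare against the reference frame.
    cap = cv2.VideoCapture(path)
    mismatches = []
    for i in frame_inds:
        cap.set(cv2.CAP_PROP_POS_FRAMES, i)
        ok, frame = cap.read()
        if not ok or i not in refs or not np.array_equal(frame, refs[i]):
            mismatches.append(i)
    cap.release()
    return mismatches
```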

The last point was something we were trying to address in order to get at the root cause. The main suspect is tf.data.Dataset and how it wraps the VideoReader provider, which makes the calls to the actual sleap.Video and backends (e.g., OpenCV).

Here's a little exploration Colab comparing different ways to access videos with and without tf.data.Dataset.
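To make that comparison concrete, a harness along these lines (illustrative only, not the Colab's actual code) reads the same frame indices directly with OpenCV and through a tf.data.Dataset-wrapped generator; the output_signature assumes BGR uint8 frames:

```python
# Compare direct OpenCV seeking vs. the same reads wrapped in tf.data.Dataset,
# to see whether the wrapping changes seeking behavior.
import cv2
import tensorflow as tf

def read_direct(path, frame_inds):
    cap = cv2.VideoCapture(path)
    frames = []
    for idx in frame_inds:
        cap.set(cv2.CAP_PROP_POS_FRAMES, idx)  # explicit seek
        ok, frame = cap.read()
        frames.append(frame if ok else None)
    cap.release()
    return frames

def read_via_tf_data(path, frame_inds):
    def gen():
        cap = cv2.VideoCapture(path)
        for idx in frame_inds:
            cap.set(cv2.CAP_PROP_POS_FRAMES, idx)
            ok, frame = cap.read()
            if ok:
                yield frame
        cap.release()

    ds = tf.data.Dataset.from_generator(
        gen,
        output_signature=tf.TensorSpec(shape=(None, None, 3), dtype=tf.uint8),
    )
    return [f.numpy() for f in ds]
```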

And here's a Gist implementing a standalone sequential inference script. It uses threading to read frames asynchronously from the inference thread, but in general it works very similarly to the sleap-track CLI (it's intended to be a nearly drop-in replacement); a sketch of the threaded-reader pattern follows the observations below. A couple of interesting observations from this experiment:

  • It seems we still get the seeking error in some cases, even without tf.data.Dataset.
  • Using multiprocessing instead of threading results in weird errors with OpenCV -- maybe the out-of-process stuff + OpenCV is the root culprit?
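The threaded-reader pattern looks roughly like the following sketch (names like `predict_frame` are placeholders, and this is not the Gist's exact code): a producer thread reads frames sequentially, avoiding random seeks, and hands them to the inference loop through a queue.

```python
# Sketch of a threaded sequential reader feeding an inference loop via a queue.
import queue
import threading
import cv2

def reader(path, frame_queue, stop_event):
    cap = cv2.VideoCapture(path)
    idx = 0
    while not stop_event.is_set():
        ok, frame = cap.read()  # sequential read; no explicit seeking
        if not ok:
            break
        frame_queue.put((idx, frame))
        idx += 1
    cap.release()
    frame_queue.put(None)  # sentinel: no more frames

def run(path, predict_frame):
    frame_queue = queue.Queue(maxsize=64)
    stop_event = threading.Event()
    t = threading.Thread(target=reader, args=(path, frame_queue, stop_event), daemon=True)
    t.start()
    results = {}
    while True:
        item = frame_queue.get()
        if item is None:
            break
        idx, frame = item
        results[idx] = predict_frame(frame)
    t.join()
    return results
```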

In any case, the fix in #1712 should move us forward and we can revisit to address the above concerns as they come up, or punt it to sleap-io.

@talmo talmo closed this as completed Mar 17, 2024
@talmo talmo added the fixed in future release label Mar 17, 2024
@talmo talmo reopened this Mar 17, 2024