RGB synchronization fails if there are more IR than RGB frames #54

Open
LukasBommes opened this issue Aug 19, 2021 · 4 comments

Comments

@LukasBommes

I wanted to point out a bug in the split_seqs script. The synchronization between IR and RGB works fine as long as there are more RGB than IR frames. However, if there are more IR than RGB frames, the synchronization logic fails.

I ran into this issue after switching from the 8 Hz Flir Duo Pro R to the 30 Hz version. As the visual stream is at 29.87 Hz, there are more IR than RGB frames generated.

@jveitchmichaelis
Contributor

jveitchmichaelis commented Aug 20, 2021

Thanks for this, I'll take a look! The split code was built and tested with the 30 Hz version in mind, so this may be a regression.

@LukasBommes
Author

If you want, I can send you my code in about two weeks (I'm currently on holiday). The logic is the same as in your code, just that I invert everything in case n_IR > n_RGB.
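For illustration, here is a minimal sketch of a coarse alignment that does not care which stream is longer. It is not the split_seqs logic or my actual code; `coarse_align` is a hypothetical helper, and it assumes both streams start and end at the same instant and run at constant frame rates.

```python
import numpy as np


def coarse_align(n_ir: int, n_rgb: int) -> np.ndarray:
    """For every IR frame index, return the proportionally nearest RGB index.

    Assumption: both streams cover the same time span at constant frame rates.
    """
    if n_ir < 1 or n_rgb < 1:
        raise ValueError("both streams need at least one frame")
    if n_ir == 1:
        return np.zeros(1, dtype=int)
    # Map IR index i in [0, n_ir - 1] linearly onto [0, n_rgb - 1].
    positions = np.arange(n_ir) * (n_rgb - 1) / (n_ir - 1)
    return np.rint(positions).astype(int)


# More IR than RGB frames: some RGB frames are reused.
more_ir = coarse_align(5, 3)
# More RGB than IR frames: some RGB frames are skipped.
more_rgb = coarse_align(3, 5)
```

Because the mapping is a single linear rescaling, the same code path handles n_IR > n_RGB and n_IR < n_RGB, so no inversion is needed.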

However, even after this fix, synchronization is still rather poor. For long sequences (~20k frames) the IR and RGB streams are up to 10 frames off in the middle of the sequence. Also, for my camera (Zenmuse XT2) the IR stream hangs once in a while when the camera performs recalibration.

I was thinking of using the frame timestamps (from EXIF tags) for synchronization. In the TIFF stack (which I use instead of SEQs) each frame has a millisecond-accurate walltime associated with it. For the RGB stream there is only a millisecond-accurate relative timestamp starting from zero. However, I have doubts about the timestamps of the IR stream, as the recalibration procedure does not show up in them.
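A rough sketch of what timestamp-based matching could look like, assuming the timestamps have already been read into arrays of milliseconds. `match_by_timestamp` and `rgb_start_ms` are hypothetical names. The walltime of the first RGB frame is not available from the RGB stream itself, so `rgb_start_ms` would have to be estimated somehow.

```python
import numpy as np


def match_by_timestamp(ir_ms, rgb_rel_ms, rgb_start_ms=0.0):
    """For each IR timestamp, find the index of the nearest RGB timestamp.

    ir_ms:        IR walltimes in milliseconds.
    rgb_rel_ms:   RGB timestamps relative to the first RGB frame, sorted ascending.
    rgb_start_ms: assumed walltime of the first RGB frame (must be estimated).
    Returns (indices into the RGB stream, absolute time error in ms).
    """
    ir = np.asarray(ir_ms, dtype=float)
    rgb = np.asarray(rgb_rel_ms, dtype=float) + rgb_start_ms
    if len(rgb) < 2:
        raise ValueError("need at least two RGB timestamps")
    # Index of the first RGB timestamp >= each IR timestamp, kept in bounds.
    right = np.clip(np.searchsorted(rgb, ir), 1, len(rgb) - 1)
    left = right - 1
    # Pick whichever neighbour is closer in time.
    choose_left = (ir - rgb[left]) <= (rgb[right] - ir)
    idx = np.where(choose_left, left, right)
    return idx, np.abs(rgb[idx] - ir)


idx, err = match_by_timestamp([1000, 1033, 1070], [0, 33, 67], rgb_start_ms=1000)
```

Large values in the returned error array would flag IR frames without a temporally close RGB frame, although this only helps if the IR timestamps actually reflect the recalibration pauses.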

It would be interesting to know whether you are able to synchronize your streams properly. So far, the problem seems quite tough. I was even thinking of extracting descriptors from both the IR and RGB streams and matching the descriptors of each IR frame to those of temporally neighbouring RGB frames. Something like this: https://la.disneyresearch.com/publication/actionsnapping/ The main difficulty is that we have two different modalities, which makes typical feature descriptors, such as ORB, SIFT, and bag-of-words, unsuitable.

@jveitchmichaelis
Contributor

jveitchmichaelis commented Aug 29, 2021 via email

@LukasBommes
Author

LukasBommes commented Aug 30, 2021

After my holiday, I will take a closer look into the synchronization. I was thinking of doing it the following way:

  1. calibrate intrinsics of both cameras
  2. undistort IR and RGB frames
  3. find a homography which maps the IR onto the RGB frame (in my case working distance is 10..20 meters while the baseline is a few centimeters, so a homography should be a reasonable approximation)
  4. coarsely align streams using your code, i.e. assuming constant frame rates and zero starting-offset
  5. perform fine-grained matching of IR and RGB frames in a local neighborhood (e.g. ±20 frames) based on an image-level similarity metric, such as mutual information, cross-correlation, etc. (maybe after applying a low-pass filter, histogram equalization, ...)
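Step 5 could be sketched roughly as follows, using normalized cross-correlation as the similarity metric. This is an untested idea rather than a working solution: `ncc` and `refine_match` are hypothetical helpers, and the sketch assumes the IR frame has already been undistorted and warped onto the RGB frame (steps 1 to 3) so that both are grayscale arrays of identical shape.

```python
import numpy as np


def ncc(a: np.ndarray, b: np.ndarray) -> float:
    """Zero-mean normalized cross-correlation of two equally sized images."""
    a = a.astype(float) - a.mean()
    b = b.astype(float) - b.mean()
    denom = np.sqrt((a * a).sum() * (b * b).sum())
    return float((a * b).sum() / denom) if denom > 0 else 0.0


def refine_match(ir_frame, rgb_frames, coarse_idx, radius=20):
    """Search rgb_frames[coarse_idx - radius : coarse_idx + radius + 1]
    for the frame most similar to ir_frame.

    Returns (best RGB index, similarity score).
    """
    lo = max(0, coarse_idx - radius)
    hi = min(len(rgb_frames), coarse_idx + radius + 1)
    scores = [ncc(ir_frame, rgb_frames[j]) for j in range(lo, hi)]
    best = int(np.argmax(scores))
    return lo + best, scores[best]


# Synthetic check: the "IR" frame is a linearly rescaled copy of RGB frame 27.
rng = np.random.default_rng(0)
rgb_frames = [rng.random((32, 32)) for _ in range(50)]
ir_frame = 2.0 * rgb_frames[27] + 5.0
best_idx, score = refine_match(ir_frame, rgb_frames, coarse_idx=20, radius=20)
```

Real IR and RGB images often have inverted or nonlinear intensity relationships, so plain NCC may not work there; swapping `ncc` for a mutual-information estimate would leave the window search unchanged.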

The latter step would certainly require some experimentation. Alternatives would be feature-based similarity metrics or extraction and matching of shapes, such as line segments. May I ask which IR-RGB descriptors you tried out? I found this one, which looks promising: https://www.mdpi.com/1424-8220/20/18/5105 Another way would be to extract keypoints from IR and RGB and find matches based on a geometric constraint (e.g. the homography or, more generally, the fundamental matrix). The frame with the lowest median spatial distance between matched keypoints would then be selected as the match.
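The keypoint variant could look roughly like this, assuming the homography `H` (IR to RGB) is known and that keypoint correspondences have already been established, i.e. row k of each RGB point array is the match for row k of the IR points. All function names are hypothetical, and how to obtain reliable cross-modal matches is exactly the open question.

```python
import numpy as np


def project(H: np.ndarray, pts) -> np.ndarray:
    """Apply a 3x3 homography to an (N, 2) array of pixel coordinates."""
    pts = np.asarray(pts, dtype=float)
    homog = np.hstack([pts, np.ones((len(pts), 1))])
    mapped = homog @ H.T
    return mapped[:, :2] / mapped[:, 2:3]


def median_distance(H, ir_pts, rgb_pts) -> float:
    """Median distance between projected IR keypoints and their RGB matches."""
    diff = project(H, ir_pts) - np.asarray(rgb_pts, dtype=float)
    return float(np.median(np.linalg.norm(diff, axis=1)))


def best_frame(H, ir_pts, rgb_pts_per_frame):
    """Pick the candidate RGB frame with the lowest median keypoint distance."""
    dists = [median_distance(H, ir_pts, p) for p in rgb_pts_per_frame]
    best = int(np.argmin(dists))
    return best, dists[best]


# Synthetic check with a pure translation as homography.
H = np.array([[1.0, 0.0, 10.0], [0.0, 1.0, 5.0], [0.0, 0.0, 1.0]])
ir_pts = np.array([[0.0, 0.0], [1.0, 2.0], [3.0, 4.0]])
exact = project(H, ir_pts)
candidates = [exact + 3.0, exact, exact - 7.0]
frame_idx, dist = best_frame(H, ir_pts, candidates)
```

The median makes the score tolerant of a minority of wrong matches, which seems useful given how unreliable cross-modal matching is likely to be.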

A CNN is probably also an option, but it would have to be trained in an un- or self-supervised manner, since I have no idea how to acquire the ground truth for synchronization (maybe some lightbulb blinking pattern which encodes walltime...). CNN-based synchronization was attempted here: https://arxiv.org/pdf/1610.05985.pdf (although nowadays one would probably want to use an N-pair loss instead of the triplet loss).

Did you also notice that the frame rate of the MOV file differs between videos? For my cameras it is 30 Hz, 29.87 Hz or 29.xx Hz for different MOV files (read out with ffprobe).
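For reference, this is roughly how the frame rates can be read out in a script. It is a sketch that assumes `ffprobe` is on the PATH; `probe_frame_rates` and `parse_rate` are hypothetical helper names. ffprobe reports rates as fractions such as `30000/1001`.

```python
import json
import subprocess
from fractions import Fraction


def parse_rate(rate: str) -> float:
    """Convert an ffprobe rate string such as '30000/1001' to Hz."""
    try:
        return float(Fraction(rate))
    except (ValueError, ZeroDivisionError):
        # ffprobe can report '0/0' when a rate is unknown.
        return float("nan")


def probe_frame_rates(path: str) -> dict:
    """Return nominal and average frame rate of the first video stream."""
    cmd = [
        "ffprobe", "-v", "error",
        "-select_streams", "v:0",
        "-show_entries", "stream=r_frame_rate,avg_frame_rate",
        "-of", "json",
        path,
    ]
    out = subprocess.run(cmd, capture_output=True, text=True, check=True).stdout
    stream = json.loads(out)["streams"][0]
    return {
        "r_frame_rate": parse_rate(stream["r_frame_rate"]),
        "avg_frame_rate": parse_rate(stream["avg_frame_rate"]),
    }
```

Calling `probe_frame_rates("video.MOV")` on each file (the filename is a placeholder) would show whether the nominal and the average rate disagree, which might explain part of the drift when a constant frame rate is assumed.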

2 participants