
Ability to adjust alignment #32

Open
Funkcorner opened this issue Oct 14, 2020 · 19 comments

Comments

@Funkcorner

The depth channel and the color channel are not fully aligned, at least not on my setup. You can see the edges on the right of my body are followed very well, but on the left they're not close. The wooden block I'm holding, and my hand, show the difference between the two.

[image: screenshot of the color/depth overlay showing the misalignment]

@Funkcorner
Author

Funkcorner commented Oct 14, 2020

Just found the previous Issue about occlusion shadow.
Is this the same with the Kinect v2? (That's what I'm using)

@Funkcorner
Author

I can see it must be an occlusion shadow - that makes sense now. Presumably the red lights in the centre of the Kinect v2 are the IR source.
If you can work out any way to apply a filter to left-hand edges, that would be awesome.

@SirLynix
Owner

Hello,

Yes, what you're experiencing is a case of occlusion shadows, due to the IR emitter and the IR camera being about one centimeter apart (and the same for the IR camera and the color camera).

I would love to filter this out, but it's not easy and, to be honest, it's also above my skill set. If you know of any project that uses the Kinect and filters this out, please let me know so I can look at how they do it and do the same.
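
For a rough sense of scale (a back-of-the-envelope estimate, not something measured on the device): if the emitter/camera offset is e, the occluding object is at distance f and the background is at distance b, simple parallax puts the width of the unlit band on the background at roughly

    w ≈ e * (b - f) / f

so with e = 10 mm, f = 600 mm and b = 2000 mm that's about 10 * 1400 / 600 ≈ 23 mm of background that the camera sees but the emitter never lit, which is why there's no depth there.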

@Funkcorner
Author

Hi,
If this page is right, the depth sensor reports 512 horizontal values per row, with each degree of vision being covered by around 7 pixels, or Z values.
I have a plan for how you could assign an artificial Z value to the areas in the occlusion shadow, which would help make the problem go away. It's similar to your depth-lag idea in 0.3, but instead of reaching back in time, it would look a certain distance to the pixel's left and copy the Z value from there in the current frame. How far to the left? I have a formula for that too, or at least I know how I'll come up with the formula... maybe I'll wait until I know it's possible before I sit down and write it! :D
I wrote out a load of "Can you ..." questions, but I guess the depth-lag function shows me that you can identify occlusion shadow pixels and you can assign an alternative z value to them on the fly.
I don't know C# (I think that's what you're using?), but maybe I can come up with the theory in English and you can code it?

@SirLynix
Owner

I'm using C++, but yes I can try to implement whatever algorithm you're describing.

However, I'm not working directly with raw depth values; instead I'm working with depth values in view space (or color space), using Kinect SDK functions to project depth pixels into the color frame, so I'm not sure if this is applicable here.

You're right that I can change Z values on the fly; however, I can't identify occlusion shadow pixels specifically (they're reported as "invalid depth", which also happens a lot in the background and on some object edges).

@Funkcorner
Author

Does your depth-lag function apply to all pixels then, not just shadow pixels?
Looking at the z values live in Kinect Studio, I can see that they 'fizz' in the shadow areas; there's noise, especially in the areas where the greenscreen function struggles the most. In my setup it's around my neck for some reason, but I don't know why it's worse there than in other places. So we probably need to identify the fizzing areas by looking back across the last 3 or 4 frames, like depth-lag does, to spot pixels that have rapid changes in z value.
Can you point me to the SDK documentation for the APIs/functions you're using so I can read up a bit?

@SirLynix
Owner

No, it only applies to depth pixels that don't have a value for the current frame.

The problem is that all depth pixels fluctuate, not just those on the edges; background pixels do that a lot. I can't think of a good way to identify them.

The Kinect SDK 2.0 documentation is hard to understand; I suggest reading some tutorials first.
https://homes.cs.washington.edu/~edzhang/tutorials/kinect2/kinect1.html
https://docs.microsoft.com/en-us/previous-versions/windows/kinect/dn758674(v=ieb.10)

@Funkcorner
Author

Can we use that to identify them, then? A fluctuating pixel, when considered over a period of 5 frames, will have a wide range of values in a way that most pixels wouldn't. That would introduce lag I guess, but maybe not much.
Thanks for the links. If I'm reading it right, the z data is a float value? What is it when there's no value, -1?

Some of my logic might work best for static subjects, like people sitting at desks on video calls or streaming gameplay. The function may not be good for scenes with lots of movement, but that doesn't mean it's not worth doing, I reckon.
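
A rough sketch of that "range over the last few frames" test, in case it helps to see it written down. Everything here is assumed for illustration: depth arrives as a flat array of uint16_t millimetre values per frame, 0 means "no depth", and the class name and threshold are made up:

#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <deque>
#include <vector>

// Flags pixels whose depth varied by more than thresholdMm over the last
// historySize frames. Invalid pixels (0) are skipped when taking min/max.
class DepthFluctuationDetector
{
public:
    DepthFluctuationDetector(std::size_t pixelCount, std::size_t historySize = 5)
    : m_pixelCount(pixelCount), m_historySize(historySize)
    {
    }

    // Push the newest depth frame and get back a per-pixel "is fizzing" mask.
    std::vector<bool> Update(const std::vector<std::uint16_t>& depthFrame, std::uint16_t thresholdMm)
    {
        m_history.push_back(depthFrame);
        if (m_history.size() > m_historySize)
            m_history.pop_front();

        std::vector<bool> fizzing(m_pixelCount, false);
        for (std::size_t i = 0; i < m_pixelCount; ++i)
        {
            std::uint16_t minV = UINT16_MAX;
            std::uint16_t maxV = 0;
            for (const auto& frame : m_history)
            {
                std::uint16_t v = frame[i];
                if (v == 0) // 0 = no depth reported for this pixel
                    continue;
                minV = std::min(minV, v);
                maxV = std::max(maxV, v);
            }
            fizzing[i] = (maxV > minV) && (maxV - minV > thresholdMm);
        }
        return fizzing;
    }

private:
    std::size_t m_pixelCount;
    std::size_t m_historySize;
    std::deque<std::vector<std::uint16_t>> m_history;
};

Tuning thresholdMm is exactly the trade-off you describe: too low and it also flags the normal background jitter.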

@SirLynix
Owner

The problem is that fluctuating pixels are everywhere in the background: the further you are from the sensor, the more pixels fluctuate. It's not an easy thing.

To be clear, here's what I have:

  • A color texture, BGRA8, classical stuff.
  • A depth texture, R16, each value is the distance between the sensor and the pixel in millimeters.

However, those are not from the same perspective, so I have to generate a depth-to-color texture, mapping each color pixel to a depth pixel. This is an RG32F texture (each pixel has a floating-point value for X and Y, which ends up as -infinity if no mapping can be done for that pixel).

This is where depth occlusion comes from: color pixels that don't have a valid depth. That's the case for the side of your hand/wooden block, but also for object edges because the depth fluctuates there (and it's also the case for the background).

So I'm not saying your idea is bad. It's just that I think it will pick up the heavily fluctuating background pixels along with the edge ones.
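
For reference, here is roughly what that mapping step looks like with the Kinect SDK 2.0 coordinate mapper, plus a count of the color pixels whose mapping came back as -infinity. This is only a sketch of my reading of the SDK docs, not the plugin's actual code, and it assumes the standard 512x424 depth / 1920x1080 color resolutions:

#include <Windows.h>
#include <Kinect.h>
#include <cmath>
#include <cstddef>
#include <vector>

// Maps every color pixel into depth space and counts how many have no valid
// depth behind them (the SDK reports unmappable pixels as -infinity).
std::size_t CountUnmappedColorPixels(ICoordinateMapper* mapper,
                                     const std::vector<UINT16>& depthFrame) // 512 * 424 values, in mm
{
    constexpr std::size_t colorWidth = 1920;
    constexpr std::size_t colorHeight = 1080;

    std::vector<DepthSpacePoint> depthPoints(colorWidth * colorHeight);
    HRESULT hr = mapper->MapColorFrameToDepthSpace(
        static_cast<UINT>(depthFrame.size()), depthFrame.data(),
        static_cast<UINT>(depthPoints.size()), depthPoints.data());
    if (FAILED(hr))
        return 0;

    std::size_t unmapped = 0;
    for (const DepthSpacePoint& p : depthPoints)
    {
        // Occlusion shadows, invalid depth and out-of-range pixels all land here.
        if (std::isinf(p.X) || std::isinf(p.Y))
            ++unmapped;
    }
    return unmapped;
}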

@Funkcorner
Author

Thanks for the info, that's helpful.
I still think it's worth a try, because wherever a pixel has no z value, we could give it one that might do well enough, certainly for most green screening.

As a test, could you try:

If pixel has no z value {
    If pixel X coordinate > 21 {
        Z value = z value of pixel 21 places to the left
    } else {
        Z value = z value of first pixel in the row
    }
}

21 pixels is approx 3° of angle on the depth sensor. That might be the wrong value but it's a first attempt.

I can work out more intelligent ways to set the fake z value, but this would prove (or disprove) the concept.
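
In case it saves some typing, here is that test written out as a C++ loop. The surroundings are my guesses for illustration: a row-major depth buffer in millimetres where 0 means "no value"; the plugin actually works on the mapped depth-to-color image, so treat this as pseudocode that happens to compile:

#include <cstddef>
#include <cstdint>
#include <vector>

// Fills pixels that have no depth value (0) with the value found a fixed
// offset to the left, falling back to the first pixel of the row. Because
// rows are scanned left to right, filled values can propagate rightwards.
void FillShadowsFromLeft(std::vector<std::uint16_t>& depth,
                         std::size_t width, std::size_t height,
                         std::size_t offset = 21)
{
    for (std::size_t y = 0; y < height; ++y)
    {
        std::uint16_t* row = depth.data() + y * width;
        for (std::size_t x = 0; x < width; ++x)
        {
            if (row[x] != 0)
                continue; // this pixel already has a depth value

            if (x > offset)
                row[x] = row[x - offset]; // copy from `offset` pixels to the left
            else
                row[x] = row[0];          // near the left edge: use the first pixel in the row
        }
    }
}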

@SirLynix
Owner

SirLynix commented Oct 18, 2020 via email

@Funkcorner
Author

Interpolating may place pixels without a z value halfway between the foreground and the background, which might make the background-removal algorithm struggle. But at this stage, it's probably worth trying all the options.
I noticed that the background removal in Microsoft's SDK 1.8 has the exact same problem, so if we can fix this we're doing better than Microsoft. 😉

@Funkcorner
Author

I know you're busy with a lot of cooler things, but I thought I'd share some stuff about the occlusion shadow correction in case you get a chance to code any of it.

Here is a table showing the width of an occlusion shadow in pixels (Kinect v2 with 7 pixels per degree):

[image: chart of occlusion shadow width in pixels for various foreground and background distances]

It plots this Excel formula, where:

  • f is the distance in mm between the camera and the foreground (that's casting the shadow)
  • b is the distance in mm between the camera and the background
  • e is the distance in mm between the depth sensor and the nearest IR emitter - in my case 35mm on the Kv2

=CEILING(DEGREES(ACOS((2*b^2 - (((b-f)/f)*e)^2) / (2*b^2))) * 7, 1)

In C++ (using std::ceil, std::acos and std::pow from <cmath>, since ^ is bitwise XOR rather than exponentiation there) I think that might be:

width = std::ceil(std::acos((2.0 * b * b - std::pow(((b - f) / f) * e, 2.0)) / (2.0 * b * b)) * (180.0 / 3.141592653589793238463) * 7.0);

... but as I say, I'm not a C++ coder.

The '7' at the end is there because the Kv2 has 7 pixels per degree, so it converts degrees into pixels for the Kv2. Other devices would need a different value here.
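
In case it's easier to drop in, here is the same formula wrapped up as a small function (this is just my transcription of the Excel version above; the function name and the pixelsPerDegree parameter are mine, with 7 being the Kv2 value):

#include <cmath>

// Width of an occlusion shadow in depth pixels.
//   f - distance in mm between the camera and the foreground (casting the shadow)
//   b - distance in mm between the camera and the background
//   e - distance in mm between the depth sensor and the nearest IR emitter
//   pixelsPerDegree - horizontal pixels per degree of view (7 for the Kinect v2)
int OcclusionShadowWidthPixels(double f, double b, double e, double pixelsPerDegree = 7.0)
{
    const double pi = 3.141592653589793;
    const double d = ((b - f) / f) * e;                                     // shadow width on the background plane, in mm
    const double angle = std::acos((2.0 * b * b - d * d) / (2.0 * b * b));  // angular width, in radians
    return static_cast<int>(std::ceil(angle * (180.0 / pi) * pixelsPerDegree));
}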

I had plans for using this info as part of the code for fixing occlusion shadows, but actually it may not be very useful there. But if nothing else, it helps describe the size of the shadows - I guessed at 21 pixels and in fact that's quite close - if your foreground is further away than 600mm, 21 pixels should be the right setting to fill any shadow.

I can think of lots of complex ways to calculate missing z values, but so far the idea of "use the closest valid z value from the 21 depth pixels to the left" is probably the best place to start. We can try getting more complex if that doesn't work.

I can see you're doing lots of great stuff elsewhere in the plugin, so you might not get this into 0.3, but if you're able to find some time I'd love to help test output with you.

@SirLynix
Owner

Wow, that's a lot of cool data! Thank you.

I'll try fetching 21 pixels to the left soon, to see how much it improves occlusion shadows. But I'm wondering: would it be viable for a generic algorithm to just fetch the first valid depth value on the left?

@Funkcorner
Author

Funkcorner commented Oct 22, 2020 via email

@Funkcorner
Author

Just checking in - have you had a chance to try this at all?
Looking forward to the next release...

@SirLynix
Owner

SirLynix commented Nov 5, 2020

Hi, sorry, I'm kind of busy with other projects. I did a really quick test of the "fetch the first valid pixel to the left" solution, which didn't change a thing. I still have to give it another try soon!

@Funkcorner
Author

Just found this, almost by accident:

https://ieeexplore.ieee.org/document/6264232

Summary:
We propose a method for filtering the occlusion areas and small holes on the depth map generated by the Kinect depth camera. This approach uses the original RGB image provided by the conventional Kinect camera and the depth map. We extract the moving body using the original image and the background differentials, then applying a 4-neighbor-pixels-interpolation to the none-body areas before filling the body areas. The proposed method can fill the occlusion areas effectively and remain the details of edges accurately, which can be used as a preprocessing stage before using the depth data of Kinect.

So a similar approach, using the value of neighbouring pixels to fill gaps.
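
For reference, a naive sketch of that kind of neighbour-based fill (this is not the paper's algorithm, just the simplest 4-neighbour average over valid pixels, again assuming a depth buffer in millimetres where 0 marks a hole):

#include <cstddef>
#include <cstdint>
#include <vector>

// One pass of hole filling: each invalid pixel (0) takes the average of its
// valid 4-neighbours, if it has any. Running several passes closes bigger holes.
std::vector<std::uint16_t> FillHoles4Neighbour(const std::vector<std::uint16_t>& depth,
                                               std::size_t width, std::size_t height)
{
    std::vector<std::uint16_t> out = depth;
    for (std::size_t y = 0; y < height; ++y)
    {
        for (std::size_t x = 0; x < width; ++x)
        {
            if (depth[y * width + x] != 0)
                continue; // already has a valid depth

            std::uint32_t sum = 0;
            std::uint32_t count = 0;
            auto sample = [&](std::size_t sx, std::size_t sy)
            {
                std::uint16_t v = depth[sy * width + sx];
                if (v != 0) { sum += v; ++count; }
            };
            if (x > 0)          sample(x - 1, y);
            if (x + 1 < width)  sample(x + 1, y);
            if (y > 0)          sample(x, y - 1);
            if (y + 1 < height) sample(x, y + 1);

            if (count > 0)
                out[y * width + x] = static_cast<std::uint16_t>(sum / count);
        }
    }
    return out;
}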

@SirLynix
Owner

Looks cool! It looks like something I experimented with in the depth processing branch, something I began working on but which didn't yield a lot of results. But I'm thinking of a different approach that might help.

The big problem I have with the Kinect v1 and v2 is that "color to depth mapping" part, whereas the Kinect v3 is able to map the depth map directly into color space. Maybe I should try to fix the mapped depth image; that should help. Will look soon (I'm kinda busy with other projects at the moment, but I will work on it in a few days).
