Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Projection matrices need variants for Vulkan and have wrong docs #25

Open
cjay opened this issue Feb 27, 2020 · 9 comments
Open

Projection matrices need variants for Vulkan and have wrong docs #25

cjay opened this issue Feb 27, 2020 · 9 comments

Comments

@cjay
Copy link

cjay commented Feb 27, 2020

In Numeric.Matrix: perspective and orthogonal currently follow OpenGL conventions. They require negative near and far values, otherwise they produce bogus Z values and messed up signs. The haddock comment on perspective states "Near plane clipping distance (always positive)" which is wrong. I suppose the OpenGL convention sees them as z-coordinates of the planes in screen space instead of distances to the camera.

They also project onto Z values from -1 to 1. For Vulkan, they need to produce a Z range from 0 to 1.

The GLM library has different defines for those variants, but I don't believe compile time flags are a good solution. I suggest splitting the projections up into variants and naming them accordingly.

@cjay
Copy link
Author

cjay commented Feb 28, 2020

I think what I did there has the correct matrices for Vulkan, I'm not 100% sure though. The scale function would also be a good addition.

@achirkin
Copy link
Owner

achirkin commented Feb 29, 2020

Thanks for raising this issue! The coordinate transformations is a rather tricky question due to the abudance of tutorials following different conventions. I will try to recall and write down here my thoughts when I was implementing these matrices first time. Please, correct me where appropritate :)

The source for my formulae is mostly the GLSL wikibooks.

First of all, I always stick the right-hand coordinate system, be it camera, clip, NDC, or screen space.

Here are all coordinate systems (x, y, z, w):

  • Camera space:

    • x: negative left -> positive right
    • y: negative bottom -> positive up
    • z: negative forward -> positive back.
      That is, a lower (negative) z value means further away from the viewer (camera). Points with positive z are behind the camera and, thus, invisible.
  • Clip space:
    The same as camera space.
    In the clip space, all points outside of the clip bounds are removed.
    The clip bounds in OpenGL pipeline are -w < x < w, -w < y < w, -w < z < w;
    in Vulkan -w < x < w, -w < y < w, 0 < z < w.

  • NDC space:
    The fourth coordinate w is removed by "perspective division" (divide the rest coordinates by w).

  • Screen space:
    z is kind-of ignored, so one can assume it directed either forwads or backwards.
    OpenGL:

    • x: negative left -> positive right
    • y: negative bottom -> positive up

    Vulkan:

    • x: negative left -> positive right
    • y: negative top-> positive down

    That is, to preseve right-hand rule, one can assume, z is directed "backwards" in OpenGL and "forwards" in Vulkan. This can be easily controlled via OpenGL or Vulkan depth comparison checks.

This was the theory how I understood it. However, a simple check shows that the perspective and orthogonal transforms in the GLSL wikibooks flip the z coordinate, so that in the clip space, the far plane (z = -f) is mapped to w and the near plane (z = -n) is mapped to -w, which makes it a left-handed coordinate system. This also has slipped through our tests.

To this moment, I don't think we have many users of these functions. So, I guess, we can change the behavior of the two functions to not flip the z coordinate. What do you think, @Rotaerk , @cjay ?

As for the Vulkan vs OpenGL clip space, we can solve it two ways: either write another pair of transforms for Vulkan, or write a transform for switch between different clip/NDC spaces. The latter is particularly simple:

openGL2VulkanNDC :: Mat44f
openGL2VulkanNDC = mkMat
  1   0   0   0
  0   1   0   0
  0   0  0.5  0
  0   0  0.5  1

So that e.g. perspectiveVk == perspective %* openGL2VulkanNDC .

@Rotaerk
Copy link
Contributor

Rotaerk commented Feb 29, 2020

What I did in my vulkan-tutorial.com implementation was: https://github.com/Rotaerk/vulkanTest/blob/master/vulkan-tutorial/main/src/Main.hs#L1535-L1555. Looks like it's the transpose of the one you suggested above, because I'm putting the glToVk on the left side of the %*, but I've also got a negative in the Y-multiplier.

Anyway, OpenGL NDC are inherently left-handed, with +X being to the right, +Y being up, and +Z being forward. In Vulkan, the NDC are right-handed because they flipped +Y to being down, with the direction of the other two axes untouched. (They also changed the visible segment of the Z axis, but that doesn't change the handedness.) See https://stackoverflow.com/questions/4124041/is-opengl-coordinate-system-left-handed-or-right-handed and https://matthewwellings.com/blog/the-new-vulkan-coordinate-system/ for more info on this stuff.

I don't know why the standard GL approach is to work in right-handed coordinates for object and world space, given that the NDC is left-handed, but as a result, it's necessary to transform to left-handed when moving to clip space (the space just before the automatic w-divide that moves you to NDC). In general the projection matrices exist to transform from camera space to clip space, so in addition to applying transforms like perspective, it also transforms into the clip space coordinate system, so the conversion from right to left is baked in.

So this flip from right to left is a feature, not a bug. But perhaps it would be a possible improvement to divorce the orthographic vs perspective transforms from the transform to clip space? As in you can have a "easytensor clip space" that the projection matrix puts you in, and then have toVulkan and toGL matrices. I'm not sure if that even makes sense though ... You'd probably end up making easytensor clip space match, say, Vulkan, and then your toVulkan would just be an identity matrix...

@cjay
Copy link
Author

cjay commented Mar 1, 2020

I don't know why the standard GL approach is to work in right-handed coordinates for object and world space

I wasn't even aware of that (or had forgotten). Does this approach live on for most Vulkan code, with +Y upwards and -Z forwards?

Independently of how it's implemented, I think the most important thing is that the documentation is very clear about the assumptions and has a few sentences to give graphics newbies some idea about what they might be doing wrong ("your world space is expected to be oriented like this, and if you're using Vulkan typical use looks like this ..").

I think toVulkan/toGL conversion matrices aren't ideal, though still good enough. Someone might change their projection matrix every frame to get a smooth zoom effect, so they would need to make one or two extra matrix multiplications every frame.

@Rotaerk
Copy link
Contributor

Rotaerk commented Mar 1, 2020

@cjay I might not be accurate in calling it a "standard", but it's at least the case that the matrices in the GLSL wiki that achirkin linked are designed around a right-handed system. But people are under no requirement to use that, and maybe there are other common libraries that work in the left-handed system, so perhaps it's not a "standard".

@Rotaerk
Copy link
Contributor

Rotaerk commented Mar 1, 2020

I asked for suggestions in a graphics development channel, and what I got out of it was that the different APIs can be configured to use different NDC systems, so a matrix library can just pick a system and stick with it. One person said that the community-preferred system is that of D3D: Z ranges from 0 to 1 (like vulkan; unlike GL), Y is up (like GL; unlike vulkan), and where the viewport origin is top-left (like Vulkan; unlike GL). There are even extensions to GL and Vulkan to allow them to behave like D3D. (As far as I know, this D3D preference might just be his biased opinion...)

The idea of having a "perspectiveVK" and a "perspectiveGL" was generally disliked. Though if we wanted to support multiple NDC systems, we could perhaps parameterize it around the direction of Y and the visible range of Z. I'm unclear on whether the viewport origin matters with regards to the matrices.

@Rotaerk
Copy link
Contributor

Rotaerk commented Mar 1, 2020

I got some clarification: Apparently the concern here is that engines tend to be abstracted and built upon multiple APIs, but the same matrix transforms must work for all. In practice, they tend expose the D3D convention for normalized device coordinates, so GL and Vulkan need to be configured to make sense of this.

D3D's NDC is: +Y is up, Z is [0, 1], and viewport origin is top left. Note that this is a left-handed system. (Can still use right-handed prior to clip space though.)

GL's convention is: +Y is up, Z is [-1, 1], and viewport origin is bottom left. To match D3D's convention, the ARB_clip_control extension can be used to change the Z range to [0, 1], and to move the viewport origin to the top left. Without this extension, you'd have to add a transform in matrix to do (Z + 1)/2 and -Y. (+Y may be up, but it's effectively down since the viewport origin is at the bottom.)

Vulkan's convention is: +Y is down, Z is [0, 1], and viewport origin is top left. To match D3D's convention, the only change required is to make +Y be up. To avoid putting it in your matrix, this can be done by setting the viewport's height to a negative value. (See:
https://www.saschawillems.de/blog/2019/03/29/flipping-the-vulkan-viewport/)

So if we have to pick a convention, I think we should go with D3D's. (I got some others' feedback too; they agree with the D3D preference.) But if it can be done without runtime cost, it might be better to parameterize the projection matrices by Y-direction, Z-range, and viewport origin.

Edit: Actually, is there any reason to take Y direction and viewport origin separately, rather than just taking the effective Y direction? I think the only effect of viewport origin on matrices is that it can flip Y.

@achirkin
Copy link
Owner

achirkin commented Mar 5, 2020

Thanks guys for the fruitful discussion!

It looks to me that we have two problems to sort out: projection transform (from view to clip coordinates) and the meaning (orientation) of the clip (/NDC/viewport) coordinates. If we were to parametrize the projection matrices for different graphics apis, we could do that kind of zero-cost by adding one more type parameter to the HomTransform4 class and make a few specialized instances, something like HomTransform4 VulkanAPI Double or HomTransform4 D3dAPI Double (or with Y and Z range/direction markers in place of api names). On the other hand, I like the idea of an abstract "easytensor clip space": keep the transforms in easytensor clip space and provide a few helper transforms for different apis. I don't think it's worth to optimize cpu cycles here, because the projection transform is not supposed to be re-computed very often anyway.

I think most important takeaway here is to write a very detailed and clear documentation on how the transforms should be used :) I'll try to do that next time I have some free time.

Next, I'd like to adjust the clip space used in current implementation and add the api-specific helpers.
It's pretty annoying to me that the current transforms swap Z axis and make the clip coordinates left-handed.
A more debatable question is what would be our choice of Z range? I like the [-1, 1] version, because it's mathematically more elegant for being symmetric w.r.t. other axes.

So, what do you think?

A. Use type class specialization and type-level markers to parametrize over api
B. Split projection into two and provide helpers:
    a. `Z` range `[-1, 1]`
    b. `Z` range `[0, 1]`
    c. other?..

@Rotaerk
Copy link
Contributor

Rotaerk commented Mar 5, 2020

As @cjay mentioned, there might be cases where you do recompute the projection transform over and over, so it would be better to pick a convention that the APIs can generally be configured to match (like I described with my last post) rather than adding in an additional transform to the API's convention via matrix multiplication. As for what that convention should be, the graphics people I spoke to seemed pretty adamant that D3D's was the preferred one. I'm not sure if D3D can be configured to match GL's/Vulkan's convention, but the reverse is true.

One person mentioned that the Z range of [0,1] offers the advantage that it "makes it nearly trivial to flip the depth buffer, for superior precision at extreme distances". It also makes sense to me that, normally, the greatest precision is located closest to the camera, rather than halfway through the visible volume. Also, is it necessarily less elegant? If you think of the eye being at (0,0,0), it naturally wouldn't see behind it into the -Z range. The space is still symmetrical; you just can't see the back half.

I don't know of any technical reason to prefer one +Y direction over another, but the community seems to like up. I'm guessing this is because +Y is usually up whenever you imagine cartesian axes?

To me left-handed makes the most sense for clip-space/NDC, since that's the natural result of +X being right, +Y being up, and +Z being forward (increasing depth into the screen). As for whether it's left- or right-handed before clip-space, it doesn't matter to me. It might be less confusing if it is left-handed as well, but if there's some benefit to right-handed coordinates, I don't think that's crucial.

Anyway, all that's for if you pick just one convention for 'easytensor clip space'. But if you want to parameterize it instead, I'd suggest 4 possibilities: the two Z ranges times the two Y directions. My only concerns with that really are any run-time impact or code-duplication involved, and whether it's flexible enough, if there are other conventions I've not heard of.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants