coordinate system conversion doesn't work as expected #2

Open
alexeystrakh opened this issue Jul 3, 2017 · 6 comments


@alexeystrakh

I'm trying to make the UIKit <-> AV <-> Vision coordinate systems work together, and your conversion example makes sense. Unfortunately, when I apply it to face detection and draw a rectangle around a detected face, the coordinates are completely off.

For example, when a user taps the screen at x: 200, y: 100, I draw a box at that position with width: 100, height: 150:

let tappedAt = sender.location(in: self.cameraView)
let uiBox = CGRect(x: tappedAt.x, y: tappedAt.y, width: 100, height: 150)
let avBox = self.cameraLayer.metadataOutputRectConverted(fromLayerRect: uiBox)
let vnBox = CGRect(x: avBox.origin.x, y: 1 - avBox.origin.y, width: avBox.width, height: avBox.height)
        
print("user tapped at: ", tappedAt.x, tappedAt.y)
print(String(format: "-> UI box | x:%.01f y:%.01f w:%.01f h:%.01f", uiBox.origin.x, uiBox.origin.y, uiBox.width, uiBox.height))
print(String(format: "-> AV box | x:%.01f y:%.01f w:%.01f h:%.01f", avBox.origin.x, avBox.origin.y, avBox.width, avBox.height))
print(String(format: "-> VN box | x:%.01f y:%.01f w:%.01f h:%.01f", vnBox.origin.x, vnBox.origin.y, vnBox.width, vnBox.height))

It gives the following output:

user tapped at:  200.0 100.0
-> UI box | x:200.0 y:100.0 w:100.0 h:150.0
-> AV box | x:0.1 y:0.6 w:0.4 h:0.3
-> VN box | x:0.1 y:0.4 w:0.4 h:0.3
  • Why are the width and height flipped (the width of the UI box corresponds to the height of the AV/VN boxes)?
  • Why does x of the UI box correspond to y of the AV/VN box (I tapped at x: 200, but it affects the y part of the AV/VN box, not x)?

It looks like the coordinate system is flipped, but I can't work out the specific conversion needed to draw the bounding box for a detected face properly.

@alexeystrakh
Author

I was able to convert it by flipping the axes and shifting by the width/height, but I assumed it shouldn't be this tricky:

let vnBox = newObservation.boundingBox
print(String(format: "-> VN box | x:%.01f y:%.01f w:%.01f h:%.01f", vnBox.origin.x, vnBox.origin.y, vnBox.width, vnBox.height))
let avBox = CGRect(x: 1 - (vnBox.origin.y + vnBox.width), y: 1 - (vnBox.origin.x + vnBox.height), width: vnBox.width, height: vnBox.height)
print(String(format: "-> AV box | x:%.01f y:%.01f w:%.01f h:%.01f", avBox.origin.x, avBox.origin.y, avBox.width, avBox.height))
let uiBox = self.cameraLayer.layerRectConverted(fromMetadataOutputRect: avBox)
print(String(format: "-> UI box | x:%.01f y:%.01f w:%.01f h:%.01f", uiBox.origin.x, uiBox.origin.y, uiBox.width, uiBox.height))

img_2917

@jeffreybergier
Owner

AFAIR, you only need to flip the Y origin to switch back and forth between AVFoundation space and Vision space. I'm not sure why you're having to flip the X origin as well. However, that picture is so symmetrical in the X direction that it can be hard to tell whether you're flipping it right. Try finding a photo that has only one face, in one corner of the photo. That way you'll know whether you're doing the flipping correctly.
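
One detail worth spelling out when flipping a rect rather than a single point: the new origin has to account for the rect's height. A minimal sketch, assuming both spaces are normalized to 0...1 and differ only in whether the Y origin sits at the top or the bottom (the helper name `flippedVertically` is made up for illustration and is not part of AVFoundation or Vision):

```swift
import Foundation

/// Flips a normalized rect between a top-left-origin and a
/// bottom-left-origin coordinate space. Hypothetical helper.
func flippedVertically(_ rect: CGRect) -> CGRect {
    // The far edge (maxY) in one space becomes the origin edge in the other,
    // so subtract maxY rather than origin.y.
    return CGRect(x: rect.minX,
                  y: 1 - rect.maxY,
                  width: rect.width,
                  height: rect.height)
}

let box = CGRect(x: 0.25, y: 0.125, width: 0.5, height: 0.25)
let flipped = flippedVertically(box)
// flipped has origin (0.25, 0.625) and the same size as box.
```

Using `1 - rect.origin.y` on its own leaves the box one box-height too low in a top-left-origin space, which is easy to miss with small or centered boxes.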

@kasimok

kasimok commented Aug 20, 2017

@alexeystrakh Hey Alex,
I ran into the same issue while writing another app, when I wanted to track a rect whose width != height.

It looks like your device orientation is portrait in your app. But the Vision framework seems to treat the camera as being in "landscape" mode. As a result, the width of the rect (in portrait mode) becomes the height of the rect in landscape mode.

Convert your uiBox with convertRectToHorizontal, defined below:

let kWidth = UIScreen.main.bounds.width
let kHeight = UIScreen.main.bounds.height
func convertRectToHorizontal(rect: CGRect) -> CGRect{
    return CGRect.init(x: rect.minY, y: kWidth - rect.origin.x - rect.width, width: rect.height, height: rect.width)
}

You should then get what you expect.
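
As a worked example of that conversion, here is a sketch that applies it to the 100 x 150 box from the original question. To keep it self-contained, this variant takes the portrait screen width as a parameter instead of reading `UIScreen.main.bounds`; the `portraitWidth` parameter and the 375-point width are assumptions made for illustration only.

```swift
import Foundation

// Hypothetical variant of convertRectToHorizontal: same arithmetic,
// but the portrait screen width is passed in explicitly.
func convertRectToHorizontal(_ rect: CGRect, portraitWidth: CGFloat) -> CGRect {
    return CGRect(x: rect.minY,
                  y: portraitWidth - rect.minX - rect.width,
                  width: rect.height,   // portrait height becomes landscape width
                  height: rect.width)   // portrait width becomes landscape height
}

let uiBox = CGRect(x: 200, y: 100, width: 100, height: 150)
let landscapeBox = convertRectToHorizontal(uiBox, portraitWidth: 375)
// landscapeBox: origin (100, 75), size 150 x 100
```

The result is still in points, so it would need to be normalized before being compared with a Vision bounding box.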

@Briahas

Briahas commented Sep 21, 2017

Same for me:

var transformedRect = newObservation.boundingBox
transformedRect.origin.y = 1 - transformedRect.origin.y
let convertedRect = self.cameraLayer.layerRectConverted(fromMetadataOutputRect: transformedRect)

This doesn't work as expected: the resulting rects are lower than they should be.
I use the following code instead:

let rectWidth = source.size.width * boundingRect.size.width
let rectHeight = source.size.height * boundingRect.size.height
let rect = CGRect(x: 0, y:0, width: source.size.width, height: source.size.height)
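
The snippet above only shows the size calculation. For illustration, a minimal sketch of how a normalized bounding box might be scaled into a top-left-origin image rect, assuming the box uses a bottom-left origin and that the image size is known. The helper name `imageRect(fromNormalized:imageSize:)` is hypothetical and this is not necessarily how the commenter completed their code:

```swift
import Foundation

/// Scales a normalized, bottom-left-origin rect into a top-left-origin
/// rect measured in the units of `imageSize`. Hypothetical helper.
func imageRect(fromNormalized box: CGRect, imageSize: CGSize) -> CGRect {
    let width = box.width * imageSize.width
    let height = box.height * imageSize.height
    let x = box.minX * imageSize.width
    // Flip vertically using the box's top edge (maxY), then scale.
    let y = (1 - box.maxY) * imageSize.height
    return CGRect(x: x, y: y, width: width, height: height)
}

let normalized = CGRect(x: 0.25, y: 0.5, width: 0.5, height: 0.25)
let scaled = imageRect(fromNormalized: normalized,
                       imageSize: CGSize(width: 400, height: 800))
// scaled: origin (100, 200), size 200 x 200
```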

@Pyroh

Pyroh commented Oct 13, 2017

The best way to solve this is to use an affine transform:

let t = CGAffineTransform(translationX: 0.5, y: 0.5)
            .rotated(by: CGFloat.pi / 2)
            .translatedBy(x: -0.5, y: -0.5)
            .translatedBy(x: 1.0, y: 0)
            .scaledBy(x: -1, y: 1)
var box = obs.boundingBox.applying(t)
box = previewLayer.layerRectConverted(fromMetadataOutputRect: box)

Note that I didn't even try to optimize or refactor the affine transform.
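
Composing the five steps by hand, the transform appears to reduce to the point mapping (x, y) -> (1 - y, 1 - x). This assumes each chained call is applied to the point before the calls written above it, which is how Core Graphics documents `translatedBy`, `rotated(by:)` and `scaledBy`. The sketch below is that derivation written out for a rect; it has not been checked on a device, and the name `swappedAndFlipped` is made up for illustration.

```swift
import Foundation

/// Closed form of (x, y) -> (1 - y, 1 - x) applied to a normalized rect.
/// Hypothetical helper derived by hand from the chained transform.
func swappedAndFlipped(_ rect: CGRect) -> CGRect {
    // The axes swap, so width and height swap too, and each new origin
    // is measured from the far edge of the other axis.
    return CGRect(x: 1 - rect.maxY,
                  y: 1 - rect.maxX,
                  width: rect.height,
                  height: rect.width)
}

let visionBox = CGRect(x: 0.125, y: 0.25, width: 0.5, height: 0.25)
let metadataBox = swappedAndFlipped(visionBox)
// metadataBox: origin (0.5, 0.375), size 0.25 x 0.5
```

This has the same shape as the flip-and-shift formula earlier in the thread, except that width and height are also swapped, which only matters when the box is not square.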

@jeffreybergier
Owner

It looks like Apple may have a solution for the rectangle problem. I just saw it today and haven't tried it. It might be worth a look, though: https://developer.apple.com/documentation/vision/2908993-vnimagerectfornormalizedrect?language=objc
