simonfuhrmann edited this page Oct 25, 2014 · 14 revisions

Math Cookbook

Camera Conventions

The MVE camera conventions use common textbook notation, e.g. from the book "Multiple View Geometry in Computer Vision" by Hartley and Zisserman. The projection of a 3D point X in world coordinates to a 2D point x on the image plane, expressed in homogeneous coordinates, is computed as follows:

x = K * (R * X + t)

where K is the calibration matrix, R is the world-to-camera rotation matrix, and t is the camera translation vector. R and t are referred to as extrinsic camera parameters. The calibration matrix K is assembled from quantities referred to as intrinsic camera parameters, described below. The inverse projection from a 2D image coordinate x in homogeneous coordinates (scaled so that its third component is 1) to a 3D point in world coordinates at a given depth d is computed as:

X = R^T * (K^-1 * x * d - t)
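The two formulas above can be checked against each other with a small round trip. The following is a minimal sketch in plain Python, not MVE code: the helper names (`mat_vec`, `transpose`, `invert_calibration`, `project`, `unproject`) and the example values are hypothetical, and K is assumed to have the zero-skew layout [[fx, 0, px], [0, fy, py], [0, 0, 1]], so that d is the depth along the camera z-axis.

```python
def mat_vec(m, v):
    """Multiply a 3x3 matrix (list of rows) by a 3-vector."""
    return [sum(m[i][j] * v[j] for j in range(3)) for i in range(3)]


def transpose(m):
    """Transpose a 3x3 matrix; equals the inverse for a proper rotation."""
    return [[m[j][i] for j in range(3)] for i in range(3)]


def invert_calibration(K):
    """Invert K, ASSUMING the layout [[fx, 0, px], [0, fy, py], [0, 0, 1]]."""
    fx, fy, px, py = K[0][0], K[1][1], K[0][2], K[1][2]
    return [[1.0 / fx, 0.0, -px / fx],
            [0.0, 1.0 / fy, -py / fy],
            [0.0, 0.0, 1.0]]


def project(K, R, t, X):
    """Compute x = K * (R * X + t); return (u, v, d) with x = d * (u, v, 1)."""
    X_cam = [a + b for a, b in zip(mat_vec(R, X), t)]
    x = mat_vec(K, X_cam)
    d = x[2]  # depth along the camera z-axis, since K's last row is (0, 0, 1)
    return x[0] / d, x[1] / d, d


def unproject(K, R, t, u, v, d):
    """Compute X = R^T * (K^-1 * x * d - t) for x = (u, v, 1)."""
    ray = mat_vec(invert_calibration(K), [u, v, 1.0])
    X_cam = [r * d - ti for r, ti in zip(ray, t)]
    return mat_vec(transpose(R), X_cam)


# Hypothetical example values: a 90 degree rotation about the z-axis.
K = [[2.0, 0.0, 0.5], [0.0, 2.0, 0.5], [0.0, 0.0, 1.0]]
R = [[0.0, -1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]]
t = [0.1, -0.2, 0.5]
X = [0.3, 0.4, 2.0]

u, v, d = project(K, R, t, X)
X_back = unproject(K, R, t, u, v, d)  # recovers X up to rounding error
```

Returning the depth d alongside (u, v) is a design choice of this sketch; it keeps exactly the information that the inverse projection needs.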

Extrinsic Parameters

The extrinsic parameters transform 3D points X in world coordinates into 3D points in camera coordinates X' = R * X + t. This transformation can also be applied using homogeneous coordinates X' = (R|t) * X where (R|t) is a 3x4 matrix. The translation vector is computed from the known camera center as t = -R * c. The camera center is computed from the known translation as c = -R^-1 * t. The inverse of R can be obtained by transposing R (only if R is a proper rotation matrix, i.e. R^-1 = R^T). To transform a point in camera coordinates to world coordinates, the inverse of the world-to-camera transformation, i.e. the camera-to-world transformation, is applied: X = R^-1 * (X' - t).
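The conversions between t and c and the two coordinate transformations can be sketched as follows. This is an illustrative plain-Python sketch, not MVE code; all function names are hypothetical, and R is assumed to be a proper rotation so that its transpose can be used as its inverse.

```python
def mat_vec(m, v):
    """Multiply a 3x3 matrix (list of rows) by a 3-vector."""
    return [sum(m[i][j] * v[j] for j in range(3)) for i in range(3)]


def transpose(m):
    """Transpose a 3x3 matrix; equals R^-1 only for a proper rotation."""
    return [[m[j][i] for j in range(3)] for i in range(3)]


def translation_from_center(R, c):
    """t = -R * c"""
    return [-x for x in mat_vec(R, c)]


def center_from_translation(R, t):
    """c = -R^-1 * t, using R^-1 = R^T."""
    return [-x for x in mat_vec(transpose(R), t)]


def world_to_camera(R, t, X):
    """X' = R * X + t"""
    return [a + b for a, b in zip(mat_vec(R, X), t)]


def camera_to_world(R, t, X_cam):
    """X = R^-1 * (X' - t), using R^-1 = R^T."""
    return mat_vec(transpose(R), [a - b for a, b in zip(X_cam, t)])


# Hypothetical example: 90 degree rotation about z, camera center at (1, 2, 3).
R = [[0.0, -1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]]
c = [1.0, 2.0, 3.0]
t = translation_from_center(R, c)

# The camera center maps to the origin of the camera coordinate system.
origin = world_to_camera(R, t, c)
```

A useful sanity check for any set of extrinsics is that the camera center, transformed into camera coordinates, lands on the origin.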

The extrinsic parameters perform a transformation into the camera coordinate system. The camera coordinate system conventions are those of Hartley and Zisserman: The camera is looking along the positive z-axis, the x-axis goes to the right and the y-axis goes downwards.

Intrinsic Parameters

The calibration matrix is composed of the focal length of the camera, the principal point of the image plane, and the pixel aspect ratio. The focal length is normalized in the following way: Suppose the longer side of the image plane has length 1 in 3D space. Then the normalized focal length is the orthogonal distance from the camera center to the image plane. For example, the normalized focal length of a 70mm lens projecting on a 35mm sensor is 2.
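The normalization amounts to dividing the physical focal length by the length of the longer sensor side, both in the same unit. A minimal sketch, with a hypothetical function name and parameter names:

```python
def normalized_focal_length(focal_length_mm, sensor_long_side_mm):
    """Focal length in units of the longer side of the image plane."""
    if sensor_long_side_mm <= 0:
        raise ValueError("sensor size must be positive")
    return focal_length_mm / sensor_long_side_mm


# The example from the text: a 70mm lens on a 35mm sensor.
flen = normalized_focal_length(70.0, 35.0)  # 2.0
```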

For a 3D point X' in camera coordinates, the projection on the image plane is computed as x = p(K * X') where p(x') is a function that performs the central projection, i.e. divides by the third coordinate in order to get a point on the image plane at distance 1 from the camera center.
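As an illustration, the central projection p can be written as a small function and applied to K * X'. This is a sketch in plain Python; the names `central_projection` and `mat_vec` and the example matrix K are hypothetical, not taken from MVE.

```python
def mat_vec(m, v):
    """Multiply a 3x3 matrix (list of rows) by a 3-vector."""
    return [sum(m[i][j] * v[j] for j in range(3)) for i in range(3)]


def central_projection(v):
    """p(x'): divide by the third coordinate, (a, b, c) -> (a / c, b / c)."""
    if v[2] == 0:
        raise ValueError("third coordinate is zero; projection is undefined")
    return v[0] / v[2], v[1] / v[2]


# Hypothetical calibration: normalized focal length 2, principal point (0.5, 0.5).
K = [[2.0, 0.0, 0.5], [0.0, 2.0, 0.5], [0.0, 0.0, 1.0]]
X_cam = [-0.3, 0.1, 2.5]  # a point X' in camera coordinates

x = central_projection(mat_vec(K, X_cam))  # approximately (0.26, 0.58)
```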

The calibration matrix K can also be defined directly such that image coordinates are obtained. This is done by scaling the focal length by the largest image dimension, i.e. by max(width, height), and setting the principal point to (width / 2, height / 2). This yields continuous coordinates on the image plane between (0,0) and (width, height). The center of pixel (0,0) is at (0.5, 0.5), so (0.5, 0.5) must be subtracted from the continuous image-plane coordinates to obtain pixel coordinates.
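The construction of a pixel-unit K and the half-pixel shift can be sketched as follows. This is an illustrative sketch under simplifying assumptions, not MVE's implementation: it assumes square pixels (pixel aspect ratio 1) and zero skew, and the function names are hypothetical.

```python
def pixel_calibration(flen, width, height):
    """K in pixel units, ASSUMING square pixels and zero skew."""
    f_px = flen * max(width, height)  # scale the normalized focal length
    return [[f_px, 0.0, width / 2.0],
            [0.0, f_px, height / 2.0],
            [0.0, 0.0, 1.0]]


def project_to_image(K, X_cam):
    """Continuous image coordinates in [0, width] x [0, height]."""
    x = [sum(K[i][j] * X_cam[j] for j in range(3)) for i in range(3)]
    return x[0] / x[2], x[1] / x[2]


def to_pixel_coords(u, v):
    """Shift so that the center of pixel (0, 0) maps to (0, 0)."""
    return u - 0.5, v - 0.5


# Hypothetical 640x480 image with normalized focal length 2.
K = pixel_calibration(2.0, 640, 480)

# A point on the optical axis projects to the principal point.
u, v = project_to_image(K, [0.0, 0.0, 1.0])  # (320.0, 240.0)
px, py = to_pixel_coords(u, v)               # (319.5, 239.5)
```

Rounding the shifted coordinates to the nearest integer then gives the index of the pixel that contains the projected point.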