ENRICH: multi-purposE dataset for beNchmaRking In Computer vision and pHotogrammetry

Authors: Davide Marelli ¹, Luca Morelli ² ³, Elisa Mariarosaria Farella ², Simone Bianco ¹, Gianluigi Ciocca ¹, Fabio Remondino ²

¹ Imaging and Vision Laboratory (IVL), Department of Informatics, Systems and Communication, University of Milano-Bicocca, Viale Sarca 336, Milano, 20126, Italy
² 3D Optical Metrology (3DOM) unit, Bruno Kessler Foundation (FBK), Via Sommarive 18, Trento, 38123, Italy
³ Dept. of Civil, Environmental and Mechanical Engineering (DICAM), University of Trento, Via Mesiano 77, Trento, 38123, Italy

Version 1

  📂 Download ENRICH

  📄 Read the ENRICH paper

This file describes the structure and contents of the ENRICH datasets. Please refer to the related paper for information about the generation method and the purpose of ENRICH.

Folder structure

Each zip file in the root corresponds to a specific dataset:

  • ENRICH-Aerial is an aerial image block of the city of Launceston, Australia. The acquisition is performed by simulating a typical oblique aerial camera with five views (nadir and four oblique views).
  • ENRICH-Square is a ground-level dataset of a square captured by four cameras, each one moving on a different path, with different focal length, orientation, and lighting conditions.
  • ENRICH-Statue is a ground-level dataset portraying a statue (placed in the center of the ENRICH-Square scene), acquired using four cameras.

Each dataset uses the following folder structure:

ENRICH-<Dataset Name>
│   README.pdf
│   cameras.csv            # cameras extrinsic parameters in Comma-Separated Values (CSV) format
│   cameras.tsv            # cameras extrinsic parameters in Tab-Separated Values (TSV) format
│   gcp_images_list.csv    # Ground Control Points (GCPs) visibility and 2D image location, CSV format
│   gcp_images_list.tsv    # GCPs visibility and 2D image location, TSV format
│   gcp_list.csv           # 3D location of GCPs, CSV format
│   gcp_list.tsv           # 3D location of GCPs, TSV format
│   gcp_placement.jpg      # preview image of GCPs placement in the 3D scene
│
├── 3D
│   ├── geometry
│   │   │   <Dataset Name>.obj                # 3D model in Wavefront OBJ format
│   │
│   └── geometry+materials
│       │   <Dataset Name>.obj                # 3D model in Wavefront OBJ format
│       │   <Dataset Name>.mtl                # materials definition library file
│       │   ... <texture files> ...           # additional texture files
│
├── depth
│   ├── exr
│   │   │   <Frame>_<Camera Name>_depth.exr   # 32bit float depth
│   │   │   ...
│   │
│   └── png-preview
│       │   <Frame>_<Camera Name>_depth.png   # colorized depth preview
│       │   ...
│       │   lut.png                           # colors look-up table
│
└── images
    │   <Frame>_<Camera Name>.jpg             # RGB images
    │   ...
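
The naming pattern above makes it possible to pair each RGB image with its depth map. The following is a minimal sketch, assuming that `<Frame>` contains no underscore (so the first underscore separates it from `<Camera Name>`); the helper names and the example file names are hypothetical and are not taken from the dataset.

```python
from pathlib import Path


def parse_image_name(filename):
    """Split '<Frame>_<Camera Name>[_depth].<ext>' into (frame, camera).

    Assumption: the frame identifier never contains an underscore.
    """
    stem = Path(filename).stem
    if stem.endswith("_depth"):
        stem = stem[: -len("_depth")]
    frame, camera = stem.split("_", 1)
    return frame, camera


def depth_path_for(image_filename, dataset_root):
    """Build the expected path of the EXR depth map for an RGB image."""
    stem = Path(image_filename).stem
    return Path(dataset_root) / "depth" / "exr" / f"{stem}_depth.exr"


# Hypothetical file name, used only to illustrate the pattern.
print(parse_image_name("0001_camera1.jpg"))
print(depth_path_for("0001_camera1.jpg", "ENRICH-Square"))
```

If the real frame identifiers do contain underscores, the split rule would need to be adapted.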

Files content

cameras.csv/tsv

This file describes the pose of the cameras using the following fields:

label      ── string, the filename of the image to which the entry refers, file extension included
position_x ┐
position_y ├─ three float numbers representing the global position of the camera
position_z ┘
omega      ┐
phi        ├─ Omega, Phi, Kappa angles defining the rotation of the camera. Three floats, angles in radians
kappa      ┘
yaw        ┐
pitch      ├─ Yaw, Pitch, Roll angles defining the rotation of the camera. Three floats, angles in radians
roll       ┘
rotation_w ┐
rotation_x ├─ four floats, representing the camera rotation quaternion
rotation_y │
rotation_z ┘
lookat_x   ┐
lookat_y   ├─ three floats, representing the camera look-at direction vector
lookat_z   ┘

The values are defined according to a global coordinate reference system, having X growing right/east, Y growing forward/north, and Z growing upward/zenith. Transformations are meant to translate and rotate the camera in this global reference system.
Omega, Phi, and Kappa are counterclockwise (CCW) local rotations about the X, Y, and Z axes, applied in the following order: $R=R_x \cdot R_y \cdot R_z$.
Pitch and Roll are CCW local rotations about the X and Y axes respectively, and Yaw is a clockwise (CW) local rotation about the Z axis. The order of application is $R=R_z \cdot R_x \cdot R_y$.
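
The two compositions can be sketched with NumPy as follows. This is an illustration only: it assumes the standard right-handed CCW elementary rotation matrices, and it models the clockwise yaw by negating the angle before building $R_z$. The function names are hypothetical, and the result should be checked against the quaternion and look-at columns of `cameras.csv` before relying on it.

```python
import numpy as np


def rot_x(a):
    """CCW rotation about X by angle a (radians)."""
    c, s = np.cos(a), np.sin(a)
    return np.array([[1.0, 0.0, 0.0], [0.0, c, -s], [0.0, s, c]])


def rot_y(a):
    """CCW rotation about Y by angle a (radians)."""
    c, s = np.cos(a), np.sin(a)
    return np.array([[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]])


def rot_z(a):
    """CCW rotation about Z by angle a (radians)."""
    c, s = np.cos(a), np.sin(a)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])


def rotation_from_opk(omega, phi, kappa):
    """R = Rx(omega) . Ry(phi) . Rz(kappa)."""
    return rot_x(omega) @ rot_y(phi) @ rot_z(kappa)


def rotation_from_ypr(yaw, pitch, roll):
    """R = Rz(yaw) . Rx(pitch) . Ry(roll), with yaw measured clockwise.

    Assumption: a CW yaw equals a CCW rotation by -yaw.
    """
    return rot_z(-yaw) @ rot_x(pitch) @ rot_y(roll)


R = rotation_from_opk(0.1, -0.2, 0.3)
print(np.allclose(R @ R.T, np.eye(3)))  # a rotation matrix is orthonormal
```

Under these assumptions, a pure kappa rotation of angle $k$ and a pure yaw of angle $-k$ produce the same matrix, which gives a quick sanity check of the sign convention.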

gcp_images_list.csv/tsv

This file reports the 2D coordinates of the GCP center in each image where the GCP is visible. The fields are:

image_name ── string, the filename of the image that portrays the GCP
gcp_name   ── string, unique identifier of the GCP
image_u    ┬─ two floats, u and v coordinate of the center of the GCP in the image
image_v    ┘

The coordinates refer to the center of the GCP. They start at $(0,0)$, which is the top-left corner of the top-left pixel in the image, and end at $(6016,4016)$, which is the bottom-right corner of the bottom-right pixel in the image. $u$ grows right, and $v$ grows downward.
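
Because the origin sits on the corner of the top-left pixel rather than on its center, pixel centers fall on half-integer coordinates. The sketch below shows two conversions that follow from this convention; the function names are hypothetical, and the half-pixel shift is only needed for software that places $(0,0)$ at the center of the top-left pixel.

```python
import math

IMAGE_WIDTH, IMAGE_HEIGHT = 6016, 4016


def pixel_containing(u, v):
    """Return the integer (column, row) of the pixel containing (u, v)."""
    if not (0 <= u <= IMAGE_WIDTH and 0 <= v <= IMAGE_HEIGHT):
        raise ValueError("coordinates fall outside the image")
    # Points lying exactly on the right/bottom border map to the last pixel.
    col = min(int(math.floor(u)), IMAGE_WIDTH - 1)
    row = min(int(math.floor(v)), IMAGE_HEIGHT - 1)
    return col, row


def to_pixel_center_convention(u, v):
    """Shift to a convention where (0, 0) is the center of the top-left pixel."""
    return u - 0.5, v - 0.5


print(pixel_containing(0.5, 0.5))            # center of the top-left pixel
print(to_pixel_center_convention(0.5, 0.5))  # same point, shifted origin
```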

gcp_list.csv/tsv

This file contains the position of the GCPs in the scene. The available fields are:

gcp_name   ── string, unique identifier of the GCP
x_east     ┐
y_north    ├─ three floats, representing the global coordinates of the center of the GCP in the scene
z_altitude ┘
type       ── string, shape of the GCP, may be either `cross` or `round`

The 3D global coordinate reference system is defined as in Section cameras.csv/tsv.
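
The two GCP files can be joined on `gcp_name` to associate each 3D point with its image observations. The following is a minimal sketch using only the Python standard library. It assumes that both files start with a header row containing the field names listed above; the function name and the sample values are hypothetical and do not come from the dataset.

```python
import csv
import io
from collections import defaultdict


def load_gcps(gcp_list_file, gcp_images_file, delimiter=","):
    """Return (positions, observations), both keyed by gcp_name.

    Assumption: each file has a header row with the documented field names.
    Use delimiter="\t" for the TSV variants.
    """
    positions = {}
    for row in csv.DictReader(gcp_list_file, delimiter=delimiter):
        positions[row["gcp_name"]] = (
            float(row["x_east"]),
            float(row["y_north"]),
            float(row["z_altitude"]),
            row["type"],
        )
    observations = defaultdict(list)
    for row in csv.DictReader(gcp_images_file, delimiter=delimiter):
        observations[row["gcp_name"]].append(
            (row["image_name"], float(row["image_u"]), float(row["image_v"]))
        )
    return positions, dict(observations)


# Hypothetical sample content, for illustration only.
SAMPLE_GCP_LIST = "gcp_name,x_east,y_north,z_altitude,type\nGCP01,1.0,2.0,3.0,cross\n"
SAMPLE_GCP_IMAGES = (
    "image_name,gcp_name,image_u,image_v\n"
    "0001_camera1.jpg,GCP01,100.5,200.25\n"
    "0002_camera1.jpg,GCP01,150.0,210.75\n"
)

positions, observations = load_gcps(
    io.StringIO(SAMPLE_GCP_LIST), io.StringIO(SAMPLE_GCP_IMAGES)
)
print(positions["GCP01"], len(observations["GCP01"]))
```

With the real files, the `io.StringIO` objects would be replaced by files opened with `open(path, newline="")`.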

3D geometry

3D geometry for all datasets is provided as Wavefront OBJ files. Folder ENRICH-<Dataset Name>/3D/geometry includes only an OBJ file with the 3D geometry of the scene. Folder ENRICH-<Dataset Name>/3D/geometry+materials includes the geometry and the materials (MTL definition file and textures) as well.

Depths

EXR files are used to store the depth ground truth for each color image. Metric depths are available as 32bit float values in folder ENRICH-<Dataset Name>/depth/exr.

Colorized depth previews are stored as PNG files (folder ENRICH-<Dataset Name>/depth/png-preview).
The lookup table with the color used is stored in the file ENRICH-<Dataset Name>/depth/png-preview/lut.png.
The ENRICH-Statue and ENRICH-Square datasets use a logarithmic color map that covers the range $[0,600]$ meters, applying a different color every 0.25 meters of depth. ENRICH-Aerial instead uses a linear color map that covers the range $[114,300]$ meters, applying a different color every 0.25 meters. In all cases, black is used for pixels whose depth is at infinity (i.e., sky).
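
When working with the EXR files, sky pixels usually need to be masked out before computing depth statistics or errors. The sketch below is one possible way to do that, under several assumptions that are not stated in this file: that OpenCV was built with OpenEXR support (enabled through the `OPENCV_IO_ENABLE_OPENEXR` environment variable), that the depth is stored in the first channel, and that sky pixels are encoded as non-finite or very large values. The function names and the `max_depth` parameter are hypothetical.

```python
import numpy as np


def read_depth_exr(path):
    """Read a 32-bit float depth map from an EXR file using OpenCV."""
    import os

    # OpenCV's EXR reader is disabled by default in recent releases.
    os.environ.setdefault("OPENCV_IO_ENABLE_OPENEXR", "1")
    import cv2  # imported lazily so the mask helper works without OpenCV

    depth = cv2.imread(str(path), cv2.IMREAD_UNCHANGED)
    if depth is None:
        raise IOError(f"could not read {path}")
    if depth.ndim == 3:
        depth = depth[..., 0]  # assumption: depth is in the first channel
    return depth.astype(np.float32)


def valid_depth_mask(depth, max_depth=np.inf):
    """True where depth is finite, positive, and below max_depth (meters)."""
    depth = np.asarray(depth, dtype=np.float32)
    return np.isfinite(depth) & (depth > 0) & (depth < max_depth)


# Synthetic example: two valid pixels, one infinite, one implausibly far.
sample = np.array([[1.5, np.inf], [600.0, 1e10]], dtype=np.float32)
print(valid_depth_mask(sample, max_depth=1e6))
```

It is worth inspecting one depth file to confirm how sky pixels are actually encoded before choosing `max_depth`.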

Images

All the images are acquired at resolution 6016×4016px (24MP) by a virtual perspective (pinhole) camera that uses the configuration of the Nikon D750 DSLR full-frame camera with sensor size 35.9×24mm and a pixel size of 5.95µm. The images do not present lens distortion.
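
Since the camera is an ideal pinhole, its intrinsic matrix can be derived from the values above: the focal length in pixels is the focal length in millimeters divided by the pixel size. The following sketch illustrates this. Placing the principal point at the image center is an assumption (it is not stated in this file), and the function names are hypothetical.

```python
import numpy as np

PIXEL_SIZE_MM = 0.00595  # 5.95 µm
IMAGE_WIDTH, IMAGE_HEIGHT = 6016, 4016


def focal_length_px(focal_mm):
    """Convert a focal length from millimeters to pixels."""
    return focal_mm / PIXEL_SIZE_MM


def intrinsic_matrix(focal_mm):
    """Build a 3x3 pinhole intrinsic matrix with square pixels, no skew."""
    f = focal_length_px(focal_mm)
    # Assumption: the principal point lies at the image center.
    cx, cy = IMAGE_WIDTH / 2.0, IMAGE_HEIGHT / 2.0
    return np.array([[f, 0.0, cx], [0.0, f, cy], [0.0, 0.0, 1.0]])


for focal_mm in (35, 50, 70):
    print(focal_mm, "mm ->", int(focal_length_px(focal_mm)), "px")
```

Truncated to integers, the results match the pixel focal lengths reported in the acquisition setup table (5,882px, 8,403px, and 11,764px).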

Focal Length, as well as image count, orientation, lighting setup, and average Ground Sample Distance (GSD), are summarized in the Table of Section Acquisition setup.

All the images are available as JPEG files with the following metadata (EXIF version 2.30) fields set:

Exif:Make                     ── NIKON CORPORATION
Exif:Model                    ── NIKON D750
Exif:XResolution              ── 1
Exif:YResolution              ── 1
Exif:FocalLength              ── <varying>mm
Exif:FocalLengthIn35mmFormat  ── <varying>mm
Exif:FocalPlaneXResolution    ── 1680
Exif:FocalPlaneYResolution    ── 1680
Exif:FocalPlaneResolutionUnit ── cm
Exif:ExifImageWidth           ── 6016px
Exif:ExifImageHeight          ── 4016px
Exif:ExifVersion              ── 0230

XMP:GPSLatitude               ── <varying>m
XMP:GPSLongitude              ── <varying>m
XMP:GPSAltitude               ── <varying>m
XMP:GPSAltitudeRef            ── Above Sea Level

GPS:GPSAltitude               ── <varying>m
GPS:GPSAltitudeRef            ── Above Sea Level
GPS:GPSImgDirectionRef        ── True North
GPS:GPSImgDirection           ── [0, 359.99]°, Yaw
GPS:GPSPitch                  ── [-180, +180]°, Pitch
GPS:GPSRoll                   ── [-180, +180]°, Roll

For more information about the metadata, refer to the CIPA Exif specification.
The 3D coordinate reference system is defined as in Section cameras.csv/tsv.
GPS fields are used to store coordinates in that reference system; no actual GPS coordinates are used.

Acquisition setup

The following subsections describe the scene and camera setup used in the three datasets of ENRICH.
Focal length, image count, orientation, lighting setup, and average Ground Sample Distance (GSD) are summarized in the table below.

| Dataset       | Camera   | Focal Length    | Images | Orientation          | Lighting Setup | GSD    |
|---------------|----------|-----------------|--------|----------------------|----------------|--------|
| ENRICH-Aerial | nadir    | 35mm / 5,882px  | 60     | Landscape            | Uniform light  | 2.5cm  |
|               | forward  | 70mm / 11,764px | 60     | Landscape            | Uniform light  | 1.8cm  |
|               | backward | 70mm / 11,764px | 60     | Landscape            | Uniform light  | 1.8cm  |
|               | right    | 70mm / 11,764px | 60     | Landscape            | Uniform light  | 1.8cm  |
|               | left     | 70mm / 11,764px | 60     | Landscape            | Uniform light  | 1.8cm  |
|               | nadir 2  | 35mm / 5,882px  | 39     | Landscape            | Uniform light  | 3.0cm  |
| ENRICH-Square | camera 1 | 35mm / 5,882px  | 50     | Landscape & Portrait | Partly cloudy  | 0.8cm  |
|               | camera 2 | 50mm / 8,403px  | 50     | Landscape            | Clear sky      | 0.5cm  |
|               | camera 3 | 35mm / 5,882px  | 50     | Landscape & Portrait | Sunrise        | 0.8cm  |
|               | camera 4 | 35mm / 5,882px  | 50     | Landscape            | Clear sky      | 1.0cm  |
| ENRICH-Statue | camera 1 | 50mm / 8,403px  | 50     | Landscape            | Partly cloudy  | 0.69mm |
|               | camera 2 | 35mm / 5,882px  | 50     | Portrait             | Clear sky      | 0.64mm |
|               | camera 3 | 50mm / 8,403px  | 50     | Landscape            | Sunrise        | 0.70mm |
|               | camera 4 | 35mm / 5,882px  | 50     | Portrait             | Cloudy         | 0.64mm |

ENRICH-Aerial

The ENRICH-Aerial dataset is generated from an aerial image block of the city of Launceston, Australia. A total of 26 GCPs of size 50×50cm are positioned in the scene on flat or almost-flat surfaces at different elevations. The acquisition is performed by simulating a typical oblique aerial camera with five views: one nadir and four oblique views (forward, backward, left, and right). The oblique cameras are tilted 45° with respect to the nadir direction. The five cameras are rigidly mounted on a virtual flying platform at the same altitude, with the oblique ones offset 20cm from the nadir camera in their viewing direction. The first acquisition pass follows six parallel strips with ten acquisition points each, providing a total of 300 images. A second acquisition pass, orthogonal to the first one, is performed using only the nadir camera (nadir-2). This pass uses three parallel strips with 13 images each, for 39 additional images. In both cases, the nadir image overlap is 80% along track and 60% across track. The flying heights are approximately 150m and 175m above the ground.
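
As a sanity check, the aerial GSD values can be approximated from the flying height, the focal length, and the 5.95µm pixel size with the usual relation GSD = distance × pixel size / focal length. The sketch below makes two assumptions that are not stated explicitly in this file: that the first pass is flown at about 150m and the nadir-2 pass at about 175m, and that the oblique GSD is evaluated at the image center, where the slant distance is the height divided by cos(45°). The function name is hypothetical.

```python
import math

PIXEL_SIZE_M = 5.95e-6  # 5.95 µm


def gsd_m(distance_m, focal_mm):
    """Ground sample distance in meters for a pinhole camera."""
    return distance_m * PIXEL_SIZE_M / (focal_mm / 1000.0)


nadir = gsd_m(150.0, 35.0)    # first pass, nadir camera
nadir_2 = gsd_m(175.0, 35.0)  # second pass, nadir-2 camera
# Oblique cameras: slant distance to the ground along the optical axis.
oblique = gsd_m(150.0 / math.cos(math.radians(45.0)), 70.0)

for name, value in (("nadir", nadir), ("nadir 2", nadir_2), ("oblique", oblique)):
    print(f"{name}: {value * 100:.2f} cm")
```

Under these assumptions the results are about 2.55cm, 2.98cm, and 1.80cm, in line with the average GSD values listed in the acquisition setup table.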

Sample images from the ENRICH-Aerial dataset

ENRICH-Square

ENRICH-Square is a ground-level dataset of a square captured by four cameras. The scene comprises facades of various monumental buildings surrounding the square, trees, and statues. A total of 54 cross-pattern GCPs (size 15×15cm) are placed on the facades of the buildings at different heights. Camera 1 (25 landscape and 25 portrait images) follows a circular path of 5m radius around the center of the square, looking toward the center of the circle. Camera 2 follows two different circles (2m radius and 3.4m elevation, 6.25m radius and 4m elevation), looking directly toward the buildings. Camera 3 uses the same configuration as Camera 1 with slightly different poses; its images are acquired at sunrise, with a predominant orange color and strong shadows. Camera 4 follows the border of the square, taking pictures of the opposite side from 1.3m above the ground.

Sample images from the ENRICH-Square dataset

ENRICH-Statue

The ENRICH-Statue dataset uses the same virtual setup as ENRICH-Square, with an additional statue placed at the center of the square. In this dataset, 12 cross-pattern GCPs of size 2×2cm are placed directly on the statue and its base. Four cameras acquire 200 images (50 images each). Camera 1 and Camera 3 capture landscape images while rotating around the statue (radius 3.75m), looking at it from slightly below. Camera 2 and Camera 4 rotate around the statue (radius 2.25m), looking at it in portrait orientation from slightly above.

Sample images from the ENRICH-Statue dataset

Copyright & license

© 2022 Imaging and Vision Laboratory (IVL) - University of Milano-Bicocca, 3D Optical Metrology (3DOM) unit - Bruno Kessler Foundation (FBK).
All rights reserved.

Permission is hereby granted, without written agreement, to use, copy, and modify the ENRICH databases under the terms of the CC BY-NC 3.0 license, provided that the copyright notice in its entirety appears in all copies of this database. You may also publish the images in journals to report research results without any further permission. Any other commercial distribution or publication of the data, in original or modified form, requires prior written permission from the copyright owners.

Acknowledgements

This research was supported by grants from NVIDIA and utilized NVIDIA Quadro RTX 6000.
This work was also partly supported by the project “AI@TN” funded by the Autonomous Province of Trento (Italy).
The authors are thankful to Michele Welponer (3DOM-FBK) for contributing to the preparation of an initial 3D scene that was further elaborated and included in ENRICH.

We would like to thank the authors of the 3D models used in the virtual scene of our datasets.
The 3D meshes have been simplified and cut for use in our datasets. All 3D models were downloaded in 2020. Source URLs, authors, licenses, and techniques used to create the models are listed here below.

Citation

If you use ENRICH, please cite it:

@article{enrich2023,
    title = {ENRICH: multi-purposE dataset for beNchmaRking In Computer vision and pHotogrammetry},
    author = {Davide Marelli and Luca Morelli and Elisa Mariarosaria Farella and Simone Bianco and Gianluigi Ciocca and Fabio Remondino},
    journal = {ISPRS Journal of Photogrammetry and Remote Sensing},
    volume = {198},
    pages = {84-98},
    year = {2023},
    issn = {0924-2716},
    doi = {10.1016/j.isprsjprs.2023.03.002}
}