Taken from the download link found at: http://www.cvlibs.net/datasets/kitti/eval_tracking.php

###########################################################################
#          THE KITTI VISION BENCHMARK SUITE: TRACKING BENCHMARK           #
#             Andreas Geiger    Philip Lenz    Raquel Urtasun             #
#                    Karlsruhe Institute of Technology                    #
#                Toyota Technological Institute at Chicago                #
#                             www.cvlibs.net                              #
###########################################################################

For recent updates see http://www.cvlibs.net/datasets/kitti/eval_tracking.php.

This file describes the KITTI tracking benchmarks, consisting of 21 training
sequences and 29 test sequences.

Although we have labeled 8 different classes, only the classes 'Car' and
'Pedestrian' are evaluated in our benchmark, as only these classes have been
labeled with enough instances for a comprehensive evaluation.

The labeling process was performed in two steps: First, we hired a set of
annotators to label 3D bounding boxes for tracklets in the 3D Velodyne point
clouds. Since a single 3D bounding box tracklet with fixed dimensions often
fits a pedestrian badly, we additionally labeled the left/right boundaries of
each object using Mechanical Turk. We also collected labels of each object's
occlusion state, and computed each object's truncation by backprojecting a
car/pedestrian model into the image plane.

NOTE: WHEN SUBMITTING RESULTS, PLEASE STORE THEM IN THE SAME DATA FORMAT IN
WHICH THE GROUND TRUTH DATA IS PROVIDED (SEE BELOW), USING THE FILE NAMES
0000.txt, 0001.txt, ... CREATE A ZIP ARCHIVE OF THEM AND STORE YOUR RESULTS
(ONLY THE RESULTS OF THE TEST SET) IN ITS ROOT FOLDER. FOR A RE-SUBMISSION,
_ONLY_ THE RE-SUBMITTED RESULTS WILL BE SHOWN IN THE TABLE.
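
As a rough sketch, packaging the per-sequence result files into the archive
root could look like this in Python (the local results folder name is
hypothetical):

  # Sketch: pack per-sequence result files (0000.txt, 0001.txt, ...) into the
  # root of a zip archive for submission. The results folder is hypothetical.
  import zipfile
  from pathlib import Path

  results_dir = Path("results")  # hypothetical location of 0000.txt, 0001.txt, ...
  with zipfile.ZipFile("submission.zip", "w", zipfile.ZIP_DEFLATED) as zf:
      for txt in sorted(results_dir.glob("*.txt")):
          zf.write(txt, arcname=txt.name)  # no directories: files land in the root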

Data Format Description
=======================

The data for training and testing can be found in the corresponding folders.
The sub-folders are structured as follows:

  - image_02/%04d/ contains the left color camera sequence images (png)
  - image_03/%04d/ contains the right color camera sequence images (png)
  - label_02/      contains the left color camera label files (plain text files)
  - calib/         contains the calibration for all four cameras (plain text files)
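
To illustrate the %04d pattern, the path for one frame might be built as
below; note that the per-frame file naming is not specified in this file, so
the six-digit %06d.png frame names are an assumption:

  # Sketch: build the image path for sequence 3, frame 42, following the
  # %04d sequence-folder pattern above. The %06d.png frame naming is an
  # assumption, not specified in this file.
  seq, frame = 3, 42
  img_path = "training/image_02/%04d/%06d.png" % (seq, frame)
  # -> "training/image_02/0003/000042.png"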

The label files contain the following information. All values (numerical or
strings) are separated by spaces; each row corresponds to one object. The 17
columns represent:

#Values  Name         Description
----------------------------------------------------------------------------
   1     frame        Frame within the sequence where the object appears
   1     track id     Unique tracking id of this object within this sequence
   1     type         Describes the type of object: 'Car', 'Van', 'Truck',
                      'Pedestrian', 'Person_sitting', 'Cyclist', 'Tram',
                      'Misc' or 'DontCare'
   1     truncated    Integer (0,1,2) indicating the level of truncation.
                      Note that this is in contrast to the object detection
                      benchmark, where truncation is a float in [0,1].
   1     occluded     Integer (0,1,2,3) indicating occlusion state:
                      0 = fully visible, 1 = partly occluded,
                      2 = largely occluded, 3 = unknown
   1     alpha        Observation angle of object, ranging [-pi..pi]
   4     bbox         2D bounding box of object in the image (0-based index):
                      contains left, top, right, bottom pixel coordinates
   3     dimensions   3D object dimensions: height, width, length (in meters)
   3     location     3D object location x,y,z in camera coordinates (in meters)
   1     rotation_y   Rotation ry around Y-axis in camera coordinates [-pi..pi]
   1     score        Only for results: Float indicating confidence in
                      detection, needed for p/r curves; higher is better.
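
A minimal reader for one such row might look like this in Python (field names
are my own; the optional 18th value is the score discussed later in this file):

  # Sketch: parse one row of a KITTI tracking label/result file into a dict,
  # following the 17-column layout above (plus the optional score).
  def parse_label_line(line):
      v = line.split()
      obj = {
          'frame':      int(v[0]),
          'track_id':   int(v[1]),
          'type':       v[2],
          'truncated':  int(float(v[3])),             # integer 0/1/2 here
          'occluded':   int(v[4]),
          'alpha':      float(v[5]),
          'bbox':       [float(s) for s in v[6:10]],  # left, top, right, bottom
          'dimensions': [float(s) for s in v[10:13]], # height, width, length
          'location':   [float(s) for s in v[13:16]], # x, y, z (camera coords)
          'rotation_y': float(v[16]),
      }
      if len(v) > 17:                                 # results carry a score
          obj['score'] = float(v[17])
      return obj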

Here, 'DontCare' labels denote regions in which objects have not been labeled,
for example because they have been too far away from the laser scanner. To
prevent such objects from being counted as false positives, our evaluation
script ignores objects tracked in don't care regions of the test set. You can
use the don't care labels in the training set to prevent your object
detector/tracking algorithm from harvesting hard negatives in those areas, in
case you consider non-object regions of the training images as negative
examples.

The reference point for the 3D bounding box of each object is centered on the
bottom face of the box. The corners of the bounding box are computed as
follows with respect to the reference point and in the object coordinate
system:

  x_corners = [l/2,  l/2, -l/2, -l/2,  l/2,  l/2, -l/2, -l/2]^T
  y_corners = [  0,    0,    0,    0,   -h,   -h,   -h,   -h]^T
  z_corners = [w/2, -w/2, -w/2,  w/2,  w/2, -w/2, -w/2,  w/2]^T

with l=length, h=height, and w=width.
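
As a sketch, transforming these object-coordinate corners into camera
coordinates (rotating by rotation_y around the Y-axis, then translating by
the object's location) might look like this; function and variable names are
my own:

  # Sketch: compute the 8 corners of a 3D box in camera coordinates from the
  # object-coordinate corners above, the rotation ry, and the location (x,y,z).
  import numpy as np

  def box3d_corners(h, w, l, x, y, z, ry):
      x_c = np.array([ l/2,  l/2, -l/2, -l/2,  l/2,  l/2, -l/2, -l/2])
      y_c = np.array([ 0.0,  0.0,  0.0,  0.0,   -h,   -h,   -h,   -h])
      z_c = np.array([ w/2, -w/2, -w/2,  w/2,  w/2, -w/2, -w/2,  w/2])
      R = np.array([[ np.cos(ry), 0.0, np.sin(ry)],   # rotation around Y-axis
                    [        0.0, 1.0,        0.0],
                    [-np.sin(ry), 0.0, np.cos(ry)]])
      corners = R @ np.vstack([x_c, y_c, z_c])  # 3x8, rotate object -> camera
      corners += np.array([[x], [y], [z]])      # translate to camera coordinates
      return corners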

The coordinates in the camera coordinate system can be projected into the
image by using the 3x4 projection matrices in the calib folder; for the left
color camera, for which the images are provided, P2 must be used. The
difference between rotation_y and alpha is that rotation_y is given directly
in camera coordinates, while alpha also takes the vector from the camera
center to the object center into account, to compute the relative orientation
of the object with respect to the camera. For example, a car facing along the
X-axis of the camera coordinate system corresponds to rotation_y=0, no matter
where it is located in the X/Z plane (bird's eye view), while alpha is zero
only when the object is located along the Z-axis of the camera. When moving
the car away from the Z-axis, the observation angle (alpha) will change.
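
A common reading of this relationship, expressed here as an assumption rather
than quoted from the devkit, is alpha = rotation_y - arctan2(x, z), wrapped
back into [-pi, pi]:

  # Sketch: convert rotation_y to alpha by subtracting the viewing angle of
  # the object center (x, z in camera coordinates). This relation is implied
  # by the description above, not quoted from the official devkit.
  import numpy as np

  def rotation_y_to_alpha(ry, x, z):
      alpha = ry - np.arctan2(x, z)
      return (alpha + np.pi) % (2 * np.pi) - np.pi  # wrap to [-pi, pi]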

An overview of the coordinate systems, reference point, and geometrical
definitions is given in cs_overview.pdf.

To project a point from Velodyne coordinates into the left color image,
you can use this formula:  x = P2 * R0_rect * Tr_velo_to_cam * y
For the right color image: x = P3 * R0_rect * Tr_velo_to_cam * y

Note: All matrices are stored row-major, i.e., the first values correspond
to the first row. R0_rect contains a 3x3 matrix which you need to extend to
a 4x4 matrix by adding a 1 as the bottom-right element and 0's elsewhere.
Tr_xxx is a 3x4 matrix (R|t), which you need to extend to a 4x4 matrix
in the same way!
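
Following that recipe, a sketch of the full projection in Python (assuming
P2, R0_rect, and Tr_velo_to_cam have already been parsed from the calib file
and reshaped to 3x4, 3x3, and 3x4 respectively):

  # Sketch: project Nx3 velodyne points into the left color image via
  # x = P2 * R0_rect * Tr_velo_to_cam * y, extending the matrices to 4x4
  # as described above. Points behind the camera should be filtered first.
  import numpy as np

  def project_velo_to_image(pts_velo, P2, R0_rect, Tr_velo_to_cam):
      R = np.eye(4)
      R[:3, :3] = R0_rect              # extend 3x3 -> 4x4
      T = np.eye(4)
      T[:3, :4] = Tr_velo_to_cam       # extend 3x4 (R|t) -> 4x4
      y = np.hstack([pts_velo, np.ones((len(pts_velo), 1))]).T  # 4xN homogeneous
      x = P2 @ R @ T @ y               # 3xN
      return (x[:2] / x[2]).T          # Nx2 pixel coordinates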

The sensors were not moved between the different recording days. However, the
full camera calibration was performed separately for every day. Therefore,
only 'Tr_imu_velo' is constant across all sequences.

Note that while all of this information is available for the training data,
only the data which is actually needed for the particular benchmark must be
provided to the evaluation server. However, all 17 values must be provided at
all times, with the unused ones set to their default values (=invalid).
Additionally, an 18th value must be provided: a floating point score for each
tracked detection, where higher values indicate higher confidence. The range
of your scores will be determined automatically by our evaluation server; you
don't have to normalize it, but it should be roughly linear.
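
A sketch of emitting one such 18-value row follows; since this file does not
spell out the exact "invalid" placeholders, the defaults below are
assumptions, not the official convention:

  # Sketch: format one result row with all 17 values plus the 18th score.
  # The placeholder defaults (-1, -10, -1000) are assumptions; this file does
  # not specify the official invalid values.
  def format_result_row(frame, track_id, obj_type, bbox, score,
                        truncated=-1, occluded=-1, alpha=-10,
                        dims=(-1, -1, -1), loc=(-1000, -1000, -1000), ry=-10):
      fields = [frame, track_id, obj_type, truncated, occluded, alpha,
                *bbox, *dims, *loc, ry, score]
      return " ".join(str(f) for f in fields)

  # e.g. format_result_row(0, 1, "Car", [296.7, 161.8, 455.2, 292.3], 0.95)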

Tracking Benchmark
==================

The goal in the object tracking task is to estimate object tracklets for the
classes 'Car', 'Pedestrian', and (optionally) 'Cyclist'. The tracking
algorithm must provide as output the 2D 0-based bounding boxes in each image
of the sequence, using the format specified above, as well as a score
indicating the confidence in the particular frame for this track. All other
values must be set to their default values (=invalid), see above. One text
file per sequence must be provided in a zip archive, where each file can
contain many detections, depending on the number of objects per sequence. In
our evaluation, we only evaluate detections/objects taller than 25 pixels in
the image, and do not count Vans as false positives for Cars or Sitting
Persons as false positives for Pedestrians, due to their similarity in
appearance. (All ignored objects are treated as DontCare areas.) As
evaluation criteria we follow the HOTA, CLEARMOT, and
Mostly-Tracked/Partly-Tracked/Mostly-Lost metrics.
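
The 25-pixel rule translates directly into a height check on the 2D box; a
minimal sketch (whether the boundary is inclusive is my assumption):

  # Sketch: the minimum-height filter mentioned above, for a 2D box given
  # as [left, top, right, bottom]. The inclusive boundary is an assumption.
  def is_evaluated(bbox, min_height=25):
      return (bbox[3] - bbox[1]) >= min_height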

Raw Data
========

Raw data is mapped to the tracking benchmark sequences and is available for
download. The velodyne and positioning data for training and testing can be
found in the corresponding folders. The sub-folders are structured as follows:

  - velodyne/%04d/ contains the raw velodyne point clouds (binary files)
  - oxts/          contains the raw position (oxts) data (plain text files)
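
The binary layout of the velodyne files is not described here; assuming the
usual KITTI convention of consecutive float32 quadruples (x, y, z,
reflectance), a minimal reader looks like this:

  # Sketch: read one raw velodyne scan. The Nx4 float32 layout
  # (x, y, z, reflectance) is the usual KITTI convention, assumed here.
  import numpy as np

  def read_velodyne_bin(path):
      return np.fromfile(path, dtype=np.float32).reshape(-1, 4)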