2025 Challenge Track Description

Track 1: Multi-Camera 3D Perception

Challenge Track 1 involves synthetic data of animated people in multiple indoor settings, generated using the NVIDIA Omniverse Platform. The 2025 edition expands last year’s corpus as follows:

Split | Hours | Cameras | Scenes | Resolution / FPS | Objects in Training GT* | File Size
Train + Val | ≈42 h | 504 | 19 indoor layouts (warehouse, hospital, retail, office) | 1080p @ 30 fps | 363 instances (292 persons + service robots / forklifts) | 74 GB (optional depth maps: >3 TB)

* Counts refer to the training + validation ground truth only; the test split remains hidden until the evaluation system is released.

Each scene provides temporally synchronized RGB video, camera calibration, a top-down map, and per-frame 2D/3D annotations. Depth maps (PNG-in-HDF5) are included but very large; feel free to ignore them if storage or I/O is a concern.
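If you do use the depth maps, the sketch below shows one way to decode a frame. It assumes h5py-readable files that store PNG-encoded byte strings per frame; the file path and dataset name are illustrative only, so check the dataset README for the actual layout.

# Sketch: decode one PNG-in-HDF5 depth frame (path and dataset name are assumed).
import io

import h5py              # pip install h5py
import numpy as np
from PIL import Image    # pip install Pillow

# Hypothetical layout -- substitute the real per-camera depth file and dataset name.
with h5py.File("scene_000/camera_00/depth.h5", "r") as f:
    png_bytes = bytes(f["depth"][0])                        # frame 0, raw PNG bytes
    depth = np.asarray(Image.open(io.BytesIO(png_bytes)))   # e.g. a uint16 depth map
print(depth.shape, depth.dtype)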

    • Task

Teams must detect every object and maintain a consistent identity for it as it moves within and across all cameras in a scene.

    • Submission Format

For compatibility with the official evaluation server, results must be a single plain-text file (track1.txt) where each line describes one detection:

〈scene_id〉 〈class_id〉 〈object_id〉 〈frame_id〉 〈x〉 〈y〉 〈z〉 〈width〉 〈length〉 〈height〉 〈yaw〉

Field | Type | Description
scene_id | int | Unique identifier for each multi-camera sequence.
class_id | int | Zero-based object category: Person → 0, Forklift → 1, NovaCarter → 2, Transporter → 3, FourierGR1T2 → 4, AgilityDigit → 5.
object_id | int | Positive integer, unique per scene across all classes; remains constant across all cameras within the same scene.
frame_id | int | Zero-based frame index within the scene.
x, y, z | float | 3D coordinates of the bounding-box centroid in the world coordinate system, in meters.
width, length, height | float | Box dimensions in meters along the x (width), y (length), and z (height) axes of the object-centered coordinate system, whose origin is at the centroid.
yaw | float | Heading of the box in the world coordinate system: Euler angle in radians about the vertical z-axis of the object-centered coordinate system. (Pitch and roll are assumed zero.)

Example: in scene 0, if a Person is assigned object_id = 5, then a Forklift in the same scene cannot also use object_id = 5; it must take a different ID, e.g. 6.
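To make the geometry concrete, here is a minimal sketch that recovers a box’s eight world-space corners from one submission line, assuming yaw rotates about the vertical axis as defined in the table (the numeric values are placeholders):

# Sketch: eight world-space corners of one 3D box (fields as defined above).
import numpy as np

def box_corners(x, y, z, width, length, height, yaw):
    """Return an (8, 3) array of corner coordinates in meters."""
    # Corner offsets in the object-centered frame: width along x, length along y,
    # height along z, origin at the centroid.
    dx, dy, dz = width / 2, length / 2, height / 2
    offsets = np.array([[sx * dx, sy * dy, sz * dz]
                        for sx in (-1, 1) for sy in (-1, 1) for sz in (-1, 1)])
    # Heading: rotation about the vertical axis (pitch and roll are zero).
    c, s = np.cos(yaw), np.sin(yaw)
    rot = np.array([[c, -s, 0], [s, c, 0], [0, 0, 1]])
    return offsets @ rot.T + np.array([x, y, z])

print(box_corners(3.2, 1.5, 0.9, 0.6, 0.8, 1.8, np.pi / 4))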

Archive the text file as track1.zip or track1.tar.gz before uploading.
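For reference, a minimal sketch that writes detections in the required format and produces the archive (the rows below are placeholders, not real detections):

# Sketch: write track1.txt and archive it as track1.zip for upload.
import zipfile

# Placeholder rows: (scene_id, class_id, object_id, frame_id, x, y, z, width, length, height, yaw).
rows = [
    (0, 0, 5, 0, 3.20, 1.50, 0.90, 0.60, 0.80, 1.80, 0.79),
    (0, 1, 6, 0, 7.10, 4.20, 1.10, 1.20, 2.50, 2.20, -1.57),
]

with open("track1.txt", "w") as f:
    for row in rows:
        f.write(" ".join(f"{v:.4f}" if isinstance(v, float) else str(v) for v in row) + "\n")

with zipfile.ZipFile("track1.zip", "w", zipfile.ZIP_DEFLATED) as zf:
    zf.write("track1.txt")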

    • Evaluation

Scores are computed with 3D HOTA [1], which jointly balances detection, association, and localization quality. HOTA is computed per class within each scene and averaged across classes; these per-scene scores are then combined into a weighted average across all scenes, weighted by each scene’s total number of objects. 3D IoU is used to match ground-truth and predicted objects.

      • Leaderboard = raw HOTA on the hidden test set.
      • Online-tracker bonus: if your paper and code demonstrate that only past frames are used, a +10 % multiplicative bonus is applied when deciding the final winner and runner-up (the public leaderboard itself shows the un-bonused score).

Example: Team A (offline) = 65 % HOTA; Team B (online) = 61 % HOTA ⇒ with the bonus, 61 % × 1.10 = 67.1 %, so Team B ranks higher in the final award list.
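To make the aggregation and bonus concrete, here is a small sketch of the final-score computation as described above. The per-class HOTA values and object counts are invented, and the official evaluator may differ in implementation detail:

# Sketch: aggregate per-class, per-scene 3D HOTA into a final score, plus the online bonus.
# Each scene maps to (per-class HOTA scores, total object count); all values are made up.
scenes = {
    0: ([0.72, 0.55, 0.61], 120),
    1: ([0.68, 0.49], 80),
}

scene_scores = {sid: sum(h) / len(h) for sid, (h, _) in scenes.items()}   # average over classes
total_objects = sum(n for _, n in scenes.values())
final_hota = sum(scene_scores[sid] * n for sid, (_, n) in scenes.items()) / total_objects

online = True   # claimable only if paper + code show that just past frames are used
award_score = final_hota * (1.10 if online else 1.0)
print(f"leaderboard HOTA = {final_hota:.3f}, award score = {award_score:.3f}")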

    • Data Access

Note: the depth files are huge (>3 TB in total). If bandwidth or disk space is limited, download only the other files.

By downloading you agree to the Physical AI Smart Spaces licence (CC-BY 4.0).

References

[1] J. Luiten et al., “HOTA: A Higher Order Metric for Evaluating Multi-Object Tracking,” IJCV, 2021.