2025 FAQs
General
1. We would like to participate. What do we need to do?
For this edition, the data access information is shared on each track’s description page under the CHALLENGE tab. Please find the instructions there.
2. I am interested only in submitting a paper but not in the Challenge. Can I do that?
Yes. Please make sure to submit your paper by the submission deadline.
3. How large can a team be?
There are no restrictions on team size.
4. Are we allowed to use other external data/pre-trained models?
External datasets or pre-trained models are allowed only if they are public. Teams that wish to be listed on the public leaderboard and win the challenge awards are NOT allowed to use any private data or private pre-trained models for either training or validation. The winning teams and runners-up are required to submit their training and testing code for verification after the challenge submission deadline, to ensure that no private data or private pre-trained models were used for training and that the tasks were performed by algorithms and not humans.
5. What are the prizes?
This information is shared in the Awards section.
6. Will we need to submit our code?
Teams need to make their code publicly accessible to be considered for winning (including a complete, reproducible pipeline for model training/creation). This is to ensure that no private data was used for training and that the tasks were performed by algorithms and not humans, and to contribute to the community.
7. How will the submissions be evaluated?
The submission formats for each track are detailed on each track’s description page under the CHALLENGE tab.
8. Are we allowed to use validation sets in training?
The validation sets are allowed to be used in training.
9. Are we allowed to use test sets in training?
Additional manual annotations on our testing data are strictly prohibited. We also discourage the use of the testing data in any way during training, with or without labels, because the task is supposed to be evaluated fairly, as in real life, where we do not have access to testing data at all. Although it is permitted to apply algorithms like clustering to automatically generate pseudo-labels on the testing data, we will choose a winning method that does not use such techniques when multiple teams have similar performance (~1%). Finally, please keep in mind that, as in all previous editions of the AI City Challenge, all the winning methods and runners-up will be requested to submit their code for verification purposes. Their performance needs to be reproducible using the training/validation/synthetic data only.
10. Are we allowed to use data/pre-trained models from the previous edition(s) of the AI City Challenge?
Data from previous edition(s) of the AI City Challenge are allowed to be used.
11. Do the winning teams and runners-up need to submit papers and present at the workshop?
Track 1 – Multi-Camera 3D Perception
1. Is calibration available for each camera?
Comprehensive camera calibration information is available for each camera, including the 3-by-4 camera matrix, intrinsic parameters, extrinsic parameters, etc.
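As a rough illustration only (not the official data loader), a 3-by-4 projection matrix P = K[R|t] maps a homogeneous 3D world point to pixel coordinates; the matrix values below are placeholders, not values from the dataset:

import numpy as np

# Placeholder 3-by-4 projection matrix P = K [R | t]; the real per-camera
# values come from the calibration files provided with the dataset.
P = np.array([[1000.0,    0.0, 960.0, 0.0],
              [   0.0, 1000.0, 540.0, 0.0],
              [   0.0,    0.0,   1.0, 0.0]])

def project(point_3d, P):
    """Project a 3D world point (x, y, z) to pixel coordinates (u, v)."""
    X = np.append(point_3d, 1.0)   # homogeneous world point
    u, v, w = P @ X                # homogeneous image point
    return u / w, v / w            # perspective divide

print(project(np.array([2.0, 1.0, 10.0]), P))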
2. What is the standard for labeling visible 2D bounding boxes?
The annotations of the test set are generated based on the same standards as the training and validation sets.
- For occluded objects (objects blocked by another object within the camera frame), the object must satisfy BOTH the visibility-in-height and visibility-in-width requirements.
- For truncated objects (objects cut off by the camera frame), the object must satisfy EITHER the visibility-in-height condition OR the visibility-in-width condition.
- Here are the definitions of visibility in height and width (a minimal decision sketch follows this list):
  - Visibility for height
    - If the head is visible, label the object if 20% of the height is visible.
    - If the head is not visible, label the object if 60% of the height is visible.
  - Visibility for width
    - Label the object if more than 60% of the body width is visible.
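The following is a minimal sketch of these rules as a decision function; the argument names (head_visible, height_fraction, width_fraction, truncated) are hypothetical and for illustration only, not fields of the annotation format:

def should_label(head_visible, height_fraction, width_fraction, truncated):
    """Apply the visibility rules above to decide whether an object gets labeled.

    height_fraction / width_fraction are the visible fractions of the
    object's height / width, in the range 0.0 to 1.0.
    """
    # Visibility for height: 20% suffices when the head is visible,
    # otherwise 60% of the height must be visible.
    height_ok = height_fraction >= (0.2 if head_visible else 0.6)
    # Visibility for width: more than 60% of the body width must be visible.
    width_ok = width_fraction > 0.6
    if truncated:
        # Truncated objects (cut off by the camera frame): EITHER condition.
        return height_ok or width_ok
    # Occluded objects (blocked by another object in frame): BOTH conditions.
    return height_ok and width_ok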
3. How are the object IDs used for evaluation? Do the submitted IDs need to be consistent with the ground truths?
We use the HOTA metric for evaluation. The IDs in the submitted results do not need to match the exact IDs in the ground truths. We will use bipartite matching for their comparison, which will be based on IoU of 3D bounding boxes in the global coordinate system.
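As a hedged sketch of the kind of matching described (not the official evaluation code), predicted and ground-truth 3D boxes can be paired via bipartite (Hungarian) matching on an IoU cost. The iou_3d function below assumes axis-aligned boxes given as (x_min, y_min, z_min, x_max, y_max, z_max), and the IoU threshold is an assumption; the actual HOTA evaluation may handle box representations and association differently:

import numpy as np
from scipy.optimize import linear_sum_assignment

def iou_3d(box_a, box_b):
    # Placeholder IoU for axis-aligned 3D boxes.
    lo = np.maximum(box_a[:3], box_b[:3])
    hi = np.minimum(box_a[3:], box_b[3:])
    inter = np.prod(np.clip(hi - lo, 0.0, None))
    vol_a = np.prod(box_a[3:] - box_a[:3])
    vol_b = np.prod(box_b[3:] - box_b[:3])
    return inter / (vol_a + vol_b - inter)

def match_boxes(pred_boxes, gt_boxes, iou_threshold=0.5):
    """Bipartite matching of predicted to ground-truth boxes by IoU."""
    cost = np.array([[1.0 - iou_3d(p, g) for g in gt_boxes] for p in pred_boxes])
    rows, cols = linear_sum_assignment(cost)  # Hungarian algorithm
    return [(r, c) for r, c in zip(rows, cols)
            if 1.0 - cost[r, c] >= iou_threshold]  # keep pairs above the IoU threshold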
Track 2 – Traffic Safety Description and Analysis
1. Is it compulsory for participants to use a generative model to generate captions from the videos? Or can we use the training captions as ground truth and apply a retrieval model to retrieve the closest caption from the training database to submit on the test set?
The text should describe the new scenario, which may not be found in the training set. Specifically, this is not a retrieval task, and we are seeking generative solutions. Teams may submit results to the evaluation system and rank on the leaderboard with any method, but we will manually evaluate award contenders, and teams using only retrieval methods will be disqualified from winning the awards. For example, a method that extracts features from the test set videos and retrieves the “closest-meaning” caption from the training set for submission will not qualify for winning the track, since it is not a generative solution.
Track 3 – Warehouse Spatial Intelligence
[We will add frequently asked questions with answers here for this new track]
Track 4 – Road Object Detection in Fish-Eye Cameras
1. Is calibration available for each camera?
No calibration information is available for the cameras in the train and test sets.
2. Does the evaluation time for FPS include the time taken for loading the model, loading images, and performing pre-processing tasks?
Model loading is not included in the FPS calculation; however, preprocessing for individual images is included.
3. Is inference using batch processing allowed, or should all images be processed individually?
Batch processing is not permitted; images must be processed individually to simulate real-time application.
4. Is it permissible to initiate another parallel process for loading images while the inference process is running?
To keep the evaluation straightforward, parallel processing of any kind is prohibited; all operations must be performed sequentially for all 1000 images. The following pseudocode provides a detailed overview of the evaluation process:
BEGIN
    INITIALIZE timer_start = CURRENT_TIME
    FOR each image IN image_folder (1000 images)
        LOAD image
        PREPROCESS image
        PERFORM_INFERENCE on image
        POSTPROCESS inference_result
        SAVE result
    END_FOR
    SET timer_end = CURRENT_TIME
    CALCULATE elapsed_time = timer_end - timer_start
    CALCULATE fps = 1000 / elapsed_time
    DISPLAY fps
    CALCULATE F1-score based on the result
    CALCULATE Metric (harmonic mean of F1-score and fps)
END
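For reference, here is a minimal Python sketch of that procedure; the file pattern, OpenCV loading, and the model’s preprocess/infer/postprocess methods are assumptions for illustration, not the official evaluation harness:

import glob
import time
import cv2  # assumed image-loading backend

def evaluate(model, image_folder):
    """Sequentially process all images and report FPS, mirroring the pseudocode above."""
    image_paths = sorted(glob.glob(f"{image_folder}/*.png"))  # the 1000 test images
    results = []
    timer_start = time.time()                   # model is already loaded; loading time is excluded
    for path in image_paths:                    # no batching, no parallelism
        image = cv2.imread(path)                # LOAD
        inp = model.preprocess(image)           # PREPROCESS (hypothetical method)
        out = model.infer(inp)                  # PERFORM_INFERENCE (hypothetical method)
        results.append(model.postprocess(out))  # POSTPROCESS and SAVE (hypothetical method)
    elapsed_time = time.time() - timer_start
    fps = len(image_paths) / elapsed_time
    return results, fps

# The F1-score and the final metric (harmonic mean of F1-score and FPS) are
# computed afterwards from the saved results, per the track's rules.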