2024 FAQs


1. We would like to participate. What do we need to do?

Please fill out the participation intent form to list your institution, your team and the tracks you will participate in. You just need to complete the online AI City Challenge Datasets Request Form.

2. I am interested only in submitting a paper but not in the Challenge. Can I do that?

Yes. Please make sure to submit your paper by the submission deadline.

3. How large can a team be?

There are no restrictions on team size.

4. What are the rules for downloading the data set?

A participation agreement is made available before the data is shared. You must accept the agreement and submit your response before you can access the data set.

5. Are we allowed to use other external data/pre-trained models?

External datasets or pre-trained models are allowed only if they are public. Teams that wish to be listed on the public leaderboard and win the challenge awards are NOT allowed to use any private data or private pre-trained model for either training or validation. The winning teams and runners-up are required to submit their training and testing code for verification after the challenge submission deadline, to ensure that no private data or private pre-trained model was used for training and that the tasks were performed by algorithms and not humans.

6. What are the prizes?

This information is shared in the Awards section.

7. Will we need to submit our code?

Teams need to make their code publicly accessible to be considered for winning (including a complete, reproducible pipeline for model training/creation). This ensures that no private data is used for training, that the tasks were performed by algorithms and not humans, and that the work contributes to the community.

8. How will the submissions be evaluated?

The submission formats for each track are detailed on the Data and Evaluation page.

9. Are we allowed to use validation sets in training?

The validation sets may be used in training.

10. Are we allowed to use test sets in training?

Additional manual annotations on our testing data are strictly prohibited. We also discourage using the testing data in any way during training, with or without labels, because the task is meant to be evaluated fairly under real-world conditions, where testing data is not available at all. Running algorithms such as clustering to automatically generate pseudo labels on the testing data is permitted, but when multiple teams have similar performance (within about 1%), we will favor a method that does not use such techniques when choosing the winner. Finally, please keep in mind that, as in all previous editions of the AI City Challenge, all winning teams and runners-up will be asked to submit their code for verification. Their performance must be reproducible using the training/validation/synthetic data only.


11. Are we allowed to use data/pre-trained models from the previous edition(s) of the AI City Challenge?

Data from previous edition(s) of the AI City Challenge may be used.

12. Do the winning teams and runners-up need to submit papers and present at the workshop?

All the winning teams and runners-up have to submit papers, register, and present at the workshop in order to qualify for winning.

Track 1 – Multi-Camera People Tracking

1. Is calibration available for each camera?

The 3-by-4 camera matrix and 3-by-3 homography matrix are provided for each camera.
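
As a rough illustration of how matrices of these shapes are typically applied, the sketch below projects a 3D point with a 3-by-4 camera matrix and maps a 2D point with a 3-by-3 homography. The function names are hypothetical, and the FAQ does not state which direction the provided homography maps (image to ground plane or the reverse), so check the dataset documentation before relying on either convention.

```python
def project_point(P, X):
    """Apply a 3x4 camera matrix P to a 3D point X = (x, y, z).

    Returns (u, v) after dividing by the homogeneous coordinate.
    """
    xh = (X[0], X[1], X[2], 1.0)
    u, v, w = (sum(P[r][c] * xh[c] for c in range(4)) for r in range(3))
    return u / w, v / w


def apply_homography(H, pt):
    """Apply a 3x3 homography H to a 2D point pt = (x, y).

    The mapping direction depends on how H is defined in the dataset
    (an assumption to verify); the arithmetic is the same either way.
    """
    xh = (pt[0], pt[1], 1.0)
    a, b, w = (sum(H[r][c] * xh[c] for c in range(3)) for r in range(3))
    return a / w, b / w
```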

2. What is the standard of labeling?

The annotations of the test set are generated based on the same standards as the training and validation set.

  • For occluded objects (objects blocked by another object within the camera frame), the object must satisfy BOTH the height and the width visibility requirements.
  • For truncated objects (objects cut off by the edge of the camera frame), the object must satisfy EITHER the height OR the width visibility requirement.
  • Here are the definitions for visibility in height and width:
    • Visibility for height
      • If the head is visible and 20% of the height is visible, then label the object.
      • If the head is not visible, then label the object if 60% of the height is visible.
    • Visibility for width
      • Label the object if more than 60% of the body width is visible.
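
The labeling rules above can be read as a small decision function. The sketch below is one interpretation, not the organizers' annotation tool: the function names are hypothetical, the height thresholds are treated as "at least" (the FAQ does not say whether they are inclusive), and an object is classified by a single `truncated` flag because the FAQ does not cover objects that are both occluded and truncated.

```python
def meets_height_visibility(head_visible: bool, visible_height_frac: float) -> bool:
    """Height rule: 20% of the height if the head is visible, otherwise 60%."""
    threshold = 0.20 if head_visible else 0.60
    return visible_height_frac >= threshold  # inclusive bound is an assumption


def meets_width_visibility(visible_width_frac: float) -> bool:
    """Width rule: more than 60% of the body width must be visible."""
    return visible_width_frac > 0.60


def should_label(head_visible: bool, visible_height_frac: float,
                 visible_width_frac: float, truncated: bool) -> bool:
    """Truncated objects need EITHER rule; occluded objects need BOTH."""
    h_ok = meets_height_visibility(head_visible, visible_height_frac)
    w_ok = meets_width_visibility(visible_width_frac)
    return (h_ok or w_ok) if truncated else (h_ok and w_ok)
```

For example, a person whose head is visible with 25% of the height and 50% of the width showing would be labeled if truncated by the frame edge, but not if occluded by another object.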

3. How are the object IDs used for evaluation? Do the submitted IDs need to be consistent with the ground truths?

We use the HOTA metric for evaluation. The IDs in the submitted results do not need to match the exact IDs in the ground truths. We will use bipartite matching for their comparison, which will be based on 3D distance of the bottom points of objects in the global coordinate system.
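
To illustrate why submitted IDs need not equal ground-truth IDs, the sketch below matches predictions to ground truths for a single frame purely by 3D distance between bottom points. It is a simplified brute-force stand-in, not the official HOTA evaluation code: the function name, the `max_dist` threshold, and the dictionary layout are all assumptions made for the example, and the exhaustive search is only practical for a handful of objects.

```python
import math
from itertools import permutations


def match_ids(pred, gt, max_dist=1.0):
    """Match predicted IDs to ground-truth IDs for one frame.

    pred, gt: dict mapping id -> (x, y, z) bottom point in global coordinates.
    Picks the one-to-one assignment with the most pairs within max_dist
    (hypothetical threshold), breaking ties by smallest total distance.
    Returns dict pred_id -> gt_id.
    """
    swap = len(pred) > len(gt)
    small, large = (gt, pred) if swap else (pred, gt)
    small_ids, large_ids = list(small), list(large)

    best_key, best_pairs = None, []
    for perm in permutations(large_ids, len(small_ids)):
        # Keep only pairs that are close enough to count as a match.
        pairs = [(s, l) for s, l in zip(small_ids, perm)
                 if math.dist(small[s], large[l]) <= max_dist]
        total = sum(math.dist(small[s], large[l]) for s, l in pairs)
        key = (-len(pairs), total)
        if best_key is None or key < best_key:
            best_key, best_pairs = key, pairs

    if swap:  # pairs are (gt_id, pred_id); flip them back
        return {l: s for s, l in best_pairs}
    return dict(best_pairs)


pred = {7: (0.0, 0.0, 0.0), 9: (5.0, 5.0, 0.0)}
gt = {1: (5.1, 5.0, 0.0), 2: (0.1, 0.0, 0.0)}
matches = match_ids(pred, gt)
```

Here predicted ID 7 pairs with ground-truth ID 2 and predicted ID 9 with ground-truth ID 1, even though none of the ID values coincide.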

Track 2 – Traffic Safety Description and Analysis

1. Is it compulsory for participants to use a generative model to generate captions from the videos? Or can we use the training captions as ground truth and apply a retrieval model to retrieve the closest caption from the training database and submit it for the test set?

The text should describe the new scenario, which may not be found in the training set. Specifically, this is not a retrieval task, and we are seeking generative solutions. Teams may submit results to the evaluation system and rank on the leaderboard with any method. However, we will manually evaluate award contenders, and teams using only retrieval methods will be disqualified from winning the awards. For example, a method that uses features extracted from the test set videos to retrieve the “closest-meaning” caption from the training set for submission will not qualify for winning the track, since it is not a generative solution.

Track 3 – Naturalistic Driving Action Recognition

1. Can we use dataset A2 for supervised or semi-supervised training on our algorithms?

As with the other tracks, for Track 3 the data set A2 (provided to teams without labels) should be used for validation only, which means that any sort of training on dataset A2 (manual labeling, semi-supervised labeling) is prohibited.

2. Why are the three camera views sometimes out of sync?

The videos are synced at the start. However, because some videos were concatenated during the video creation process, the views may drift slightly out of sync in the later part, but the offset should not exceed one second.

3. Does the activity id start at 0 or 1?

There was a discrepancy between the track description and the label file in the dataset. To confirm, the activity id starts at 0 and the info on the track description page has been updated accordingly.

Track 4 – Road Object Detection in Fish-Eye Cameras

1. Is calibration available for each camera?

No calibration information is available for any of the cameras involved in the train and test sets.

Track 5 – Detecting Violation of Helmet Rule for Motorcyclists

1. What is the standard of labeling?

An object is annotated if 40% of the object is seen. The minimum height and width of the bounding boxes are 40 pixels. Objects which are smaller than 40 pixels are not taken into consideration and will not influence the test accuracy results.
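
A minimal sketch of the size rule, for teams that want to pre-filter their own detections. It assumes that "smaller than 40 pixels" means either side of the box falls below 40 pixels; the function name is hypothetical and the evaluation server's exact handling may differ.

```python
MIN_SIDE_PX = 40  # minimum bounding-box width and height stated above


def is_evaluated(width: int, height: int, min_side: int = MIN_SIDE_PX) -> bool:
    """Return True if a box is large enough to count under the size rule."""
    return width >= min_side and height >= min_side


boxes = [(120, 80), (39, 200), (40, 40)]  # (width, height) in pixels
kept = [b for b in boxes if is_evaluated(*b)]
```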

2. In the dataset, objects behind the redacted areas are not annotated. Why?

Objects that overlap with the redacted area (blurred region) are not taken into consideration, because the blurring can suppress important features of the objects to be detected. In the test dataset, any objects in the submission file that overlap with redacted areas will be ignored and will not influence test accuracy.

3. Is the data type of the width and height in submitted Track 5 results Int or Float?

The data type of width and height for the submitted result is Integer.
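
The overlap rule can be approximated with a standard axis-aligned rectangle test. This is only a sketch: boxes are assumed to be `(x, y, width, height)` tuples with a top-left origin, any non-zero overlap is treated as overlapping (the FAQ does not give a threshold), and both function names are hypothetical.

```python
def boxes_overlap(a, b) -> bool:
    """True if two (x, y, w, h) boxes share any interior area."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    return ax < bx + bw and bx < ax + aw and ay < by + bh and by < ay + ah


def drop_redacted(detections, redacted_regions):
    """Remove detections that overlap any redacted (blurred) region."""
    return [d for d in detections
            if not any(boxes_overlap(d, r) for r in redacted_regions)]


detections = [(0, 0, 50, 50), (100, 100, 40, 40)]
redacted = [(40, 40, 20, 20)]
visible = drop_redacted(detections, redacted)
```

The first detection touches the blurred region and is dropped; the second is kept.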

4. Could we use the Track 5 data from last year or any external dataset to train the model?

We have expanded last year's dataset, so there is no need to train on last year's data: it is a subset of the new dataset we provide this year. As for external datasets, we follow the same principle as the other tracks this year: external datasets or pre-trained models are allowed only if they are public. Any use of private data or private models will disqualify you from winning the track.