2026 FAQs

General

1. We would like to participate. What do we need to do?

Data access information for this edition is posted on each track’s description page under the CHALLENGE tab. Please follow the instructions there.

2. I am interested only in submitting a paper but not in the Challenge. Can I do that?

Yes. Please make sure to submit your paper by the submission deadline.

3. How large can a team be?

There are no restrictions on team size.

4. Are we allowed to use other external data/pre-trained models?

External datasets and pre-trained models are allowed only if they are public. Teams that wish to be listed on the public leaderboard and compete for the challenge awards are NOT allowed to use any private data or private pre-trained models for either training or validation. The winning teams and runners-up are required to submit their training and testing code for verification after the challenge submission deadline, to confirm that no private data or private pre-trained model was used for training and that the tasks were performed by algorithms and not humans.

5. What are the prizes?

This information is shared in the Awards section.

6. Will we need to submit our code?

Teams need to make their code publicly accessible to be considered for winning, including a complete, reproducible pipeline for model training/creation. This ensures that no private data is used for training and that the tasks were performed by algorithms and not humans, and it contributes to the community.

7. How will the submissions be evaluated?

The submission formats for each track are detailed on each track’s description page under the CHALLENGE tab.

8. Are we allowed to use validation sets in training?

Yes. The validation sets may be used in training.

9. Are we allowed to use test sets in training?

Additional manual annotations on our testing data are strictly prohibited. We also discourage any use of the testing data during training, with or without labels, because the task should be evaluated fairly, as in real-life deployment where the testing data is not available at all. Although running algorithms such as clustering to automatically generate pseudo labels on the testing data is permitted, when multiple teams have similar performance (within ~1%) we will prefer a winning method that does not use such techniques. Finally, please keep in mind that, as in all previous editions of the AI City Challenge, all winning teams and runners-up will be asked to submit their code for verification. Their performance must be reproducible using the training/validation/synthetic data only.

10. Are we allowed to use data/pre-trained models from the previous edition(s) of the AI City Challenge?

Data from previous edition(s) of the AI City Challenge may be used.

11. Do the winning teams and runners-up need to submit papers and present at the workshop?

All winning teams and runners-up must submit papers, register, and present at the workshop in order to qualify for an award.

Track 1 – Multi-Camera 3D Perception (Sim2Real)

1. Is calibration available for each camera?

Comprehensive calibration information is available for each camera, including the 3-by-4 camera matrix, intrinsic parameters, extrinsic parameters, etc.

2. What is the standard of labeling visible 2D bounding boxes?

The annotations of the test set are generated based on the same standards as the training and validation sets.

- For occluded objects (objects blocked by another object within the camera frame), the object must satisfy both the height and the width visibility requirements.
- For truncated objects (objects cut off by the camera frame), the object must satisfy EITHER the height visibility requirement OR the width visibility requirement.
- The visibility requirements are defined as follows:
  - Visibility for height
    - If the head is visible, label the object if 20% of the height is visible.
    - If the head is not visible, label the object if 60% of the height is visible.
  - Visibility for width
    - Label the object if more than 60% of the body width is visible.
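
The rules above can be sketched as a small decision function. This is only an illustration, not the annotation tooling used by the organizers: the function names, the `kind` argument, and the reading of "20%" and "60%" of the height as "at least" thresholds are assumptions. Objects that are both occluded and truncated are not covered, since the rules above do not say how to combine the two cases.

```python
def height_ok(head_visible: bool, height_frac: float) -> bool:
    """Height rule: 20% of the height if the head is visible, otherwise 60%.

    Assumption: the thresholds are inclusive ("at least").
    """
    threshold = 0.20 if head_visible else 0.60
    return height_frac >= threshold


def width_ok(width_frac: float) -> bool:
    """Width rule: more than 60% of the body width must be visible."""
    return width_frac > 0.60


def should_label(kind: str, head_visible: bool,
                 height_frac: float, width_frac: float) -> bool:
    """Decide whether a partially visible object gets a 2D box.

    kind is a hypothetical tag: "occluded" or "truncated".
    """
    h = height_ok(head_visible, height_frac)
    w = width_ok(width_frac)
    if kind == "occluded":
        return h and w   # both requirements must hold
    if kind == "truncated":
        return h or w    # either requirement is enough
    raise ValueError(f"unknown kind: {kind!r}")
```

For example, an occluded person whose head is visible, with 25% of the height but only half of the body width showing, would not be labeled under this reading, because the width requirement fails.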

3. How are the object IDs used for evaluation? Do the submitted IDs need to be consistent with the ground truths?

We use the HOTA metric for evaluation. The IDs in the submitted results do not need to match the exact IDs in the ground truths. Submitted and ground-truth objects are compared via bipartite matching based on the IoU of their 3D bounding boxes in the global coordinate system.
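
As a rough illustration of why submitted IDs need not equal ground-truth IDs, the sketch below pairs predicted and ground-truth boxes in a single frame by maximizing total 3D IoU. It is not the official evaluation code. The axis-aligned box format, the `min_iou` threshold of 0.5, and the function names are assumptions made for the example. The brute-force search is only practical for a handful of objects; an evaluator would typically use the Hungarian algorithm instead.

```python
from itertools import permutations

# Assumed box format: axis-aligned (xmin, ymin, zmin, xmax, ymax, zmax)
# in the global coordinate system.
Box = tuple[float, float, float, float, float, float]


def iou_3d(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned 3D boxes."""
    inter = 1.0
    for i in range(3):
        overlap = min(a[i + 3], b[i + 3]) - max(a[i], b[i])
        if overlap <= 0:
            return 0.0
        inter *= overlap
    vol_a = (a[3] - a[0]) * (a[4] - a[1]) * (a[5] - a[2])
    vol_b = (b[3] - b[0]) * (b[4] - b[1]) * (b[5] - b[2])
    return inter / (vol_a + vol_b - inter)


def match_ids(gt: dict[str, Box], pred: dict[str, Box],
              min_iou: float = 0.5) -> dict[str, str]:
    """Return {pred_id: gt_id} for the one-to-one pairing with the highest total IoU."""
    gt_ids, pred_ids = list(gt), list(pred)
    # Enumerate every one-to-one pairing of the smaller set into the larger one.
    if len(pred_ids) <= len(gt_ids):
        candidates = (list(zip(pred_ids, perm))
                      for perm in permutations(gt_ids, len(pred_ids)))
    else:
        candidates = (list(zip(perm, gt_ids))
                      for perm in permutations(pred_ids, len(gt_ids)))
    best = max(candidates,
               key=lambda pairs: sum(iou_3d(pred[p], gt[g]) for p, g in pairs))
    # Drop pairs whose overlap is below the (hypothetical) threshold.
    return {p: g for p, g in best if iou_3d(pred[p], gt[g]) >= min_iou}


gt = {"7": (0, 0, 0, 2, 2, 2), "9": (10, 0, 0, 12, 2, 2)}
pred = {"a": (10, 0, 0, 12, 2, 2), "b": (0, 0, 0, 2, 2, 1)}
mapping = match_ids(gt, pred)  # prediction "a" pairs with "9", "b" with "7"
```

The predicted labels "a" and "b" never appear in the ground truth, yet both are matched purely by geometry.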

Track 2 – Transportation Safety Understanding and Captioning (Sim2Real)

1. Can we use any pre-trained models from earlier AI City Challenge versions of this track?

No. Models pre-trained or fine-tuned on AI City Challenge data from previous years, or on data from other tracks of this year’s challenge, are not allowed for this track.

2. Can other pre-trained models be used for this challenge track?

Yes. Teams may fine-tune general pre-trained models with open weights such as Qwen 3.6 or Gemma 4. As solutions to this challenge should be reproducible, teams should refrain from using paid API-based models such as Google Gemini or OpenAI GPT.

3. Can we modify the synthetic videos to increase their realism before we train our models?

Yes. Teams may use generative models, such as the NVIDIA Cosmos line of models, to enhance the realism of the scenes, which may better align the synthetic and real data distributions.

Track 3 – Anomalous Events in Transportation

1. Are closed-source / paid API-based models (e.g., OpenAI GPT, Google Gemini, Anthropic Claude) allowed in Track 3?

Yes. Track 3 permits publicly accessible models or services, including paid API-based models, to be used in any role in your pipeline:

- generating, refining, augmenting, or cleaning training data;
- producing intermediate outputs (captions, structured event descriptions, reasoning chains) consumed by a downstream model;
- serving as the final inference system that produces submitted predictions.

This applies uniformly to all ten in-domain task types and to both out-of-domain evaluations.

The only hard constraint is that any external models or data you use must be publicly accessible to all participants, such as through a public API, open download link, public repository, or official project page with access instructions. Teams may also use their own internally developed models in any of the above roles, provided they are trained only on permitted challenge data and publicly accessible external data/models. No private data may be used in their training. Internally developed models do not need to be published prior to award verification. Award candidates will be required to open-source the components they train and control (code, model weights, training recipes, and any custom inference-pipeline code) by the open-source deadline announced on the official challenge website. Components that participating teams did not train or control (e.g., closed-source APIs accessed via their public service) are not required to be released.

Because closed-source services are not bit-for-bit reproducible, award candidates must clearly disclose the service, model version, role in the pipeline, and prompts/configuration for each closed-source API component used. Code verification will focus on confirming that no private data or non-public model was used, rather than on bit-exact replay of closed-source API calls.

2. Must we train a single end-to-end VLM, or are modular and agentic systems allowed?

Track 3 requires a unified system: one inference pipeline that handles all task types across the TAR in-domain test set and, if you choose to enter them, the FETV and PSI VQA out-of-domain leaderboards. Given a video and a task-specific prompt, the system must produce an answer through that unified pipeline. Task-specific prompts, parsing, and routing inside the pipeline are allowed, but teams may not submit separately tuned or specialized systems per task type or per test set.

The internal architecture of the unified system is up to you, and may be:

- a single fine-tuned end-to-end VLM,
- an agentic pipeline (planning, tool use, multiple model calls in sequence),
- a modular multi-stage system in which different components play different roles (e.g., a perception/captioning module feeding a separate reasoning module).

Any external models or data you use must be publicly accessible to all participants, such as through a public API, open download link, public repository, or official project page with access instructions. Teams may also use their own internally developed models, provided they are trained only on permitted challenge data and publicly accessible external data/models. No private data may be used in their training. Internally developed models do not need to be published prior to award verification. Award candidates will be required to open-source the components they train and control (code, model weights, training recipes, and any custom inference-pipeline code) by the open-source deadline announced on the official challenge website. Each submission must clearly state which of the above categories the developed solution falls into.

3. Are we allowed to use additional public datasets, web-sourced videos, or other external data?

Yes. Track 3 allows publicly accessible data to be added to your training mixture, including additional public video datasets, web-sourced videos under permissive licenses, synthetic data from public generative models, and distilled outputs from public or commercial APIs. External data sources used must be declared in the technical report. Private data, scraped data without a clear public license, and non-public datasets are not allowed.

4. Can we modify, rewrite, or extend the released training annotations?

Yes. The released annotations may be used as-is, post-processed (for example, by rewriting chain-of-thought reasoning, filtering noisy items, or generating additional Q&A pairs), or combined with annotations you generate yourself using public or commercial models. Any additional generated annotations or substantial annotation post-processing should be described in the technical report.

5. How many leaderboards does Track 3 have, and how do I submit to each?

Track 3 has three separate leaderboards, each with its own submission and scoring (scores are not combined across them):

- TAR (in-domain) — the main Track 3 leaderboard.
- FETV (out-of-domain 1) — submitted and scored through the evaluation server as Track 7 (“Traffic Violation Understanding (FETV) – Track 3 Out-of-Domain Evaluation”).
- PSI VQA (out-of-domain 2) — submitted and scored through the evaluation server as Track 8 (“PSI-VQA: Pedestrian Situated Intent (PSI) Visual Question Answering (VQA) – Track 3 Out-of-Domain Evaluation”).

The two out-of-domain leaderboards are optional. See the Track 3 description page for each leaderboard’s dataset, submission format, and metrics.

6. Do I have to enter all three leaderboards, and how does prize eligibility work?

No. You may enter any subset. Each leaderboard carries its own prize. To keep the out-of-domain awards a demonstration of generalization from the in-domain TAR task, eligibility is: the TAR prize requires a valid TAR submission; the FETV prize requires valid submissions to both TAR and FETV; the PSI VQA prize requires valid submissions to both TAR and PSI VQA.
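
The eligibility rule can be written out as a short check. This is a sketch under stated assumptions: the function name and the use of leaderboard names as plain strings are made up for the example, and it only encodes which prizes a team can compete for, not who wins them.

```python
def eligible_prizes(valid_submissions: set[str]) -> set[str]:
    """Return the Track 3 prizes a team may compete for.

    valid_submissions holds the leaderboards with a valid submission,
    using the hypothetical labels "TAR", "FETV", and "PSI VQA".
    """
    prizes: set[str] = set()
    if "TAR" in valid_submissions:
        prizes.add("TAR")
        # Each out-of-domain prize additionally requires the TAR submission.
        for leaderboard in ("FETV", "PSI VQA"):
            if leaderboard in valid_submissions:
                prizes.add(leaderboard)
    return prizes


only_fetv = eligible_prizes({"FETV"})            # no prizes: TAR is missing
tar_and_psi = eligible_prizes({"TAR", "PSI VQA"})  # TAR and PSI VQA prizes
```

A team that submits only to FETV can still appear on that leaderboard, but under this rule it would not be eligible for the FETV prize.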

7. Is the same model required across all three leaderboards?

The intent of the out-of-domain leaderboards is to evaluate how a model built for the in-domain TAR task generalizes, so we encourage using the same unified system across the leaderboards you enter. Per-leaderboard prompting, parsing, and routing within that unified pipeline are allowed, but separately tuned or specialized systems per leaderboard are not.

Track 4 – Text-Based Person Re-Identification (Sim2Real)

1. Can we use the official test set as a validation set without using ground-truth labels?

The test data and its distribution must not be used in any form during the training process. Using the official test set as a validation set—even without using ground-truth labels—is not permitted. The test set should be used strictly for final inference and leaderboard evaluation only. Any use of test set outputs (with or without labels) for model selection, threshold tuning, ensemble selection, pseudo-labeling, or post-processing adjustment is prohibited.

Track 5 – Generative Traffic Video Forecasting

1. Are pretrained models or external data allowed?

Yes. Public pretrained models and external data are allowed. However, the base models and external data used for post-training must be clearly declared in the technical report or method description.

2. What is the required submission format?

Participants must submit the generated results as raw frame images. Please do not package or compress the outputs into a video file. Evaluation is run on the submitted image frames, which must follow the specified naming convention.

3. Can I enhance or refine the provided text prompts?

Yes. Participants may use automatic caption refinement or prompt enhancement methods. However, the refinement algorithm must be clearly described, and manual rewriting of the prompts is not allowed.

4. Can I use additional annotations from the WTS dataset, such as vehicle bounding boxes?

Yes, but temporal causality must be strictly preserved. Additional annotations may only be used on the conditioning history frames provided as input. No information derived from future frames may be used.
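
One way to enforce this in a data loader is to drop every annotation that does not belong to a conditioning frame before it reaches the model. The sketch below is an assumption-laden illustration: the `frame` key, the 0-based frame indexing, and the `num_history_frames` parameter are hypothetical and do not reflect the actual WTS annotation schema.

```python
def causal_annotations(annotations: list[dict],
                       num_history_frames: int) -> list[dict]:
    """Keep only annotations attached to the conditioning history frames.

    Assumption: frames are indexed from 0 and the first
    num_history_frames frames are the ones provided as input.
    """
    return [a for a in annotations if 0 <= a["frame"] < num_history_frames]


boxes = [
    {"frame": 0, "bbox": (10, 20, 50, 80)},
    {"frame": 3, "bbox": (12, 22, 52, 82)},
    {"frame": 4, "bbox": (14, 24, 54, 84)},  # future frame, must not be used
]
usable = causal_annotations(boxes, num_history_frames=4)
```

Filtering at load time, rather than inside the model, makes it easier to show during code verification that no future-frame information was available.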

5. Can I use models that are not generative methods?

We seek solutions based on multimodal generative models. Traditional regressive pipelines, such as direct video continuation methods, are discouraged and will not be eligible for the challenge award. Participants are therefore strongly encouraged to design their methods around generative forecasting frameworks.

Track 6 – Cross-City Object Detection (Milestone Project Hafnia)

1. Do participants get direct access to the full training and benchmarking datasets?

No. Participants will not be able to directly download, view, or extract the full training or benchmarking datasets. Only a small sample dataset will be downloadable for local development and pipeline adaptation. The full training data will be accessible only through managed Hafnia training jobs, and the benchmarking data will remain hidden. Benchmarking will run in two steps: inference will happen inside the Hafnia platform, and the generated predictions will then be submitted to the AI City Challenge evaluation system, where evaluation against the ground truth and ranking will take place.

2. How do I register and get access?

Users will find an overview, useful links, and registration information on the Hafnia Track 6 community page:

https://community.hafnia.milestonesys.com/home/clubs/ai-city-challenge-track-6-omnhs/overview

Applications will be reviewed by the organizers, and access is typically expected to be granted within 3–4 days of registration, provided that the application is valid. Participation is limited to 200 individual Hafnia accounts, granted on a first-come, first-served basis after validation.

3. Are pretrained models, ensembles, or external data allowed?

Pretrained models are allowed, and ensembles are allowed if they can be executed as a single inference pipeline within the platform constraints. External datasets are not allowed for training within the Track 6 challenge workflow. Participants may bring models pretrained outside the challenge, but the challenge training stage must use the Track 6 data provided through Hafnia.

4. How will training, benchmarking, and submission work?

Training will be performed through the Hafnia Training-as-a-Service platform. Each accepted participant account will receive 30,000 Hafnia credits for Track 6 only and may run one experiment at a time. A quickstart guide and example trainer package for training object detectors will be provided through the Hafnia platform. Benchmarking functionality is expected to become available in early June 2026. Once enabled, participants will upload their inference package to Hafnia. The Hafnia platform will run inference on the hidden benchmarking data and generate prediction files. Participants will then manually submit those prediction files to the official AI City Challenge evaluation and ranking website, where evaluation against the ground truth and ranking will take place.

5. Who owns the trained models, and what must be made public?

Participants remain responsible for any rights in the models they train, and no ownership is transferred to the organizers. Trained model weights may be downloaded after completed training jobs, and models and results obtained through Track 6 are intended for research purposes only. In accordance with the AI City Challenge rules, awarded models must be reproducible and made publicly available through GitHub. Award candidates will be required to release the source code and materials needed to reproduce their submitted approach by the open-source deadline.