We’re excited to announce that the 10th AI City Challenge has been accepted as a workshop at ECCV 2026. Please find the details below.
2026 AI CITY CHALLENGE
The AI City Challenge, hosted at ECCV 2026, continues to push the boundaries of computer vision and AI in real-world environments, with a strong emphasis on synthetic-to-real (Sim2Real) transfer and unified reasoning. The challenge drives innovation in intelligent transportation, smart cities, and large-scale video analytics by leveraging both large-scale synthetic data and real-world test scenarios.
By tackling diverse data sources—from multi-camera warehouse environments and traffic systems to multimodal video-language datasets—participants will develop and benchmark methods capable of robust perception, reasoning, and prediction under real-world constraints. A key focus of this edition is bridging the gap between synthetic training and real-world deployment, enabling scalable and generalizable AI systems.
This 10th edition of the Challenge introduces new reasoning-centric tasks, expanded Sim2Real evaluation protocols, and six challenging tracks, listed below:
Challenge Track 1. Multi-Camera 3D Perception (Sim2Real): Teams are tasked with tracking multiple object classes—including people, autonomous mobile robots (AMRs), humanoids, and forklifts—across large-scale camera networks. The dataset includes over 250 hours of synthetic video from 1,500 cameras with detailed 2D/3D annotations and cross-camera identities. A new real-world test set is introduced to evaluate Sim2Real generalization. Evaluation is based on 3D HOTA (Higher Order Tracking Accuracy).
Challenge Track 2. Transportation Safety Understanding and Captioning (Sim2Real): Using synthetic traffic datasets, participants develop models to understand and describe safety-critical scenarios in the real world. Tasks include video captioning and visual question answering (VQA) focused on pedestrian-centric risk and causal reasoning. Evaluation metrics include BLEU, METEOR, ROUGE-L, CIDEr, and VQA accuracy. Top-performing teams are required to submit Dockerized solutions for reproducibility and deployment validation.
Challenge Track 3. Anomalous Events in Transportation: This track challenges participants to build a single unified model that detects, reasons about, and explains anomalous events in transportation video. Training data includes 44,040 chain-of-thought reasoning annotations across 10 task types covering 3,670 CCTV videos from eight public sources. Models are evaluated on a human-verified in-domain test set and two out-of-domain test sets spanning fisheye intersection footage and egocentric dashcam scenarios, to be released in mid-May. Evaluation metrics include accuracy, temporal IoU, and reference-based language metrics (BERTScore, BLEU, METEOR, ROUGE-L).
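For readers unfamiliar with the localization metric used in Track 3, temporal IoU measures the overlap between a predicted event interval and the ground-truth interval. A minimal sketch (the function name and interval convention are illustrative, not part of the official evaluation kit):

```python
def temporal_iou(pred, gt):
    """Temporal IoU between two (start, end) intervals, e.g. in seconds.

    Returns intersection length divided by union length; 0.0 when the
    intervals do not overlap.
    """
    inter = max(0.0, min(pred[1], gt[1]) - max(pred[0], gt[0]))
    union = (pred[1] - pred[0]) + (gt[1] - gt[0]) - inter
    return inter / union if union > 0 else 0.0
```

For example, a prediction of (0, 10) against a ground truth of (5, 15) overlaps for 5 seconds out of a 15-second union, giving an IoU of 1/3.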
Challenge Track 4. Text-Based Person Re-Identification (Sim2Real): Participants tackle text-based person retrieval, where natural language queries describe both appearance and behaviors (including anomalous actions). Models are trained on synthetic data and evaluated on real-world test sets. This track builds on recent benchmarks and emphasizes cross-modal reasoning between vision and language. Accuracy is measured using mean Average Precision (mAP).
Challenge Track 5. Generative Traffic Video Forecasting: This track introduces generative modeling for traffic scene forecasting. Participants generate future video frames conditioned on historical observations and textual descriptions. The goal is to produce temporally consistent and safety-aware predictions. Evaluation metrics include PSNR, SSIM, LPIPS, FVD, and vision-language model (VLM) scores for safety-critical events.
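Of the pixel-level metrics listed for Track 5, PSNR is the simplest: it is the log-scaled ratio between the maximum possible pixel value and the mean squared error between a generated frame and the reference frame. A minimal sketch, assuming 8-bit frames (the function name and `max_val` default are illustrative, not taken from the official evaluation code):

```python
import numpy as np

def psnr(pred, target, max_val=255.0):
    """Peak signal-to-noise ratio (dB) between two frames of equal shape."""
    mse = np.mean((pred.astype(np.float64) - target.astype(np.float64)) ** 2)
    if mse == 0:
        return float("inf")  # identical frames
    return 10.0 * np.log10(max_val ** 2 / mse)
```

SSIM, LPIPS, and FVD capture structural, perceptual, and distributional similarity respectively and require more machinery (or pretrained networks), so they are typically computed with established libraries rather than by hand.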
Challenge Track 6. Cross-City Object Detection (Milestone Systems): This track focuses on fine-grained object detection in real-world traffic scenes under cross-city domain shift. Participants train models on a large-scale hidden dataset from one city and are evaluated on a different city with distinct visual characteristics, viewpoints, and environmental conditions, emphasizing robustness to geographic generalization. The benchmark includes more than 40k annotated images and over 100k vehicle instances across a long-tailed set of classes, with bounding-box annotations collected from diverse urban and roadway video streams. To support privacy-conscious benchmarking, full training and inference are conducted through the Milestone Hafnia Training as a Service platform, where teams submit containerized pipelines and are evaluated primarily using mean Average Precision (mAP).
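Tracks 4 and 6 both report mean Average Precision (mAP): per class, detections are ranked by confidence, precision is integrated over recall, and the per-class APs are averaged. A minimal single-class sketch using all-point integration (official evaluators typically also apply a monotonic precision envelope and IoU-based matching; the function name and inputs here are illustrative):

```python
import numpy as np

def average_precision(scores, is_tp, num_gt):
    """AP for one class.

    scores: confidence per detection; is_tp: 1 if the detection matched a
    ground truth, else 0; num_gt: total ground-truth instances of the class.
    """
    order = np.argsort(-np.asarray(scores))
    tp = np.asarray(is_tp, dtype=float)[order]
    cum_tp = np.cumsum(tp)
    cum_fp = np.cumsum(1.0 - tp)
    recall = cum_tp / num_gt
    precision = cum_tp / (cum_tp + cum_fp)
    # integrate precision over recall
    ap, prev_r = 0.0, 0.0
    for p, r in zip(precision, recall):
        ap += p * (r - prev_r)
        prev_r = r
    return ap
```

mAP is then the mean of this quantity over all classes, which is what makes it a natural headline metric for the long-tailed class set in Track 6.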
Participants are invited to compete in one or more of the six challenge tracks. To join, please navigate to the page of your target track(s) under the CHALLENGE tab.
Important Dates
Below is the tentative timeline for the 2026 AI City Challenge.
- Release of training and validation data sets: April 20, 2026
- Release of evaluation server and test data sets: May 18, 2026
- Challenge track submissions due: TBD
- Workshop papers due: TBD
- Acceptance notification: TBD
- Camera-ready papers due: TBD
- Open-source code on GitHub due from award candidates: August 7, 2026 (Anywhere on Earth)
- Presentation of papers and announcement of awards at ECCV 2026: September 8/9, 2026
Workshop Committee
Zheng Tang
NVIDIA
Shuo Wang
NVIDIA
David Anastasiu
Santa Clara University
Ming-Ching Chang
University at Albany – SUNY
Anuj Sharma
Iowa State University
Quan Kong
Woven by Toyota
Munkhjargal Gochoo
The United Arab Emirates University
Jun-Wei Hsieh
National Yang Ming Chiao Tung University
Tomasz Kornuta
NVIDIA
Zhedong Zheng
University of Macau
Renran Tian
North Carolina State University
Judah Goldfeder
Columbia University
Fulgencio Navarro
Milestone Systems
Rama Chellappa
Johns Hopkins University
Challenge Committee
Yuxing Wang
NVIDIA
Yizhou Wang
NVIDIA
Sameer Satish Pusegaonkar
NVIDIA
Anqi (Alice) Li
NVIDIA
Nalin Dadhich
NVIDIA
Ridham Kachhadiya
Santa Clara University
Dhanishtha Patil
Santa Clara University
Han (Paris) Zhang
NVIDIA
Yilin Zhao
NVIDIA
Zaid Pervaiz Bhat
NVIDIA
Shuyu Yang
Xi’an Jiaotong University
Ashutosh Kumar
Woven by Toyota
Rong Wang
Woven by Toyota
Rafael Martin Nieto
Milestone Systems
Peter Christiansen
Milestone Systems
Sujit Biswas
NVIDIA
Xunlei Wu
NVIDIA
Vidya Murali
NVIDIA
CITATIONS
Please cite the papers from previous AI City Challenges accordingly if you choose to work with our datasets or refer to the previous challenge results. You can find the list of papers here.
