We’re excited to announce that the 10th AI City Challenge has been accepted as a workshop at ECCV 2026. Please find the details below.

2026 AI CITY CHALLENGE

The AI City Challenge, hosted at ECCV 2026, continues to push the boundaries of computer vision and AI in real-world environments, with a strong emphasis on synthetic-to-real (Sim2Real) transfer and unified reasoning. The challenge drives innovation in intelligent transportation, smart cities, and large-scale video analytics by leveraging both large-scale synthetic data and real-world test scenarios.

By tackling diverse data sources—from multi-camera warehouse environments and traffic systems to multimodal video-language datasets—participants will develop and benchmark methods capable of robust perception, reasoning, and prediction under real-world constraints. A key focus of this edition is bridging the gap between synthetic training and real-world deployment, enabling scalable and generalizable AI systems.

This 10th edition of the Challenge introduces new reasoning-centric tasks, expanded Sim2Real evaluation protocols, and six challenge tracks, described below:


Challenge Track 1. Multi-Camera 3D Perception (Sim2Real): Teams are tasked with tracking multiple object classes—including people, autonomous mobile robots (AMRs), humanoids, and forklifts—across large-scale camera networks. The dataset includes over 250 hours of synthetic video from 1,500 cameras with detailed 2D/3D annotations and cross-camera identities. A new real-world test set is introduced to evaluate Sim2Real generalization. Evaluation is based on 3D HOTA (Higher Order Tracking Accuracy).

Challenge Track 2. Transportation Safety Understanding and Captioning (Sim2Real): Using synthetic traffic datasets, participants develop models to understand and describe safety-critical scenarios in the real world. Tasks include video captioning and visual question answering (VQA) focused on pedestrian-centric risk and causal reasoning. Evaluation metrics include BLEU, METEOR, ROUGE-L, CIDEr, and VQA accuracy. Top-performing teams are required to submit Dockerized solutions for reproducibility and deployment validation.

Challenge Track 3. Anomalous Events in Transportation: This track challenges participants to build a single unified model that detects, reasons about, and explains anomalous events in transportation video. Training data includes 44,040 chain-of-thought reasoning annotations across 10 task types covering 3,670 CCTV videos from eight public sources. Models are evaluated on a human-verified in-domain test set and two out-of-domain test sets spanning fisheye intersection footage and egocentric dashcam scenarios, to be released in mid-May. Evaluation metrics include accuracy, temporal IoU, and reference-based language metrics (BERTScore, BLEU, METEOR, ROUGE-L).
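For reference, temporal IoU scores how well a predicted event interval overlaps a ground-truth interval. The snippet below is an illustrative sketch of the standard interval formulation, not the track's official evaluation code:

```python
def temporal_iou(pred, gt):
    """Temporal IoU between two (start, end) intervals, in seconds.

    Intersection over union of the two time spans; 0.0 if disjoint.
    """
    inter = max(0.0, min(pred[1], gt[1]) - max(pred[0], gt[0]))
    union = (pred[1] - pred[0]) + (gt[1] - gt[0]) - inter
    return inter / union if union > 0 else 0.0

# Predicted anomaly [12.0, 20.0] s vs. ground truth [10.0, 18.0] s:
# intersection = 6 s, union = 10 s
print(temporal_iou((12.0, 20.0), (10.0, 18.0)))  # 0.6
```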

Challenge Track 4. Text-Based Person Re-Identification (Sim2Real): Participants tackle text-based person retrieval, where natural language queries describe both appearance and behaviors (including anomalous actions). Models are trained on synthetic data and evaluated on real-world test sets. This track builds on recent benchmarks and emphasizes cross-modal reasoning between vision and language. Accuracy is measured using mean Average Precision (mAP).
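mAP is the mean over all text queries of each query's average precision (AP) on the ranked gallery. A minimal illustration of single-query AP (not the track's official evaluation script) looks like:

```python
def average_precision(ranked_relevance):
    """AP for one query.

    ranked_relevance: list of 0/1 flags for the retrieved gallery,
    in ranked order (1 = correct identity match).
    """
    total = sum(ranked_relevance)
    if total == 0:
        return 0.0
    hits, ap = 0, 0.0
    for rank, rel in enumerate(ranked_relevance, start=1):
        if rel:
            hits += 1
            ap += hits / rank  # precision at each correct hit
    return ap / total

# Correct matches at ranks 1 and 3: AP = (1/1 + 2/3) / 2
print(average_precision([1, 0, 1, 0]))  # ~0.833
```

mAP is then simply the mean of these per-query AP values.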

Challenge Track 5. Generative Traffic Video Forecasting: This track introduces generative modeling for traffic scene forecasting. Participants generate future video frames conditioned on historical observations and textual descriptions. The goal is to produce temporally consistent and safety-aware predictions. Evaluation metrics include PSNR, SSIM, LPIPS, FVD, and vision-language model (VLM) scores for safety-critical events.
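Of these metrics, PSNR measures per-frame pixel fidelity against the ground-truth future frame. A minimal pure-Python sketch over flattened 8-bit pixel values (the challenge will use its own implementation) is:

```python
import math

def psnr(pred, target, max_val=255.0):
    """Peak signal-to-noise ratio between two equal-length flat
    pixel lists with values in [0, max_val]. Higher is better."""
    mse = sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)
    if mse == 0:
        return float("inf")  # identical frames
    return 10.0 * math.log10(max_val ** 2 / mse)

# Frames differing by a constant offset of 16 gray levels
print(psnr([16, 16, 16, 16], [0, 0, 0, 0]))  # ~24.05 dB
```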

Challenge Track 6. Cross-City Object Detection (Milestone Systems): This track focuses on fine-grained object detection in real-world traffic scenes under cross-city domain shift. Participants train models on a large-scale hidden dataset from one city and are evaluated on a different city with distinct visual characteristics, viewpoints, and environmental conditions, emphasizing robustness to geographic generalization. The benchmark includes more than 40k annotated images and over 100k vehicle instances across a long-tailed set of classes, with bounding-box annotations collected from diverse urban and roadway video streams. To support privacy-conscious benchmarking, full training and inference are conducted through the Milestone Hafnia Training as a Service platform, where teams submit containerized pipelines and are evaluated primarily using mean Average Precision (mAP).

Participants are invited to compete in one or more of the six challenge tracks. To join, please navigate to the page of your target track(s) under the CHALLENGE tab.

Important Dates

Below is the tentative timeline for the 2026 AI City Challenge.

Workshop Committee

Zheng Tang

NVIDIA

Shuo Wang

NVIDIA

David Anastasiu

Santa Clara University

Ming-Ching Chang

University at Albany – SUNY

Anuj Sharma

Iowa State University

Quan Kong

Woven by Toyota

Munkhjargal Gochoo

United Arab Emirates University

Jun-Wei Hsieh

National Yang Ming Chiao Tung University

Tomasz Kornuta

NVIDIA

Zhedong Zheng

University of Macau

Renran Tian

North Carolina State University

Judah Goldfeder

Columbia University

Fulgencio Navarro

Milestone Systems

Rama Chellappa

Johns Hopkins University

Challenge Committee

Yuxing Wang

NVIDIA

Yizhou Wang

NVIDIA

Sameer Satish Pusegaonkar

NVIDIA

Anqi (Alice) Li

NVIDIA

Nalin Dadhich

NVIDIA

Ridham Kachhadiya

Santa Clara University

Dhanishtha Patil

Santa Clara University

Han (Paris) Zhang

NVIDIA

Yilin Zhao

NVIDIA

Zaid Pervaiz Bhat

NVIDIA

Shuyu Yang

Xi’an Jiaotong University

Ashutosh Kumar

Woven by Toyota

Rong Wang

Woven by Toyota

Rafael Martin Nieto

Milestone Systems

Peter Christiansen

Milestone Systems

Sujit Biswas

NVIDIA

Xunlei Wu

NVIDIA

Vidya Murali

NVIDIA

CITATIONS

Please cite the papers from previous AI City Challenges if you work with our datasets or refer to previous challenge results. You can find the list of papers here.