2025 WORKSHOP PROGRAM

This year’s AI City Challenge workshop will take place on Monday, October 20th as a full-day workshop of ICCV 2025.

Location: Hawai’i Convention Center, Honolulu, Hawai’i
Virtual attendance of our workshop will be available. We will share the link soon.

Please find the tentative workshop schedule below (Hawaii Time, GMT-10):

07:30 AM – 08:00 AM

Breakfast

08:00 AM – 08:30 AM

Opening – Workshop Overview

08:30 AM – 09:15 AM

Keynote

Speaker: Prof. Laura Leal-Taixé

09:15 AM – 10:00 AM

Keynote

Speaker: Prof. Frank Wang

10:00 AM – 10:15 AM

Morning Coffee Break

10:15 AM – 11:05 AM

Paper Presentations and Q&A – Track 1

(1) Paper_ID 21: Multi-Camera 3D Object Tracking via 3D Point Clouds and Re-Identification

(2) Paper_ID 12: DepthTrack: Cluster Meets BEV for Multi-Camera Multi-Target 3D Tracking

(3) Paper_ID 26: Online 3D Multi-Camera Perception through Robust 2D Tracking and Depth-based Late Aggregation

(4) Paper_ID 23: VGCRTrack: Multi-Camera 3D Tracking with View-Aware Geometric Center Refinement

(5) Paper_ID 6: MCBLT: Multi-Camera Multi-Object 3D Tracking in Long Videos

11:05 AM – 12:05 PM

Paper Presentations and Q&A – Track 2

(1) Paper_ID 4: TrafficInternVL: Understanding Traffic Scenarios with Vision–Language Models

(2) Paper_ID 34: Multi-Agent Cooperation for Traffic Safety Description and Analysis

(3) Paper_ID 14: TrafficVILA: Scaling Vision-Language Models to High-Resolution Video Understanding for Traffic Safety Analysis

(4) Paper_ID 13: TrafficInternVL: Spatially-Guided Fine-Tuning with Caption Refinement for Fine-Grained Traffic Safety Captioning and Visual Question Answering 

(5) Paper_ID 27: TrafficVILA: A Multimodal Framework for Traffic Safety Description and Analysis 

(6) Paper_ID 22: Domain-Aware Enhancements to Vision-Language Models for Urban Traffic Safety Question Answering 

(7) Paper_ID 42: STER-VLM: Spatio-Temporal With Enhanced Reference Vision-Language Models

(8) Paper_ID 16: Task-Specific Dual-Model Framework for Comprehensive Traffic Safety Video Description and Analysis  

12:05 PM – 01:30 PM

Lunch

01:30 PM – 02:15 PM

Paper Presentations and Q&A – Track 3

(1) Paper_ID 3: Warehouse Spatial Question Answering with LLM Agent: 1st Place Solution of the 9th AI City Challenge Track 3

(2) Paper_ID 20: Multimodal and Multi-task Fusion for Spatial Reasoning

(3) Paper_ID 9: SmolRGPT: Efficient Spatial Reasoning for Warehouse Environments with 600M Parameters

(4) Paper_ID 11: Prompt-Guided Spatial Understanding with RGB-D Transformers for Fine-Grained Object Relation Reasoning

(5) Paper_ID 41: TinyGiantVLM: A Lightweight Vision-Language Architecture for Spatial Reasoning under Resource Constraints

02:15 PM – 03:20 PM

Paper Presentations and Q&A – Track 4

(1) Paper_ID 39: A Lightweight and Data-Centric Framework for Real-Time Object Detection in Fisheye Camera

(2) Paper_ID 5: Enhanced Fisheye Object Detection via YOLO Ensemble Learning and Weighted Box Fusion

(3) Paper_ID 35: Boosting Fisheye Detection with Augmentations and Ensembles

(4) Paper_ID 17: Data Augmentation Is All You Need For Robust Fisheye Object Detection

(5) Paper_ID 7: A Unified Detection Pipeline for Robust Object Detection in Fisheye-Based Traffic Surveillance

(6) Paper_ID 8: Augmentation, Distillation and Optimization: A Practical Pipeline for Fisheye Object Detection on Edge Devices

(7) Paper_ID 25: A Real-time Vehicle Detection Pipeline with Data-centric Enhancements and Multi-stage DETR Distillation

(8) Paper_ID 32: Efficient and Distortion-Aware Fisheye Object Detection for Edge Devices

(9) Paper_ID 33: Real-Time Object Detection on Edge Devices: A Fisheye Specific DFINE

03:20 PM – 03:35 PM

Afternoon Coffee Break

03:35 PM – 03:55 PM

Paper Presentations and Q&A – Independent

(1) Paper_ID 15: EKI-GAN: Context-Aware Vehicle Trajectory Forecasting with Vehicle Factors and Environmental Information at Signalized Intersections

(2) Paper_ID 18: Hierarchical Multi-Modal Fusion for Roadside VRU Detection: Method Complementarity Under Sparse Label Constraints

03:55 PM – 04:15 PM

Award Ceremony

04:30 PM – 05:30 PM

Poster Session (29 posters)

05:30 PM

Adjourn