2025 Workshop – AI CITY CHALLENGE

2025 WORKSHOP PROGRAM

This year’s AI City Challenge workshop will take place on Monday, October 20th as a full-day workshop of ICCV 2025.

Location: Room 313A, Hawai’i Convention Center, Honolulu, Hawai’i

For virtual attendance: Zoom link https://us06web.zoom.us/j/87290496334 (password: farmcans)

Please find the tentative workshop schedule below (Hawaii Time, GMT-10):

07:30 AM – 08:00 AM

Breakfast

08:00 AM – 08:30 AM

Opening – Workshop Overview

08:30 AM – 09:15 AM

Keynote

Speaker: Prof. Laura Leal-Taixé

09:15 AM – 10:00 AM

Keynote

Speaker: Prof. Frank Wang

10:00 AM – 10:15 AM

Morning Coffee Break

10:15 AM – 11:05 AM

Paper Presentations and Q&A – Track1

(1) Paper_ID 21: Multi-Camera 3D Object Tracking via 3D Point Clouds and Re-Identification (media)

(2) Paper_ID 12: DepthTrack: Cluster Meets BEV for Multi-Camera Multi-Target 3D Tracking (media)

(3) Paper_ID 26: Online 3D Multi-Camera Perception through Robust 2D Tracking and Depth-based Late Aggregation (media)

(4) Paper_ID 23: VGCRTrack: Multi-Camera 3D Tracking with View-Aware Geometric Center Refinement (media)

(5) Paper_ID 6: MCBLT: Multi-Camera Multi-Object 3D Tracking in Long Videos (media)

11:05 AM – 12:05 PM

Paper Presentations and Q&A – Track2

(1) Paper_ID 4: TrafficInternVL: Understanding Traffic Scenarios with Vision–Language Models (media)

(2) Paper_ID 34: Multi-Agent Cooperation for Traffic Safety Description and Analysis (media)

(3) Paper_ID 14: TrafficVILA: Scaling Vision-Language Models to High-Resolution Video Understanding for Traffic Safety Analysis (media)

(4) Paper_ID 13: TrafficInternVL: Spatially-Guided Fine-Tuning with Caption Refinement for Fine-Grained Traffic Safety Captioning and Visual Question Answering (media)

(5) Paper_ID 27: TrafficVILA: A Multimodal Framework for Traffic Safety Description and Analysis (media)

(6) Paper_ID 22: Domain-Aware Enhancements to Vision-Language Models for Urban Traffic Safety Question Answering (media)

(7) Paper_ID 42: STER-VLM: Spatio-Temporal With Enhanced Reference Vision-Language Models (media)

(8) Paper_ID 16: Task-Specific Dual-Model Framework for Comprehensive Traffic Safety Video Description and Analysis (media)

12:05 PM – 01:30 PM

Lunch

01:30 PM – 02:15 PM

Paper Presentations and Q&A – Track3

(1) Paper_ID 3: Warehouse Spatial Question Answering with LLM Agent: 1st Place Solution of the 9th AI City Challenge Track 3 (media)

(2) Paper_ID 20: Multimodal and Multi-task Fusion for Spatial Reasoning (media)

(3) Paper_ID 9: SmolRGPT: Efficient Spatial Reasoning for Warehouse Environments with 600M Parameters (media)

(4) Paper_ID 11: Prompt-Guided Spatial Understanding with RGB-D Transformers for Fine-Grained Object Relation Reasoning (media)

(5) Paper_ID 41: TinyGiantVLM: A Lightweight Vision-Language Architecture for Spatial Reasoning under Resource Constraints (media)

02:15 PM – 03:20 PM

Paper Presentations and Q&A – Track4

(1) Paper_ID 39: A Lightweight and Data-Centric Framework for Real-Time Object Detection in Fisheye Camera (media)

(2) Paper_ID 5: Enhanced Fisheye Object Detection via YOLO Ensemble Learning and Weighted Box Fusion (media)

(3) Paper_ID 35: Boosting Fisheye Detection with Augmentations and Ensembles (media)

(4) Paper_ID 17: Data Augmentation Is All You Need For Robust Fisheye Object Detection (media)

(5) Paper_ID 7: A Unified Detection Pipeline for Robust Object Detection in Fisheye-Based Traffic Surveillance (media)

(6) Paper_ID 8: Augmentation, Distillation and Optimization: A Practical Pipeline for Fisheye Object Detection on Edge Devices (media)

(7) Paper_ID 25: A Real-time Vehicle Detection Pipeline with Data-centric Enhancements and Multi-stage DETR Distillation (media)

(8) Paper_ID 32: Efficient and Distortion-Aware Fisheye Object Detection for Edge Devices (media)

(9) Paper_ID 33: Real-Time Object Detection on Edge Devices: A Fisheye Specific DFINE (media)

03:20 PM – 03:35 PM

Afternoon Coffee Break

03:35 PM – 03:55 PM

Paper Presentations and Q&A – Independent

(1) Paper_ID 15: EKI-GAN: Context-Aware Vehicle Trajectory Forecasting with Vehicle Factors and Environmental Information at Signalized Intersections (media)

(2) Paper_ID 18: Hierarchical Multi-Modal Fusion for Roadside VRU Detection: Method Complementarity Under Sparse Label Constraints (media)

03:55 PM – 04:15 PM

Award Ceremony

04:30 PM – 05:30 PM

Poster Session (29 posters)

Assigned Boards: 1-29

05:30 PM

Adjourn