2025 WORKSHOP PROGRAM
This year’s AI City Challenge workshop will take place on Monday, October 20th as a full-day workshop of ICCV 2025.
Location: Room 313A, Hawai’i Convention Center, Honolulu, Hawai’i
For virtual attendance, join via Zoom: https://us06web.zoom.us/j/87290496334 (pwd: farmcans)
Please find the tentative workshop schedule below (Hawaii Time, GMT-10):
- Monday, October 20, 2025
- Workshop
- www.aicitychallenge.org
07:30 AM – 08:00 AM
Breakfast
08:00 AM – 08:30 AM
Opening – Workshop Overview
08:30 AM – 09:15 AM
Keynote
Speaker: Prof. Laura Leal-Taixé
09:15 AM – 10:00 AM
Keynote
Speaker: Prof. Frank Wang
10:00 AM – 10:15 AM
Morning Coffee Break
10:15 AM – 11:05 AM
Paper Presentations and Q&A – Track 1
(1) Paper_ID 21: Multi-Camera 3D Object Tracking via 3D Point Clouds and Re-Identification (media)
(2) Paper_ID 12: DepthTrack: Cluster Meets BEV for Multi-Camera Multi-Target 3D Tracking (media)
(3) Paper_ID 26: Online 3D Multi-Camera Perception through Robust 2D Tracking and Depth-based Late Aggregation (media)
(4) Paper_ID 23: VGCRTrack: Multi-Camera 3D Tracking with View-Aware Geometric Center Refinement (media)
(5) Paper_ID 6: MCBLT: Multi-Camera Multi-Object 3D Tracking in Long Videos (media)
11:05 AM – 12:05 PM
Paper Presentations and Q&A – Track 2
(1) Paper_ID 4: TrafficInternVL: Understanding Traffic Scenarios with Vision–Language Models (media)
(2) Paper_ID 34: Multi-Agent Cooperation for Traffic Safety Description and Analysis (media)
(3) Paper_ID 14: TrafficVILA: Scaling Vision-Language Models to High-Resolution Video Understanding for Traffic Safety Analysis (media)
(4) Paper_ID 13: TrafficInternVL: Spatially-Guided Fine-Tuning with Caption Refinement for Fine-Grained Traffic Safety Captioning and Visual Question Answering (media)
(5) Paper_ID 27: TrafficVILA: A Multimodal Framework for Traffic Safety Description and Analysis (media)
(6) Paper_ID 22: Domain-Aware Enhancements to Vision-Language Models for Urban Traffic Safety Question Answering (media)
(7) Paper_ID 42: STER-VLM: Spatio-Temporal With Enhanced Reference Vision-Language Models (media)
(8) Paper_ID 16: Task-Specific Dual-Model Framework for Comprehensive Traffic Safety Video Description and Analysis (media)
12:05 PM – 01:30 PM
Lunch
01:30 PM – 02:15 PM
Paper Presentations and Q&A – Track 3
(1) Paper_ID 3: Warehouse Spatial Question Answering with LLM Agent: 1st Place Solution of the 9th AI City Challenge Track 3 (media)
(2) Paper_ID 20: Multimodal and Multi-task Fusion for Spatial Reasoning (media)
(3) Paper_ID 9: SmolRGPT: Efficient Spatial Reasoning for Warehouse Environments with 600M Parameters (media)
(4) Paper_ID 11: Prompt-Guided Spatial Understanding with RGB-D Transformers for Fine-Grained Object Relation Reasoning (media)
(5) Paper_ID 41: TinyGiantVLM: A Lightweight Vision-Language Architecture for Spatial Reasoning under Resource Constraints (media)
02:15 PM – 03:20 PM
Paper Presentations and Q&A – Track 4
(1) Paper_ID 39: A Lightweight and Data-Centric Framework for Real-Time Object Detection in Fisheye Camera (media)
(2) Paper_ID 5: Enhanced Fisheye Object Detection via YOLO Ensemble Learning and Weighted Box Fusion (media)
(3) Paper_ID 35: Boosting Fisheye Detection with Augmentations and Ensembles (media)
(4) Paper_ID 17: Data Augmentation Is All You Need For Robust Fisheye Object Detection (media)
(5) Paper_ID 7: A Unified Detection Pipeline for Robust Object Detection in Fisheye-Based Traffic Surveillance (media)
(6) Paper_ID 8: Augmentation, Distillation and Optimization: A Practical Pipeline for Fisheye Object Detection on Edge Devices (media)
(7) Paper_ID 25: A Real-time Vehicle Detection Pipeline with Data-centric Enhancements and Multi-stage DETR Distillation (media)
(8) Paper_ID 32: Efficient and Distortion-Aware Fisheye Object Detection for Edge Devices (media)
(9) Paper_ID 33: Real-Time Object Detection on Edge Devices: A Fisheye Specific DFINE (media)
03:20 PM – 03:35 PM
Afternoon Coffee Break
03:35 PM – 03:55 PM
Paper Presentations and Q&A – Independent
(1) Paper_ID 15: EKI-GAN: Context-Aware Vehicle Trajectory Forecasting with Vehicle Factors and Environmental Information at Signalized Intersections (media)
(2) Paper_ID 18: Hierarchical Multi-Modal Fusion for Roadside VRU Detection: Method Complementarity Under Sparse Label Constraints (media)
03:55 PM – 04:15 PM
Award Ceremony
04:30 PM – 05:30 PM
Poster Session (29 posters)
Assigned Boards: 1-29
05:30 PM
Adjourn
