2024 Challenge Tracks (Beta)

To participate, please fill out this online AI City Challenge Datasets Request Form.

Participants can compete in one or more of the following five challenges:

Challenge Track 1: Multi-Camera People Tracking

Participating teams are tasked with tracking people across multiple cameras using an expanded synthetic dataset. The scale of the dataset has increased significantly: the number of cameras has grown from 129 to approximately 1,300, and the number of people tracked has risen from 156 to around 3,400. Additionally, we are providing 3D annotations and camera matrices to aid in this task. The evaluation metric for this challenge has been updated to Higher Order Tracking Accuracy (HOTA) based on 3D distance, enabling a more comprehensive measurement of tracking accuracy. A new aspect of this challenge track is the encouragement of online tracking, where methods use only information from past frames to predict current-frame results. Submissions employing online tracking will receive a 10% bonus to their HOTA score. This bonus will help determine the winner and runner-up in cases where accuracy levels are comparable.
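The effect of the 10% online-tracking bonus on ranking can be sketched as follows. This is only an illustration of the arithmetic; `adjusted_hota` is a hypothetical helper, not an official API, and the organizers define the exact tie-breaking procedure.

```python
def adjusted_hota(hota_score: float, is_online: bool, bonus: float = 0.10) -> float:
    """Illustrative ranking score: online trackers receive a 10% bonus.

    Hypothetical helper for illustration only; the official evaluation
    protocol is defined by the challenge organizers.
    """
    return hota_score * (1.0 + bonus) if is_online else hota_score

# An offline tracker at 0.62 HOTA vs. an online tracker at 0.58:
offline = adjusted_hota(0.62, is_online=False)  # stays 0.62
online = adjusted_hota(0.58, is_online=True)    # 0.58 * 1.10 = 0.638
```

Under this reading, the online tracker's adjusted score (0.638) edges out the offline tracker's (0.62) even though its raw HOTA is lower.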

Challenge Track 2: Traffic Safety Description and Analysis

This task revolves around long, fine-grained video captioning of traffic safety scenarios, especially those involving pedestrian accidents. Leveraging multiple cameras and viewpoints, participants will be challenged to describe the continuous moments before the incidents, as well as the normal scene, captioning all pertinent details regarding the surrounding context, attention, location, and behavior of the pedestrian and vehicle. This task provides a new dataset, WTS, featuring staged accidents with stunt drivers and pedestrians in a controlled environment, and offers a unique opportunity for detailed analysis of traffic safety scenarios. The analysis results could be valuable across industry and society; for example, they could streamline the inspection process in insurance cases and contribute to the prevention of pedestrian accidents. More details about the dataset are available on the dataset homepage (https://woven-visionai.github.io/wts-dataset-homepage/). The top teams of this task are planned to be invited and offered the opportunity to deploy and test their solutions in Woven City after Summer 2025.

Challenge Track 3: Naturalistic Driving Action Recognition

Distracted driving is highly dangerous and is reported to kill about 8 people every day in the United States. Today, naturalistic driving studies and computer vision techniques provide a much-needed solution for identifying and eliminating distracted driving behavior on the road. However, a lack of labels and poor data quality and resolution have created obstacles to deriving insights from real-world driver data. Naturalistic driving studies serve as an essential tool for studying driver behavior in real time; they capture every action of the driver in the traffic environment, such as drowsiness or distracted behavior. In this challenge track, participants will be presented with synthetic naturalistic data of the driver collected from multiple camera locations inside the vehicle. The objective is to classify the distracted behavior activities performed by the driver in a given time frame. The training dataset will consist of a diverse group of drivers, with and without appearance blocks, performing 16 different tasks (such as making a phone call, eating, and reaching back) that could potentially distract them from driving. The performance of this classification task will be evaluated in terms of the speed and accuracy of the model. Participating teams will have the option to use any one camera view for the classification of driver tasks.

Challenge Track 4: Road Object Detection in Fish-Eye Cameras

Fisheye lenses have gained popularity owing to their natural, wide, and omnidirectional coverage, which traditional cameras with narrow fields of view (FoV) cannot achieve. In traffic monitoring systems, fisheye cameras are advantageous because they reduce the number of cameras required to cover broad views of streets and intersections. Despite these benefits, fisheye cameras produce distorted views that necessitate either a non-trivial design for image undistortion and unwarping or a dedicated design for handling distortions during processing. It is worth noting that, to the best of our knowledge, no open dataset has previously been available for fisheye road object detection in traffic surveillance applications. The datasets (FishEye8K and FishEye1Keval) comprise different traffic patterns and conditions, including urban highways, road intersections, various illumination levels, and diverse viewing angles, and cover five road object classes at various scales.
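The distortion that motivates special handling can be illustrated by comparing projection models. A minimal sketch, assuming the common equidistant fisheye model (r = f·θ) against the rectilinear pinhole model (r = f·tan θ); actual camera calibrations vary and this is not the model used by any specific dataset here.

```python
import math

def pinhole_radius(theta: float, f: float = 1.0) -> float:
    # Rectilinear (pinhole) projection: r = f * tan(theta).
    # Diverges as the off-axis angle theta approaches 90 degrees,
    # which is why a pinhole camera cannot cover a ~180-degree FoV.
    return f * math.tan(theta)

def equidistant_fisheye_radius(theta: float, f: float = 1.0) -> float:
    # Equidistant fisheye projection: r = f * theta.
    # Image radius stays finite even for very wide off-axis angles.
    return f * theta

# At 30 degrees off-axis the two models already differ noticeably:
theta = math.radians(30)
print(pinhole_radius(theta))              # ~0.577
print(equidistant_fisheye_radius(theta))  # ~0.524
```

The growing gap between the two curves at wide angles is exactly the radial distortion that detectors must either unwarp away or be designed to tolerate.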

Challenge Track 5: Detecting Violation of Helmet Rule for Motorcyclists

Motorcycles are one of the most popular modes of transportation, particularly in developing countries such as India. Because they offer less protection than cars and other standard vehicles, motorcycle riders are exposed to a greater risk of crashes. Therefore, wearing a helmet is mandatory for motorcycle riders under traffic rules, and automatic detection of motorcyclists without helmets is a critical task for enforcing strict regulatory traffic safety measures.