Thrust 2: Situational Awareness

Advancing real-time visual analytics through multi-object detection and tracking, relationship detection, multi-scale activity detection, and object re-identification while protecting privacy and trust.

Project 1: Data and Systems Support for Scene and Activity Understanding

Computer vision serves as the foundation of situational awareness in streetscape applications. Modern computer vision approaches rely on machine learning, which requires training on large datasets as well as computing workflows that optimize training and inference. For the former, CS3 is building the training datasets needed for high-fidelity situational awareness, drawing on its own testbeds and third-party sources, and is developing mechanisms to automate the labeling and curation of these datasets. For the latter, CS3 is developing new workflows that optimize the execution performance of training and inference workloads, while introducing protections that preserve situational awareness without revealing personally identifiable information.
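Automated labeling based on joint text and image learning is commonly done in a CLIP-style, zero-shot fashion: an image embedding is compared with the embeddings of short text prompts, and the most similar prompt becomes the candidate label. The sketch below assumes the embeddings already exist (the encoder itself is not shown); the label names, toy 3-D vectors, and the `min_score` threshold are hypothetical placeholders, not part of CS3's pipeline. Low-scoring images are returned as `None` so they can be routed to human review.

```python
import math
from typing import Optional


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na > 0 and nb > 0 else 0.0


def auto_label(image_emb: list[float],
               label_embs: dict[str, list[float]],
               min_score: float = 0.5) -> Optional[str]:
    """Pick the label whose text embedding best matches the image embedding.

    Returns None when the best similarity is below min_score, signalling that
    the image should go to a human annotator instead of being auto-labeled.
    """
    best_label, best_score = None, -1.0
    for label, text_emb in label_embs.items():
        score = cosine(image_emb, text_emb)
        if score > best_score:
            best_label, best_score = label, score
    return best_label if best_score >= min_score else None


# Hypothetical 3-D embeddings standing in for a real joint text-image encoder.
labels = {
    "pedestrian": [1.0, 0.1, 0.0],
    "vehicle": [0.0, 1.0, 0.2],
    "bicycle": [0.1, 0.2, 1.0],
}
print(auto_label([0.9, 0.2, 0.1], labels))                  # pedestrian
print(auto_label([0.5, 0.5, 0.5], labels, min_score=0.99))  # None
```

Routing low-confidence cases to annotators, rather than forcing a label, is one way to keep automated curation from silently injecting label noise into a training set.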

Dataset Collection from Testbeds and Internet

Automated Image Labeling Based on Joint Text and Image Learning

Privacy- or Detection-Preserving Encoding Pipelines
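As a minimal illustration of a detection-preserving encoding, the sketch below keeps the bounding boxes a detector produced but pixelates the pixels inside them. It assumes a grayscale frame stored as a list of rows and boxes given as `(x0, y0, x1, y1)` with exclusive upper bounds; the function name and block size are hypothetical, and pixelation on its own is a weak protection rather than a guarantee against re-identification.

```python
Frame = list[list[int]]
Box = tuple[int, int, int, int]  # (x0, y0, x1, y1); x1 and y1 are exclusive


def pixelate_boxes(frame: Frame, boxes: list[Box], block: int = 4) -> Frame:
    """Return a copy of a grayscale frame in which every box is pixelated.

    Each block x block tile inside a box is replaced by the integer mean of its
    pixels, removing fine detail while the box itself (position and size)
    remains available to downstream detectors and trackers.
    """
    out = [row[:] for row in frame]  # copy: the raw frame is never modified
    h, w = len(frame), len(frame[0])
    for x0, y0, x1, y1 in boxes:
        # Clip the box to the frame boundaries.
        x0, y0, x1, y1 = max(0, x0), max(0, y0), min(w, x1), min(h, y1)
        for by in range(y0, y1, block):
            for bx in range(x0, x1, block):
                ys = range(by, min(by + block, y1))
                xs = range(bx, min(bx + block, x1))
                cells = [frame[y][x] for y in ys for x in xs]
                mean = sum(cells) // len(cells)
                for y in ys:
                    for x in xs:
                        out[y][x] = mean
    return out


frame = [[r * 4 + c for c in range(4)] for r in range(4)]  # toy 4x4 image
redacted = pixelate_boxes(frame, [(0, 0, 2, 2)], block=2)
print(redacted)
```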

Project 2: Scene and Activity Understanding

Detecting and understanding streetscape objects (e.g., pedestrians, vehicles) and activities (e.g., crossing the street) is the core of situational awareness. Beyond the inherent complexity and dynamism of modern streetscape objects and activities, these tasks are complicated by a broad range of factors, from low-resolution camera feeds, to occlusions (e.g., fog, rain, traffic), to scenes that cannot be captured by a single camera. To address these challenges, CS3 is advancing the fundamental science and engineering of scene and activity understanding for complex streetscape scenarios, under variable resolution and occlusion, across multi-camera networks.

Hierarchical Activity Recognition Under Low Resolution

4D Dynamic Scene Completion

Visual Question Answering

Multi-Object Tracking and Re-Identification
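Tracking-by-detection systems commonly link each frame's detections to existing tracks by bounding-box overlap (intersection over union, IoU) before appearance features are used for re-identification. The greedy matcher below is a hypothetical sketch of that association step, not CS3's tracker; the 0.3 IoU threshold is an assumption, and practical trackers usually add optimal (Hungarian) assignment, motion prediction, and appearance embeddings.

```python
Box = tuple[float, float, float, float]  # (x0, y0, x1, y1)


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def associate(tracks: dict[int, Box], detections: list[Box],
              thresh: float = 0.3) -> tuple[dict[int, int], list[int]]:
    """Greedily match tracks to detections in order of decreasing IoU.

    Returns ({track_id: detection_index}, [unmatched detection indices]).
    Unmatched detections would typically start new tracks.
    """
    pairs = sorted(
        ((iou(tb, db), tid, di)
         for tid, tb in tracks.items()
         for di, db in enumerate(detections)),
        reverse=True,
    )
    matches: dict[int, int] = {}
    used_dets: set[int] = set()
    for score, tid, di in pairs:
        if score < thresh:
            break  # all remaining pairs overlap too little
        if tid in matches or di in used_dets:
            continue  # each track and each detection is used at most once
        matches[tid] = di
        used_dets.add(di)
    unmatched = [i for i in range(len(detections)) if i not in used_dets]
    return matches, unmatched


tracks = {1: (0, 0, 10, 10), 2: (20, 20, 30, 30)}
detections = [(21, 21, 31, 31), (1, 1, 11, 11), (50, 50, 60, 60)]
print(associate(tracks, detections))
```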

Project 3: Trajectory Analysis and Prediction

Future streetscape applications depend not only on the current state of the streetscape but also on its anticipated future state. For example, future smart intersections must anticipate a pedestrian’s intent to cross, and future traffic safety applications must anticipate a vehicle’s intent to change lanes, even if the driver fails to signal. CS3 is developing new mechanisms and systems to forecast object trajectories within modern streetscapes over near and far time horizons (e.g., 1 s and 10 s) to enable these applications.
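A standard reference point for trajectory forecasters is the constant-velocity baseline, which extrapolates the last observed velocity to each horizon. The sketch below is such a baseline (not CS3's method), assuming positions in meters sampled every `dt` seconds; the function name and sample track are hypothetical.

```python
Point = tuple[float, float]


def constant_velocity_forecast(track: list[Point], dt: float,
                               horizons: list[float]) -> dict[float, Point]:
    """Extrapolate the most recent velocity of a track to each horizon.

    track: observed (x, y) positions in meters, one every dt seconds.
    horizons: look-ahead times in seconds, e.g. [1.0, 10.0].
    """
    (x_prev, y_prev), (x_last, y_last) = track[-2], track[-1]
    vx, vy = (x_last - x_prev) / dt, (y_last - y_prev) / dt
    return {h: (x_last + vx * h, y_last + vy * h) for h in horizons}


# Hypothetical pedestrian walking about 1.2 m/s along x, sampled at 10 Hz.
track = [(0.0, 0.0), (0.12, 0.0), (0.24, 0.0)]
forecast = constant_velocity_forecast(track, dt=0.1, horizons=[1.0, 10.0])
print(forecast)
```

Because any velocity error is multiplied by the horizon, a 0.1 m/s error shifts the 1 s prediction by 0.1 m but the 10 s prediction by 1 m, which is one reason long horizons call for models of intent and interaction rather than pure extrapolation.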

AI-Based, Physics-Informed Crowd Modeling and Prediction

Future streetscape applications require an accurate understanding of how road users move and interact. Given large amounts of data extracted from cameras and other sensors, AI-based methods can train machine learning models that predict road users’ future trajectories. However, because pedestrian and vehicle motion is stochastic, physics-based domain knowledge can help guide these predictions, especially when movements are constrained by interactions with other road users. We propose a physics-informed machine learning scheme that simultaneously models the movement of multiple road users at road intersections and on road segments.
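To make the idea of a physics prior concrete, the sketch below implements one explicit-Euler step of a simplified version of the well-known social force model of Helbing and Molnár, in which each pedestrian accelerates toward a desired velocity and is repelled by nearby pedestrians. It is a swapped-in textbook model rather than the scheme proposed here: unit mass is assumed, pedestrian radii are ignored, and the parameter values (`v_desired`, `tau`, `A`, `B`) are illustrative assumptions. In a physics-informed learner, forces like these could act as a prior or regularizer alongside a data-driven predictor.

```python
import math

Vec = tuple[float, float]


def social_force_step(pos: list[Vec], vel: list[Vec], goals: list[Vec],
                      dt: float = 0.1, v_desired: float = 1.3,
                      tau: float = 0.5, A: float = 2.0, B: float = 0.3
                      ) -> tuple[list[Vec], list[Vec]]:
    """One explicit-Euler step of a simplified social force model (unit mass).

    Each pedestrian relaxes toward speed v_desired in the direction of its goal
    over time constant tau, and is pushed away from every other pedestrian by a
    force of magnitude A * exp(-r / B), where r is their distance.
    """
    new_pos, new_vel = [], []
    for i, ((px, py), (vx, vy), (gx, gy)) in enumerate(zip(pos, vel, goals)):
        # Driving term toward the goal.
        dx, dy = gx - px, gy - py
        d = math.hypot(dx, dy) or 1.0  # avoid division by zero at the goal
        fx = (v_desired * dx / d - vx) / tau
        fy = (v_desired * dy / d - vy) / tau
        # Pairwise repulsion from the other pedestrians.
        for j, (qx, qy) in enumerate(pos):
            if j == i:
                continue
            rx, ry = px - qx, py - qy
            r = math.hypot(rx, ry) or 1e-6
            mag = A * math.exp(-r / B)
            fx += mag * rx / r
            fy += mag * ry / r
        nvx, nvy = vx + fx * dt, vy + fy * dt
        new_vel.append((nvx, nvy))
        new_pos.append((px + nvx * dt, py + nvy * dt))
    return new_pos, new_vel


# Two hypothetical pedestrians standing 0.5 m apart at their own goals:
# only repulsion acts, so they are pushed in opposite directions.
p, v = social_force_step([(0.0, 0.0), (0.5, 0.0)], [(0.0, 0.0), (0.0, 0.0)],
                         goals=[(0.0, 0.0), (0.5, 0.0)])
print(v)
```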

Project 4: Multi-Modal Integration

Computer vision, which primarily relies upon video data, plays an important role in realizing situational awareness across many streetscape applications. In some cases, however, non-video data sources can complement or replace the computer vision pipeline. This includes LiDAR, mmWave radar, environmental sensors, RTK/GPS, and data (intentionally) provided by pedestrians (e.g., inertial data from smartphones). CS3 is exploring the ways in which these multi-modal sources can be integrated within its situational awareness framework to enhance scene and activity understanding.

Exploiting Non-Vision-based Modalities

This cross-cutting research involves fusing different data streams for tracking, classification, and opt-in user identification. Data sources include video, mmWave radar, GPS, LiDAR, and IMU sensing, as well as images from multiple or moving cameras. Physical models can be incorporated into the fusion schemes to estimate the positions and velocities of people and objects more accurately.
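A textbook way to combine noisy position measurements from several sensors with a simple physical motion model is the Kalman filter. The sketch below is a 1-D constant-velocity filter that sequentially fuses a noisy GPS fix and a more precise camera-based position at each time step; it is a generic technique rather than CS3's fusion scheme, and the class name, readings, and noise variances are hypothetical values chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class ConstantVelocityKF:
    """1-D Kalman filter with state (position x, velocity v)."""
    x: float = 0.0
    v: float = 0.0
    p00: float = 10.0  # covariance entries; large values mean an uncertain start
    p01: float = 0.0
    p10: float = 0.0
    p11: float = 10.0
    q: float = 0.1     # process-noise (acceleration) intensity, assumed

    def predict(self, dt: float) -> None:
        """Propagate with x += v * dt; P = F P F^T + Q, F = [[1, dt], [0, 1]]."""
        self.x += self.v * dt
        p00 = self.p00 + dt * (self.p01 + self.p10) + dt * dt * self.p11
        p01 = self.p01 + dt * self.p11
        p10 = self.p10 + dt * self.p11
        # Q from a discrete white-noise acceleration model.
        self.p00 = p00 + self.q * dt ** 4 / 4
        self.p01 = p01 + self.q * dt ** 3 / 2
        self.p10 = p10 + self.q * dt ** 3 / 2
        self.p11 = self.p11 + self.q * dt ** 2

    def update(self, z: float, r: float) -> None:
        """Fuse a position measurement z with noise variance r (H = [1, 0])."""
        s = self.p00 + r                       # innovation variance
        k0, k1 = self.p00 / s, self.p10 / s    # Kalman gain
        y = z - self.x                         # innovation
        self.x += k0 * y
        self.v += k1 * y
        p00, p01, p10, p11 = self.p00, self.p01, self.p10, self.p11
        self.p00, self.p01 = p00 - k0 * p00, p01 - k0 * p01
        self.p10, self.p11 = p10 - k1 * p00, p11 - k1 * p01


# Hypothetical readings at 1 s intervals: noisy GPS and a sharper camera track.
kf = ConstantVelocityKF()
for gps, cam in [(1.3, 1.05), (2.1, 1.98), (2.8, 3.02)]:
    kf.predict(dt=1.0)
    kf.update(gps, r=4.0)    # GPS variance (about 2 m std), assumed
    kf.update(cam, r=0.04)   # camera variance (about 0.2 m std), assumed
print(round(kf.x, 2), round(kf.v, 2))
```

Processing the two measurements one after the other is equivalent to a joint update when their noises are independent, which keeps the code scalar; velocity is never measured directly and is inferred entirely through the motion model.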

Inference of Wind Dynamics for Drone Path Planning

The adoption of drone technologies in urban areas has been stymied by legitimate safety concerns arising from complex air currents in the urban canopy. This work develops fundamental approaches to infer wind forces from the discrepancy between the actual and target flight trajectories, together with on-board IMU sensing. Reliable drone operation in urban environments could significantly benefit a variety of applications, including urban infrastructure inspection, deliveries, and public safety monitoring.
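The underlying physics can be sketched with Newton's second law: if the flight controller commands an acceleration to follow the target trajectory but the IMU measures a different one, the residual multiplied by the drone's mass estimates the external (wind) force. The function below averages that residual over a short window to suppress IMU noise. It is a hypothetical, simplified estimator rather than the approach developed in this work; it assumes accelerations are already expressed in a world frame with gravity removed and that the commanded acceleration is actually delivered by the motors.

```python
Vec3 = tuple[float, float, float]


def estimate_wind_force(mass: float, a_cmd: list[Vec3], a_meas: list[Vec3],
                        window: int = 5) -> Vec3:
    """Estimate the external (wind) force from the acceleration residual.

    a_cmd:  accelerations the controller commanded (m/s^2, world frame).
    a_meas: accelerations measured by the IMU (m/s^2, world frame, gravity removed).
    Uses F_wind ~= mass * mean(a_meas - a_cmd) over the last `window` samples.
    """
    recent = list(zip(a_cmd, a_meas))[-window:]
    if not recent:
        raise ValueError("need at least one acceleration sample")
    n = len(recent)
    fx, fy, fz = (
        mass * sum(meas[k] - cmd[k] for cmd, meas in recent) / n
        for k in range(3)
    )
    return (fx, fy, fz)


# Hypothetical 2 kg drone commanded to hover while a gust pushes it along +x;
# the IMU readings include some noise.
cmd = [(0.0, 0.0, 0.0)] * 5
meas = [(0.6, 0.0, 0.0), (0.4, 0.1, 0.0), (0.5, -0.1, 0.0),
        (0.55, 0.0, 0.0), (0.45, 0.0, 0.0)]
print(estimate_wind_force(2.0, cmd, meas))
```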

Cross-Modal Entity Recognition

Numerous streetscape applications rely on identifying, within a streetscape scene, a user who has opted in to a service. This can be done, for example, by correlating IMU data from the user’s device with motion information extracted from camera-based tracking of individuals. This offers a seamless way to identify a specific user within a crowded scene, such as a blind user of a wayfinding application.
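A minimal version of this correlation step: derive a motion-intensity signal from the phone's IMU (for example, the bounce of each step) and a comparable signal from each video track, then select the track whose signal correlates best. The sketch assumes both signals are already time-aligned and resampled to a common rate; the function names, signals, and the 0.7 correlation threshold are hypothetical.

```python
import math
from typing import Optional


def pearson(a: list[float], b: list[float]) -> float:
    """Pearson correlation of two equal-length signals (0.0 if either is constant)."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    sa = math.sqrt(sum((x - ma) ** 2 for x in a))
    sb = math.sqrt(sum((y - mb) ** 2 for y in b))
    return cov / (sa * sb) if sa > 0 and sb > 0 else 0.0


def match_user_to_track(imu_signal: list[float],
                        track_signals: dict[int, list[float]],
                        min_corr: float = 0.7) -> Optional[int]:
    """Return the id of the video track whose motion signal best correlates with
    the phone's IMU signal, or None if no track reaches min_corr."""
    best_id, best_r = None, -1.0
    for track_id, signal in track_signals.items():
        r = pearson(imu_signal, signal)
        if r > best_r:
            best_id, best_r = track_id, r
    return best_id if best_r >= min_corr else None


# Hypothetical, time-aligned motion-intensity signals.
imu = [0.1, 0.9, 0.2, 0.8, 0.1, 0.9]
tracks = {
    7: [0.2, 1.0, 0.3, 0.9, 0.2, 1.0],   # same gait rhythm as the phone
    12: [0.5, 0.5, 0.6, 0.5, 0.5, 0.6],  # a different pedestrian
}
print(match_user_to_track(imu, tracks))
```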

Recent Publications

Dave, Ishan Rajendrakumar, et al. “FinePseudo: Improving Pseudo-Labelling through Temporal-Alignablity for Semi-Supervised Fine-Grained Action Recognition.” Computer Vision – ECCV 2024: 18th European Conference, Milan, Italy, September 29–October 4, 2024, Proceedings, Part VIII, Springer-Verlag, 2024, pp. 389–408, https://doi.org/10.1007/978-3-031-73242-3_22.

Khalili, Boshra, and Andrew W. Smyth. “SOD-YOLOv8—Enhancing YOLOv8 for Small Object Detection in Aerial Imagery and Traffic Scenes.” Sensors, vol. 24, no. 19, Sept. 2024, p. 6209, https://doi.org/10.3390/s24196209.

Chang, Che-Jui, et al. “Learning from Synthetic Human Group Activities.” Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, IEEE, 2024, pp. 21922–32, https://doi.org/10.1109/CVPR52733.2024.02070.
%20data%20generator%20for%20multi-view%20multi-group%20multi-person%20human%20atomic%20actions%20and%20group%20activities.%20Powered%20by%20Unity%20Engine%2C%20M3%20Act%20features%20mul-tiple%20semantic%20groups%2C%20highly%20diverse%20and%20photorealistic%20images%2C%20and%20a%20comprehensive%20set%20of%20annotations%2C%20which%20facilitates%20the%20learning%20of%20human-centered%20tasks%20across%20single-person%2C%20multi-person%2C%20and%20multi-group%20conditions.%20We%20demonstrate%20the%20advantages%20of%20M3%20Act%20across%20three%20core%20experiments.%20The%20results%20suggest%20our%20synthetic%20dataset%20can%20significantly%20improve%20the%20performance%20of%20several%20downstream%20methods%20and%20replace%20real-world%20datasets%20to%20reduce%20cost.%20Notably%2C%20M3%20Act%20improves%20the%20state-of-the-art%20MOTRv2%20on%20DanceTrack%20dataset%2C%20leading%20to%20a%20hop%20on%20the%20leaderboard%20from%2010th%20to%202nd%20place.%20Moreover%2C%20M3%20Act%20opens%20new%20research%20for%20controllable%203D%20group%20activity%20generation.%20We%20define%20multiple%20metrics%20and%20propose%20a%20competitive%20baseline%20for%20the%20novel%20task.%20Our%20code%20and%20data%20are%20available%20at%20our%20project%20page%3A%20http%3A%5C%2F%5C%2Fcjerry1243.github.io%5C%2FM3Act.%22%2C%22date%22%3A%222024-09%22%2C%22proceedingsTitle%22%3A%22Proceedings%20of%20the%20IEEE%5C%2FCVF%20Conference%20on%20Computer%20Vision%20and%20Pattern%20Recognition%22%2C%22conferenceName%22%3A%22%22%2C%22language%22%3A%22en%22%2C%22DOI%22%3A%2210.1109%5C%2FCVPR52733.2024.02070%22%2C%22ISBN%22%3A%22979-8-3503-5300-6%22%2C%22url%22%3A%22https%3A%5C%2F%5C%2Fieeexplore.ieee.org%5C%2Fdocument%5C%2F10657474%5C%2F%22%2C%22collections%22%3A%5B%22BJBN7N3S%22%5D%2C%22dateModified%22%3A%222025-03-05T16%3A36%3A47Z%22%7D%7D%2C%7B%22key%22%3A%22WC287NZ8%22%2C%22library%22%3A%7B%22id%22%3A5017967%7D%2C%22meta%22%3A%7B%22creatorSummary%22%3A%22Khalili%20and%20Smyth%22%2C%22parsedDate%22%3A%222024-09%2
2%2C%22numChildren%22%3A0%7D%2C%22bib%22%3A%22%3Cdiv%20class%3D%5C%22csl-bib-body%5C%22%20style%3D%5C%22line-height%3A%202%3B%20padding-left%3A%201em%3B%20text-indent%3A-1em%3B%5C%22%3E%5Cn%20%20%3Cdiv%20class%3D%5C%22csl-entry%5C%22%3EKhalili%2C%20Boshra%2C%20and%20Andrew%20W.%20Smyth.%20%26%23x201C%3BSOD-YOLOv8%26%23x2014%3BEnhancing%20YOLOv8%20for%20Small%20Object%20Detection%20in%20Aerial%20Imagery%20and%20Traffic%20Scenes.%26%23x201D%3B%20%3Ci%3ESensors%3C%5C%2Fi%3E%2C%20vol.%2024%2C%20no.%2019%2C%20Sept.%202024%2C%20p.%206209%2C%20%3Ca%20class%3D%27zp-DOIURL%27%20href%3D%27https%3A%5C%2F%5C%2Fdoi.org%5C%2F10.3390%5C%2Fs24196209%27%3Ehttps%3A%5C%2F%5C%2Fdoi.org%5C%2F10.3390%5C%2Fs24196209%3C%5C%2Fa%3E.%3C%5C%2Fdiv%3E%5Cn%3C%5C%2Fdiv%3E%22%2C%22data%22%3A%7B%22itemType%22%3A%22journalArticle%22%2C%22title%22%3A%22SOD-YOLOv8%5Cu2014Enhancing%20YOLOv8%20for%20Small%20Object%20Detection%20in%20Aerial%20Imagery%20and%20Traffic%20Scenes%22%2C%22creators%22%3A%5B%7B%22creatorType%22%3A%22author%22%2C%22firstName%22%3A%22Boshra%22%2C%22lastName%22%3A%22Khalili%22%7D%2C%7B%22creatorType%22%3A%22author%22%2C%22firstName%22%3A%22Andrew%20W.%22%2C%22lastName%22%3A%22Smyth%22%7D%5D%2C%22abstractNote%22%3A%22Object%20detection%2C%20as%20a%20crucial%20aspect%20of%20computer%20vision%2C%20plays%20a%20vital%20role%20in%20traffic%20management%2C%20emergency%20response%2C%20autonomous%20vehicles%2C%20and%20smart%20cities.%20Despite%20the%20significant%20advancements%20in%20object%20detection%2C%20detecting%20small%20objects%20in%20images%20captured%20by%20high-altitude%20cameras%20remains%20challenging%2C%20due%20to%20factors%20such%20as%20object%20size%2C%20distance%20from%20the%20camera%2C%20varied%20shapes%2C%20and%20cluttered%20backgrounds.%20To%20address%20these%20challenges%2C%20we%20propose%20small%20object%20detection%20YOLOv8%20%28SOD-YOLOv8%29%2C%20a%20novel%20model%20specifically%20designed%20for%20scenarios%20involving%20numerous%20small%20objects.%20Inspired%20by%20e
fficient%20generalized%20feature%20pyramid%20networks%20%28GFPNs%29%2C%20we%20enhance%20multi-path%20fusion%20within%20YOLOv8%20to%20integrate%20features%20across%20different%20levels%2C%20preserving%20details%20from%20shallower%20layers%20and%20improving%20small%20object%20detection%20accuracy.%20Additionally%2C%20we%20introduce%20a%20fourth%20detection%20layer%20to%20effectively%20utilize%20high-resolution%20spatial%20information.%20The%20efficient%20multi-scale%20attention%20module%20%28EMA%29%20in%20the%20C2f-EMA%20module%20further%20enhances%20feature%20extraction%20by%20redistributing%20weights%20and%20prioritizing%20relevant%20features.%20We%20introduce%20powerful-IoU%20%28PIoU%29%20as%20a%20replacement%20for%20CIoU%2C%20focusing%20on%20moderate%20quality%20anchor%20boxes%20and%20adding%20a%20penalty%20based%20on%20differences%20between%20predicted%20and%20ground%20truth%20bounding%20box%20corners.%20This%20approach%20simplifies%20calculations%2C%20speeds%20up%20convergence%2C%20and%20enhances%20detection%20accuracy.%20SOD-YOLOv8%20significantly%20improves%20small%20object%20detection%2C%20surpassing%20widely%20used%20models%20across%20various%20metrics%2C%20without%20substantially%20increasing%20the%20computational%20cost%20or%20latency%20compared%20to%20YOLOv8s.%20Specifically%2C%20it%20increased%20recall%20from%2040.1%25%20to%2043.9%25%2C%20precision%20from%2051.2%25%20to%2053.9%25%2C%20mAP0.5%20from%2040.6%25%20to%2045.1%25%2C%20and%20mAP0.5%3A0.95%20from%2024%25%20to%2026.6%25.%20Furthermore%2C%20experiments%20conducted%20in%20dynamic%20real-world%20traffic%20scenes%20illustrated%20SOD-YOLOv8%5Cu2019s%20significant%20enhancements%20across%20diverse%20environmental%20conditions%2C%20highlighting%20its%20reliability%20and%20effective%20object%20detection%20capabilities%20in%20challenging%20scenarios.%22%2C%22date%22%3A%222024-09%22%2C%22language%22%3A%22en%22%2C%22DOI%22%3A%2210.3390%5C%2Fs24196209%22%2C%22ISSN%22%3A%221424-8220%22%2C%22url%22%3A%22https%3
A%5C%2F%5C%2Fwww.mdpi.com%5C%2F1424-8220%5C%2F24%5C%2F19%5C%2F6209%22%2C%22collections%22%3A%5B%22BJBN7N3S%22%5D%2C%22dateModified%22%3A%222025-03-05T16%3A36%3A46Z%22%7D%7D%2C%7B%22key%22%3A%22P6QAP8CC%22%2C%22library%22%3A%7B%22id%22%3A5017967%7D%2C%22meta%22%3A%7B%22creatorSummary%22%3A%22Chang%20et%20al.%22%2C%22parsedDate%22%3A%222024-09%22%2C%22numChildren%22%3A0%7D%2C%22bib%22%3A%22%3Cdiv%20class%3D%5C%22csl-bib-body%5C%22%20style%3D%5C%22line-height%3A%202%3B%20padding-left%3A%201em%3B%20text-indent%3A-1em%3B%5C%22%3E%5Cn%20%20%3Cdiv%20class%3D%5C%22csl-entry%5C%22%3EChang%2C%20Che-Jui%2C%20et%20al.%20%26%23x201C%3BLearning%20from%20Synthetic%20Human%20Group%20Activities.%26%23x201D%3B%20%3Ci%3EProceedings%20of%20the%20IEEE%5C%2FCVF%20Conference%20on%20Computer%20Vision%20and%20Pattern%20Recognition%3C%5C%2Fi%3E%2C%20IEEE%2C%202024%2C%20pp.%2021922%26%23x2013%3B32%2C%20%3Ca%20class%3D%27zp-DOIURL%27%20href%3D%27https%3A%5C%2F%5C%2Fdoi.org%5C%2F10.1109%5C%2FCVPR52733.2024.02070%27%3Ehttps%3A%5C%2F%5C%2Fdoi.org%5C%2F10.1109%5C%2FCVPR52733.2024.02070%3C%5C%2Fa%3E.%3C%5C%2Fdiv%3E%5Cn%3C%5C%2Fdiv%3E%22%2C%22data%22%3A%7B%22itemType%22%3A%22conferencePaper%22%2C%22title%22%3A%22Learning%20from%20synthetic%20human%20group%20activities%22%2C%22creators%22%3A%5B%7B%22creatorType%22%3A%22author%22%2C%22firstName%22%3A%22Che-Jui%22%2C%22lastName%22%3A%22Chang%22%7D%2C%7B%22creatorType%22%3A%22author%22%2C%22firstName%22%3A%22Danrui%22%2C%22lastName%22%3A%22Li%22%7D%2C%7B%22creatorType%22%3A%22author%22%2C%22firstName%22%3A%22Deep%22%2C%22lastName%22%3A%22Patel%22%7D%2C%7B%22creatorType%22%3A%22author%22%2C%22firstName%22%3A%22Parth%22%2C%22lastName%22%3A%22Goel%22%7D%2C%7B%22creatorType%22%3A%22author%22%2C%22firstName%22%3A%22Honglu%22%2C%22lastName%22%3A%22Zhou%22%7D%2C%7B%22creatorType%22%3A%22author%22%2C%22firstName%22%3A%22Seonghyeon%22%2C%22lastName%22%3A%22Moon%22%7D%2C%7B%22creatorType%22%3A%22author%22%2C%22firstName%22%3A%22Samuel%20S%22%2C%22lastName%22%3A%2
2Sohn%22%7D%2C%7B%22creatorType%22%3A%22author%22%2C%22firstName%22%3A%22Sejong%22%2C%22lastName%22%3A%22Yoon%22%7D%2C%7B%22creatorType%22%3A%22author%22%2C%22firstName%22%3A%22Vladimir%22%2C%22lastName%22%3A%22Pavlovic%22%7D%2C%7B%22creatorType%22%3A%22author%22%2C%22firstName%22%3A%22Mubbasir%22%2C%22lastName%22%3A%22Kapadia%22%7D%5D%2C%22abstractNote%22%3A%22The%20study%20of%20complex%20human%20interactions%20and%20group%20activities%20has%20become%20a%20focal%20point%20in%20human-centric%20computer%20vision.%20However%2C%20progress%20in%20related%20tasks%20is%20often%20hindered%20by%20the%20challenges%20of%20obtaining%20large-scale%20labeled%20datasets%20from%20real-world%20scenarios.%20To%20address%20the%20limitation%2C%20we%20introduce%20M3%20Act%2C%20a%20synthetic%20data%20generator%20for%20multi-view%20multi-group%20multi-person%20human%20atomic%20actions%20and%20group%20activities.%20Powered%20by%20Unity%20Engine%2C%20M3%20Act%20features%20mul-tiple%20semantic%20groups%2C%20highly%20diverse%20and%20photorealistic%20images%2C%20and%20a%20comprehensive%20set%20of%20annotations%2C%20which%20facilitates%20the%20learning%20of%20human-centered%20tasks%20across%20single-person%2C%20multi-person%2C%20and%20multi-group%20conditions.%20We%20demonstrate%20the%20advantages%20of%20M3%20Act%20across%20three%20core%20experiments.%20The%20results%20suggest%20our%20synthetic%20dataset%20can%20significantly%20improve%20the%20performance%20of%20several%20downstream%20methods%20and%20replace%20real-world%20datasets%20to%20reduce%20cost.%20Notably%2C%20M3%20Act%20improves%20the%20state-of-the-art%20MOTRv2%20on%20DanceTrack%20dataset%2C%20leading%20to%20a%20hop%20on%20the%20leaderboard%20from%2010th%20to%202nd%20place.%20Moreover%2C%20M3%20Act%20opens%20new%20research%20for%20controllable%203D%20group%20activity%20generation.%20We%20define%20multiple%20metrics%20and%20propose%20a%20competitive%20baseline%20for%20the%20novel%20task.%20Our%20code%20and%20data%20are%20available%20a
t%20our%20project%20page%3A%20http%3A%5C%2F%5C%2Fcjerry1243.github.io%5C%2FM3Act.%22%2C%22date%22%3A%222024-09%22%2C%22proceedingsTitle%22%3A%22Proceedings%20of%20the%20IEEE%5C%2FCVF%20Conference%20on%20Computer%20Vision%20and%20Pattern%20Recognition%22%2C%22conferenceName%22%3A%222024%20IEEE%5C%2FCVF%20Conference%20on%20Computer%20Vision%20and%20Pattern%20Recognition%20%28CVPR%29%22%2C%22language%22%3A%22en%22%2C%22DOI%22%3A%2210.1109%5C%2FCVPR52733.2024.02070%22%2C%22ISBN%22%3A%22979-8-3503-5300-6%22%2C%22url%22%3A%22https%3A%5C%2F%5C%2Fieeexplore.ieee.org%5C%2Fdocument%5C%2F10657474%5C%2F%22%2C%22collections%22%3A%5B%22BGCBXSQ3%22%5D%2C%22dateModified%22%3A%222025-02-28T13%3A52%3A19Z%22%7D%7D%2C%7B%22key%22%3A%225XT4SW5P%22%2C%22library%22%3A%7B%22id%22%3A5017967%7D%2C%22meta%22%3A%7B%22creatorSummary%22%3A%22Turkcan%20et%20al.%22%2C%22parsedDate%22%3A%222024-04%22%2C%22numChildren%22%3A0%7D%2C%22bib%22%3A%22%3Cdiv%20class%3D%5C%22csl-bib-body%5C%22%20style%3D%5C%22line-height%3A%202%3B%20padding-left%3A%201em%3B%20text-indent%3A-1em%3B%5C%22%3E%5Cn%20%20%3Cdiv%20class%3D%5C%22csl-entry%5C%22%3ETurkcan%2C%20Mehmet%20Kerem%2C%20et%20al.%20%3Ci%3EConstellation%20Dataset%3A%20Benchmarking%20High-Altitude%20Object%20Detection%20for%20an%20Urban%20Intersection%3C%5C%2Fi%3E.%20Apr.%202024%2C%20%3Ca%20class%3D%27zp-DOIURL%27%20href%3D%27https%3A%5C%2F%5C%2Fdoi.org%5C%2F10.48550%5C%2FarXiv.2404.16944%27%3Ehttps%3A%5C%2F%5C%2Fdoi.org%5C%2F10.48550%5C%2FarXiv.2404.16944%3C%5C%2Fa%3E.%3C%5C%2Fdiv%3E%5Cn%3C%5C%2Fdiv%3E%22%2C%22data%22%3A%7B%22itemType%22%3A%22preprint%22%2C%22title%22%3A%22Constellation%20dataset%3A%20benchmarking%20high-altitude%20object%20detection%20for%20an%20urban%20intersection%22%2C%22creators%22%3A%5B%7B%22creatorType%22%3A%22author%22%2C%22firstName%22%3A%22Mehmet%20Kerem%22%2C%22lastName%22%3A%22Turkcan%22%7D%2C%7B%22creatorType%22%3A%22author%22%2C%22firstName%22%3A%22Sanjeev%22%2C%22lastName%22%3A%22Narasimhan%22%7D%2C%7B%22creatorType%22%3A%22au
thor%22%2C%22firstName%22%3A%22Chengbo%22%2C%22lastName%22%3A%22Zang%22%7D%2C%7B%22creatorType%22%3A%22author%22%2C%22firstName%22%3A%22Gyung%20Hyun%22%2C%22lastName%22%3A%22Je%22%7D%2C%7B%22creatorType%22%3A%22author%22%2C%22firstName%22%3A%22Bo%22%2C%22lastName%22%3A%22Yu%22%7D%2C%7B%22creatorType%22%3A%22author%22%2C%22firstName%22%3A%22Mahshid%22%2C%22lastName%22%3A%22Ghasemi%22%7D%2C%7B%22creatorType%22%3A%22author%22%2C%22firstName%22%3A%22Javad%22%2C%22lastName%22%3A%22Ghaderi%22%7D%2C%7B%22creatorType%22%3A%22author%22%2C%22firstName%22%3A%22Gil%22%2C%22lastName%22%3A%22Zussman%22%7D%2C%7B%22creatorType%22%3A%22author%22%2C%22firstName%22%3A%22Zoran%22%2C%22lastName%22%3A%22Kostic%22%7D%5D%2C%22abstractNote%22%3A%22We%20introduce%20Constellation%2C%20a%20dataset%20of%2013K%20images%20suitable%20for%20research%20on%20detection%20of%20objects%20in%20dense%20urban%20streetscapes%20observed%20from%20high-elevation%20cameras%2C%20collected%20for%20a%20variety%20of%20temporal%20conditions.%20The%20dataset%20addresses%20the%20need%20for%20curated%20data%20to%20explore%20problems%20in%20small%20object%20detection%20exemplified%20by%20the%20limited%20pixel%20footprint%20of%20pedestrians%20observed%20tens%20of%20meters%20from%20above.%20It%20enables%20the%20testing%20of%20object%20detection%20models%20for%20variations%20in%20lighting%2C%20building%20shadows%2C%20weather%2C%20and%20scene%20dynamics.%20We%20evaluate%20contemporary%20object%20detection%20architectures%20on%20the%20dataset%2C%20observing%20that%20state-of-the-art%20methods%20have%20lower%20performance%20in%20detecting%20small%20pedestrians%20compared%20to%20vehicles%2C%20corresponding%20to%20a%2010%25%20difference%20in%20average%20precision%20%28AP%29.%20Using%20structurally%20similar%20datasets%20for%20pretraining%20the%20models%20results%20in%20an%20increase%20of%201.8%25%20mean%20AP%20%28mAP%29.%20We%20further%20find%20that%20incorporating%20domain-specific%20data%20augmentations%20helps%20improve%20mo
del%20performance.%20Using%20pseudo-labeled%20data%2C%20obtained%20from%20inference%20outcomes%20of%20the%20best-performing%20models%2C%20improves%20the%20performance%20of%20the%20models.%20Finally%2C%20comparing%20the%20models%20trained%20using%20the%20data%20collected%20in%20two%20different%20time%20intervals%2C%20we%20find%20a%20performance%20drift%20in%20models%20due%20to%20the%20changes%20in%20intersection%20conditions%20over%20time.%20The%20best-performing%20model%20achieves%20a%20pedestrian%20AP%20of%2092.0%25%20with%2011.5%20ms%20inference%20time%20on%20NVIDIA%20A100%20GPUs%2C%20and%20an%20mAP%20of%2095.4%25.%22%2C%22genre%22%3A%22%22%2C%22repository%22%3A%22%22%2C%22archiveID%22%3A%22%22%2C%22date%22%3A%222024-04%22%2C%22DOI%22%3A%2210.48550%5C%2FarXiv.2404.16944%22%2C%22citationKey%22%3A%22%22%2C%22url%22%3A%22https%3A%5C%2F%5C%2Fpar.nsf.gov%5C%2Fbiblio%5C%2F10544530-constellation-dataset-benchmarking-high-altitude-object-detection-urban-intersection%22%2C%22language%22%3A%22en%22%2C%22collections%22%3A%5B%22BGCBXSQ3%22%5D%2C%22dateModified%22%3A%222025-02-28T13%3A52%3A19Z%22%7D%7D%2C%7B%22key%22%3A%223GWGACAB%22%2C%22library%22%3A%7B%22id%22%3A5017967%7D%2C%22meta%22%3A%7B%22creatorSummary%22%3A%22Mohammadi%20and%20Smyth%22%2C%22parsedDate%22%3A%222024-04%22%2C%22numChildren%22%3A0%7D%2C%22bib%22%3A%22%3Cdiv%20class%3D%5C%22csl-bib-body%5C%22%20style%3D%5C%22line-height%3A%202%3B%20padding-left%3A%201em%3B%20text-indent%3A-1em%3B%5C%22%3E%5Cn%20%20%3Cdiv%20class%3D%5C%22csl-entry%5C%22%3EMohammadi%2C%20Sevin%2C%20and%20Andrew%20W.%20Smyth.%20%3Ci%3ENLP-Enabled%20Trajectory%20Map-Matching%20in%20Urban%20Road%20Networks%20Using%20Transformer%20Sequence-to-Sequence%20Model%3C%5C%2Fi%3E.%20Apr.%202024%2C%20%3Ca%20class%3D%27zp-DOIURL%27%20href%3D%27https%3A%5C%2F%5C%2Fdoi.org%5C%2F10.48550%5C%2FarXiv.2404.12460%27%3Ehttps%3A%5C%2F%5C%2Fdoi.org%5C%2F10.48550%5C%2FarXiv.2404.12460%3C%5C%2Fa%3E.%3C%5C%2Fdiv%3E%5Cn%3C%5C%2Fdiv%3E%22%2C%22data%22%3A%7B%22itemType%
22%3A%22preprint%22%2C%22title%22%3A%22NLP-enabled%20trajectory%20map-matching%20in%20urban%20road%20networks%20using%20transformer%20sequence-to-sequence%20model%22%2C%22creators%22%3A%5B%7B%22creatorType%22%3A%22author%22%2C%22firstName%22%3A%22Sevin%22%2C%22lastName%22%3A%22Mohammadi%22%7D%2C%7B%22creatorType%22%3A%22author%22%2C%22firstName%22%3A%22Andrew%20W%22%2C%22lastName%22%3A%22Smyth%22%7D%5D%2C%22abstractNote%22%3A%22Large-scale%20geolocation%20telematics%20data%20acquired%20from%20connected%20vehicles%20has%20the%20potential%20to%20significantly%20enhance%20mobility%20infrastructures%20and%20operational%20systems%20within%20smart%20cities.%20To%20effectively%20utilize%20this%20data%2C%20it%20is%20essential%20to%20accurately%20match%20the%20geolocation%20data%20to%20the%20road%20segments.%20However%2C%20this%20matching%20is%20often%20not%20trivial%20due%20to%20the%20low%20sampling%20rate%20and%20errors%20exacerbated%20by%20multipath%20effects%20in%20urban%20environments.%20Traditionally%2C%20statistical%20modeling%20techniques%20such%20as%20Hidden-Markov%20models%20incorporating%20domain%20knowledge%20into%20the%20matching%20process%20have%20been%20extensively%20used%20for%20map-matching%20tasks.%20However%2C%20rule-based%20map-matching%20tasks%20are%20noise-sensitive%20and%20inefficient%20in%20processing%20large-scale%20trajectory%20data.%20Deep%20learning%20techniques%20directly%20learn%20the%20relationship%20between%20observed%20data%20and%20road%20networks%20from%20the%20data%2C%20often%20without%20the%20need%20for%20hand-crafted%20rules%20or%20domain%20knowledge.%20This%20renders%20them%20an%20efficient%20approach%20for%20map-matching%20large-scale%20datasets%20and%20more%20robust%20to%20the%20noise.%20This%20paper%20introduces%20a%20sequence-to-sequence%20deep-learning%20model%2C%20specifically%20the%20transformer-based%20encoder-decoder%20model%2C%20to%20perform%20as%20a%20surrogate%20for%20map-matching%20algorithms.%20The%20encoder-decoder%20archi
tecture%20initially%20encodes%20the%20series%20of%20noisy%20GPS%20points%20into%20a%20representation%20that%20automatically%20captures%20autoregressive%20behavior%20and%20spatial%20correlations%20between%20GPS%20points.%20Subsequently%2C%20the%20decoder%20associates%20data%20points%20with%20the%20road%20network%20features%20and%20thus%20transforms%20these%20representations%20into%20a%20sequence%20of%20road%20segments.%20The%20model%20is%20trained%20and%20evaluated%20using%20GPS%20traces%20collected%20in%20Manhattan%2C%20New%20York.%20Achieving%20an%20accuracy%20of%2076%25%2C%20transformer-based%20encoder-decoder%20models%20extensively%20employed%20in%20natural%20language%20processing%20presented%20a%20promising%20performance%20for%20translating%20noisy%20GPS%20data%20to%20the%20navigated%20routes%20in%20urban%20road%20networks.%22%2C%22genre%22%3A%22%22%2C%22repository%22%3A%22%22%2C%22archiveID%22%3A%22%22%2C%22date%22%3A%222024-04%22%2C%22DOI%22%3A%2210.48550%5C%2FarXiv.2404.12460%22%2C%22citationKey%22%3A%22%22%2C%22url%22%3A%22https%3A%5C%2F%5C%2Farxiv.org%5C%2Fabs%5C%2F2404.12460v1%22%2C%22language%22%3A%22en%22%2C%22collections%22%3A%5B%22BGCBXSQ3%22%5D%2C%22dateModified%22%3A%222025-02-28T13%3A52%3A19Z%22%7D%7D%2C%7B%22key%22%3A%22R4NYM3FP%22%2C%22library%22%3A%7B%22id%22%3A5017967%7D%2C%22meta%22%3A%7B%22lastModifiedByUser%22%3A%7B%22id%22%3A11473437%2C%22username%22%3A%22olmoore%22%2C%22name%22%3A%22%22%2C%22links%22%3A%7B%22alternate%22%3A%7B%22href%22%3A%22https%3A%5C%2F%5C%2Fwww.zotero.org%5C%2Folmoore%22%2C%22type%22%3A%22text%5C%2Fhtml%22%7D%7D%7D%2C%22creatorSummary%22%3A%22Chang%20et%20al.%22%2C%22parsedDate%22%3A%222024%22%2C%22numChildren%22%3A0%7D%2C%22bib%22%3A%22%3Cdiv%20class%3D%5C%22csl-bib-body%5C%22%20style%3D%5C%22line-height%3A%202%3B%20padding-left%3A%201em%3B%20text-indent%3A-1em%3B%5C%22%3E%5Cn%20%20%3Cdiv%20class%3D%5C%22csl-entry%5C%22%3EChang%2C%20Che-Jui%2C%20et%20al.%20%3Ci%3ELearning%20from%20Synthetic%20Human%20Group%20Activities
%3C%5C%2Fi%3E.%202024%2C%20pp.%2021922%26%23x2013%3B32%2C%20%3Ca%20class%3D%27zp-ItemURL%27%20href%3D%27https%3A%5C%2F%5C%2Fcjerry1243.github.io%5C%2FM3Act%5C%2F%27%3Ehttps%3A%5C%2F%5C%2Fcjerry1243.github.io%5C%2FM3Act%5C%2F%3C%5C%2Fa%3E.%3C%5C%2Fdiv%3E%5Cn%3C%5C%2Fdiv%3E%22%2C%22data%22%3A%7B%22itemType%22%3A%22conferencePaper%22%2C%22title%22%3A%22Learning%20from%20Synthetic%20Human%20Group%20Activities%22%2C%22creators%22%3A%5B%7B%22creatorType%22%3A%22author%22%2C%22firstName%22%3A%22Che-Jui%22%2C%22lastName%22%3A%22Chang%22%7D%2C%7B%22creatorType%22%3A%22author%22%2C%22firstName%22%3A%22Danrui%22%2C%22lastName%22%3A%22Li%22%7D%2C%7B%22creatorType%22%3A%22author%22%2C%22firstName%22%3A%22Deep%22%2C%22lastName%22%3A%22Patel%22%7D%2C%7B%22creatorType%22%3A%22author%22%2C%22firstName%22%3A%22Parth%22%2C%22lastName%22%3A%22Goel%22%7D%2C%7B%22creatorType%22%3A%22author%22%2C%22firstName%22%3A%22Honglu%22%2C%22lastName%22%3A%22Zhou%22%7D%2C%7B%22creatorType%22%3A%22author%22%2C%22firstName%22%3A%22Seonghyeon%22%2C%22lastName%22%3A%22Moon%22%7D%2C%7B%22creatorType%22%3A%22author%22%2C%22firstName%22%3A%22Samuel%20S%22%2C%22lastName%22%3A%22Sohn%22%7D%2C%7B%22creatorType%22%3A%22author%22%2C%22firstName%22%3A%22Sejong%22%2C%22lastName%22%3A%22Yoon%22%7D%2C%7B%22creatorType%22%3A%22author%22%2C%22firstName%22%3A%22Vladimir%22%2C%22lastName%22%3A%22Pavlovic%22%7D%2C%7B%22creatorType%22%3A%22author%22%2C%22firstName%22%3A%22Mubbasir%22%2C%22lastName%22%3A%22Kapadia%22%7D%5D%2C%22abstractNote%22%3A%22%22%2C%22date%22%3A%222024%22%2C%22proceedingsTitle%22%3A%22%22%2C%22conferenceName%22%3A%22Proceedings%20of%20the%20IEEE%5C%2FCVF%20Conference%20on%20Computer%20Vision%20and%20Pattern%20Recognition%22%2C%22language%22%3A%22%22%2C%22DOI%22%3A%22%22%2C%22ISBN%22%3A%22%22%2C%22url%22%3A%22https%3A%5C%2F%5C%2Fcjerry1243.github.io%5C%2FM3Act%5C%2F%22%2C%22collections%22%3A%5B%22MNTPRDZV%22%2C%2284R5R6XJ%22%5D%2C%22dateModified%22%3A%222024-11-20T19%3A44%3A46Z%22%7D%7D%2C%7B%22key%2
2%3A%226JBBTKC9%22%2C%22library%22%3A%7B%22id%22%3A5017967%7D%2C%22meta%22%3A%7B%22lastModifiedByUser%22%3A%7B%22id%22%3A11473437%2C%22username%22%3A%22olmoore%22%2C%22name%22%3A%22%22%2C%22links%22%3A%7B%22alternate%22%3A%7B%22href%22%3A%22https%3A%5C%2F%5C%2Fwww.zotero.org%5C%2Folmoore%22%2C%22type%22%3A%22text%5C%2Fhtml%22%7D%7D%7D%2C%22creatorSummary%22%3A%22Dave%20et%20al.%22%2C%22parsedDate%22%3A%222024%22%2C%22numChildren%22%3A0%7D%2C%22bib%22%3A%22%3Cdiv%20class%3D%5C%22csl-bib-body%5C%22%20style%3D%5C%22line-height%3A%202%3B%20padding-left%3A%201em%3B%20text-indent%3A-1em%3B%5C%22%3E%5Cn%20%20%3Cdiv%20class%3D%5C%22csl-entry%5C%22%3EDave%2C%20Ishan%20Rajendrakumar%2C%20et%20al.%20%26%23x201C%3BFinepseudo%3A%20Improving%20Pseudo-Labelling%20through%20Temporal-Alignablity%20for%20Semi-Supervised%20Fine-Grained%20Action%20Recognition.%26%23x201D%3B%20%3Ci%3EArXiv%20Preprint%20ArXiv%3A2409.01448%3C%5C%2Fi%3E%2C%202024%2C%20%3Ca%20class%3D%27zp-ItemURL%27%20href%3D%27https%3A%5C%2F%5C%2Fdl.acm.org%5C%2Fdoi%5C%2F10.1007%5C%2F978-3-031-73242-3_22%27%3Ehttps%3A%5C%2F%5C%2Fdl.acm.org%5C%2Fdoi%5C%2F10.1007%5C%2F978-3-031-73242-3_22%3C%5C%2Fa%3E.%3C%5C%2Fdiv%3E%5Cn%3C%5C%2Fdiv%3E%22%2C%22data%22%3A%7B%22itemType%22%3A%22journalArticle%22%2C%22title%22%3A%22Finepseudo%3A%20Improving%20pseudo-labelling%20through%20temporal-alignablity%20for%20semi-supervised%20fine-grained%20action%20recognition%22%2C%22creators%22%3A%5B%7B%22creatorType%22%3A%22author%22%2C%22firstName%22%3A%22Ishan%20Rajendrakumar%22%2C%22lastName%22%3A%22Dave%22%7D%2C%7B%22creatorType%22%3A%22author%22%2C%22firstName%22%3A%22Mamshad%20Nayeem%22%2C%22lastName%22%3A%22Rizve%22%7D%2C%7B%22creatorType%22%3A%22author%22%2C%22firstName%22%3A%22Mubarak%22%2C%22lastName%22%3A%22Shah%22%7D%5D%2C%22abstractNote%22%3A%22%22%2C%22date%22%3A%222024%22%2C%22language%22%3A%22%22%2C%22DOI%22%3A%22%22%2C%22ISSN%22%3A%22%22%2C%22url%22%3A%22https%3A%5C%2F%5C%2Fdl.acm.org%5C%2Fdoi%5C%2F10.1007%5C%2F978-3-031-73242-3_22
%22%2C%22collections%22%3A%5B%22MNTPRDZV%22%2C%2284R5R6XJ%22%5D%2C%22dateModified%22%3A%222024-11-20T19%3A44%3A35Z%22%7D%7D%2C%7B%22key%22%3A%22MI5WAHXT%22%2C%22library%22%3A%7B%22id%22%3A5017967%7D%2C%22meta%22%3A%7B%22lastModifiedByUser%22%3A%7B%22id%22%3A11473437%2C%22username%22%3A%22olmoore%22%2C%22name%22%3A%22%22%2C%22links%22%3A%7B%22alternate%22%3A%7B%22href%22%3A%22https%3A%5C%2F%5C%2Fwww.zotero.org%5C%2Folmoore%22%2C%22type%22%3A%22text%5C%2Fhtml%22%7D%7D%7D%2C%22creatorSummary%22%3A%22Mohammadi%20and%20Smyth%22%2C%22parsedDate%22%3A%222024%22%2C%22numChildren%22%3A0%7D%2C%22bib%22%3A%22%3Cdiv%20class%3D%5C%22csl-bib-body%5C%22%20style%3D%5C%22line-height%3A%202%3B%20padding-left%3A%201em%3B%20text-indent%3A-1em%3B%5C%22%3E%5Cn%20%20%3Cdiv%20class%3D%5C%22csl-entry%5C%22%3EMohammadi%2C%20Sevin%2C%20and%20Andrew%20W.%20Smyth.%20%26%23x201C%3BNLP-Enabled%20Trajectory%20Map-Matching%20in%20Urban%20Road%20Networks%20Using%20Transformer%20Sequence-to-Sequence%20Model.%26%23x201D%3B%20%3Ci%3EArXiv%20Preprint%20ArXiv%3A2404.12460%3C%5C%2Fi%3E%2C%202024%2C%20%3Ca%20class%3D%27zp-ItemURL%27%20href%3D%27https%3A%5C%2F%5C%2Farxiv.org%5C%2Fabs%5C%2F2404.12460v1%27%3Ehttps%3A%5C%2F%5C%2Farxiv.org%5C%2Fabs%5C%2F2404.12460v1%3C%5C%2Fa%3E.%3C%5C%2Fdiv%3E%5Cn%3C%5C%2Fdiv%3E%22%2C%22data%22%3A%7B%22itemType%22%3A%22journalArticle%22%2C%22title%22%3A%22NLP-enabled%20trajectory%20map-matching%20in%20urban%20road%20networks%20using%20transformer%20sequence-to-sequence%20model%22%2C%22creators%22%3A%5B%7B%22creatorType%22%3A%22author%22%2C%22firstName%22%3A%22Sevin%22%2C%22lastName%22%3A%22Mohammadi%22%7D%2C%7B%22creatorType%22%3A%22author%22%2C%22firstName%22%3A%22Andrew%20W%22%2C%22lastName%22%3A%22Smyth%22%7D%5D%2C%22abstractNote%22%3A%22%22%2C%22date%22%3A%222024%22%2C%22language%22%3A%22%22%2C%22DOI%22%3A%22%22%2C%22ISSN%22%3A%22%22%2C%22url%22%3A%22https%3A%5C%2F%5C%2Farxiv.org%5C%2Fabs%5C%2F2404.12460v1%22%2C%22collections%22%3A%5B%22MNTPRDZV%22%2C%2284R5R6XJ%22%5D%2C%2
2dateModified%22%3A%222024-11-20T19%3A42%3A54Z%22%7D%7D%2C%7B%22key%22%3A%22NEAJIJR7%22%2C%22library%22%3A%7B%22id%22%3A5017967%7D%2C%22meta%22%3A%7B%22lastModifiedByUser%22%3A%7B%22id%22%3A11473437%2C%22username%22%3A%22olmoore%22%2C%22name%22%3A%22%22%2C%22links%22%3A%7B%22alternate%22%3A%7B%22href%22%3A%22https%3A%5C%2F%5C%2Fwww.zotero.org%5C%2Folmoore%22%2C%22type%22%3A%22text%5C%2Fhtml%22%7D%7D%7D%2C%22creatorSummary%22%3A%22Khalili%20and%20Smyth%22%2C%22parsedDate%22%3A%222024%22%2C%22numChildren%22%3A0%7D%2C%22bib%22%3A%22%3Cdiv%20class%3D%5C%22csl-bib-body%5C%22%20style%3D%5C%22line-height%3A%202%3B%20padding-left%3A%201em%3B%20text-indent%3A-1em%3B%5C%22%3E%5Cn%20%20%3Cdiv%20class%3D%5C%22csl-entry%5C%22%3EKhalili%2C%20Boshra%2C%20and%20Andrew%20W.%20Smyth.%20%26%23x201C%3BSOD-YOLOv8--Enhancing%20YOLOv8%20for%20Small%20Object%20Detection%20in%20Traffic%20Scenes.%26%23x201D%3B%20%3Ci%3EArXiv%20Preprint%20ArXiv%3A2408.04786%3C%5C%2Fi%3E%2C%202024%2C%20%3Ca%20class%3D%27zp-ItemURL%27%20href%3D%27https%3A%5C%2F%5C%2Farxiv.org%5C%2Fabs%5C%2F2408.04786v1%27%3Ehttps%3A%5C%2F%5C%2Farxiv.org%5C%2Fabs%5C%2F2408.04786v1%3C%5C%2Fa%3E.%3C%5C%2Fdiv%3E%5Cn%3C%5C%2Fdiv%3E%22%2C%22data%22%3A%7B%22itemType%22%3A%22journalArticle%22%2C%22title%22%3A%22SOD-YOLOv8--Enhancing%20YOLOv8%20for%20Small%20Object%20Detection%20in%20Traffic%20Scenes%22%2C%22creators%22%3A%5B%7B%22creatorType%22%3A%22author%22%2C%22firstName%22%3A%22Boshra%22%2C%22lastName%22%3A%22Khalili%22%7D%2C%7B%22creatorType%22%3A%22author%22%2C%22firstName%22%3A%22Andrew%20W%22%2C%22lastName%22%3A%22Smyth%22%7D%5D%2C%22abstractNote%22%3A%22%22%2C%22date%22%3A%222024%22%2C%22language%22%3A%22%22%2C%22DOI%22%3A%22%22%2C%22ISSN%22%3A%22%22%2C%22url%22%3A%22https%3A%5C%2F%5C%2Farxiv.org%5C%2Fabs%5C%2F2408.04786v1%22%2C%22collections%22%3A%5B%22MNTPRDZV%22%2C%2284R5R6XJ%22%5D%2C%22dateModified%22%3A%222024-11-20T19%3A42%3A39Z%22%7D%7D%5D%7D

Dave, Ishan Rajendrakumar, et al. “FinePseudo: Improving Pseudo-Labelling through Temporal-Alignablity for Semi-Supervised Fine-Grained Action Recognition.” Computer Vision – ECCV 2024: 18th European Conference, Milan, Italy, September 29–October 4, 2024, Proceedings, Part VIII, Springer-Verlag, 2024, pp. 389–408, https://doi.org/10.1007/978-3-031-73242-3_22.

Khalili, Boshra, and Andrew W. Smyth. “SOD-YOLOv8—Enhancing YOLOv8 for Small Object Detection in Aerial Imagery and Traffic Scenes.” Sensors, vol. 24, no. 19, Sept. 2024, p. 6209, https://doi.org/10.3390/s24196209.

Chang, Che-Jui, et al. “Learning from Synthetic Human Group Activities.” Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, IEEE, 2024, pp. 21922–32, https://doi.org/10.1109/CVPR52733.2024.02070.

Turkcan, Mehmet Kerem, et al. Constellation Dataset: Benchmarking High-Altitude Object Detection for an Urban Intersection. Apr. 2024, https://doi.org/10.48550/arXiv.2404.16944.

Mohammadi, Sevin, and Andrew W. Smyth. NLP-Enabled Trajectory Map-Matching in Urban Road Networks Using Transformer Sequence-to-Sequence Model. Apr. 2024, https://doi.org/10.48550/arXiv.2404.12460.

Researchers

Andrew W. Smyth

Center Director & Principal Investigator; Professor of Civil Engineering and Engineering Mechanics, Columbia University

Mubarak Shah

Carl Vondrick

Jorge Ortiz

Zoran Kostic

Mohamed Abdel-Aty

Sharon Di

Mubbasir Kapadia

Brian Smith

Mehmet Kerem Türkcan

Markus Schläpfer

Trainees

Basile Van Hoorick

Boshra Khalili

Che-Jui Chang

Chengbo Zang

Dai Quoc Tran

Devika Gumaste

Fazil Kagdi

Jay Himmatbhai Parmar

Joseph Fioresi

Pranav Kumar Kota

Rizwan Qureshi

Rodrigo Vena Garcia

Sevin Mohammadi

Seonghyeon Moon

Utkarsh Mall

William Ho

Yiran Hu

Yuyang Li

Zijin Wang

Alumni

Aishwarya Patange

Gyung Hyun Je

Hanliang Chen

Honghao Liu

Manchi Shreyas Rao

Sanjeev Narasimhan

Yuncheng Zhao