New COAL framework enhances multi-object tracking with VLM and LLM

By PulseAugur Editorial · [1 source] · 2026-05-14 13:06

Researchers have developed a new framework called COAL to improve referring multi-object tracking, particularly in complex scenarios where many objects look alike. COAL tackles the problem of sparse semantic supervision in two ways: a vision-language model (VLM) injects explicit semantics, and a large language model (LLM) drives counterfactual learning that forces the tracker to verify each attribute in the referring expression. Together, these make individual instances easier to tell apart and stop the model from relying on incomplete cues, which yields more robust compositional recognition. COAL achieved a 7.28% improvement in HOTA on the Refer-KITTI-V2 benchmark, surpassing existing state-of-the-art methods.
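HOTA (Higher Order Tracking Accuracy), the metric behind the headline number, combines detection quality (DetA) and association quality (AssA) as a geometric mean at each localization threshold α, then averages over thresholds. The Python sketch below shows that standard formula. It assumes predicted and ground-truth boxes have already been matched at a given α. The function names are hypothetical, and this is not COAL's own evaluation code.

```python
import math
from typing import Sequence, Tuple

# Localization thresholds used by HOTA: alpha = 0.05, 0.10, ..., 0.95 (19 values).
ALPHAS = [round(0.05 * k, 2) for k in range(1, 20)]


def association_score(tpa: int, fna: int, fpa: int) -> float:
    """A(c) for one true-positive match c: association IoU of its two tracks."""
    return tpa / (tpa + fna + fpa)


def hota_at_alpha(num_fn: int, num_fp: int, assoc_scores: Sequence[float]) -> float:
    """HOTA at one threshold alpha.

    assoc_scores holds A(c) for every true-positive match, so |TP| = len(assoc_scores).
    """
    num_tp = len(assoc_scores)
    denom = num_tp + num_fn + num_fp
    if num_tp == 0 or denom == 0:
        return 0.0
    det_a = num_tp / denom                 # DetA = |TP| / (|TP| + |FN| + |FP|)
    ass_a = sum(assoc_scores) / num_tp     # AssA = mean of A(c) over true positives
    return math.sqrt(det_a * ass_a)        # geometric mean of DetA and AssA


def hota(per_alpha: Sequence[Tuple[int, int, Sequence[float]]]) -> float:
    """Final HOTA: the mean of HOTA_alpha over all thresholds supplied."""
    return sum(hota_at_alpha(fn, fp, scores) for fn, fp, scores in per_alpha) / len(per_alpha)


if __name__ == "__main__":
    # Three perfectly associated true positives and one missed object at a single alpha:
    # DetA = 3/4, AssA = 1.0, so HOTA_alpha = sqrt(0.75).
    print(hota_at_alpha(num_fn=1, num_fp=0, assoc_scores=[1.0, 1.0, 1.0]))
```

Because HOTA is a geometric mean, a tracker cannot score well by detecting objects accurately while mixing up their identities, or vice versa, which is why it suits benchmarks with many similar-looking targets.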

IMPACT: Improves object-tracking accuracy in complex visual scenes, potentially benefiting applications such as autonomous driving and robotics.




AI-generated summary · Google Gemini · from 1 source.


COVERAGE [1]

arXiv cs.CV TIER_1 English(EN) · Xiaobo Lu · 2026-05-14 13:06

COAL: Counterfactual and Observation-Enhanced Alignment Learning for Discriminative Referring Multi-Object Tracking

Referring Multi-Object Tracking (RMOT) faces a fundamental structural contradiction between the high-discriminability demand and the sparse semantic supervision. This mismatch is particularly acute in highly homogeneous scenarios that require fine-grained discrimination over comp…
