MamBOA: State-Space Architecture for Video Recognition
Researchers have introduced MamBOA, a state-space architecture for video recognition. The framework is backbone-agnostic: it can integrate with existing CNN, Transformer, and Mamba architectures. MamBOA enhances temporal reasoning by treating selective state-space recurrence as a motion synthesizer, and it achieves high accuracy on benchmarks such as Diving48 at minimal additional computational cost.
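To make "selective state-space recurrence" concrete, here is a minimal NumPy sketch of a generic Mamba-style selective scan run over per-frame features and added back to them as a residual. It is not the MamBOA implementation. The function names, parameter shapes, and the residual wiring are assumptions made for illustration only.

```python
import numpy as np


def softplus(x):
    # Numerically stable softplus; keeps the step size positive.
    return np.logaddexp(0.0, x)


def selective_scan(feats, A, W_delta, W_B, W_C):
    """Diagonal selective state-space recurrence over time (illustrative).

    feats:   (T, D) per-frame features from any backbone.
    A:       (D, N) negative decay rates for the hidden state.
    W_delta: (D, D), W_B: (D, N), W_C: (D, N) hypothetical projections that
             make the step size and the input/output maps depend on the input.
    Returns: (T, D) temporally mixed features.
    """
    T, D = feats.shape
    N = A.shape[1]
    h = np.zeros((D, N))  # hidden state, one N-dim state per channel
    out = np.empty((T, D))
    for t in range(T):
        x = feats[t]                          # (D,)
        delta = softplus(x @ W_delta)         # (D,) input-dependent step size
        B = x @ W_B                           # (N,) input-dependent input map
        C = x @ W_C                           # (N,) input-dependent output map
        A_bar = np.exp(delta[:, None] * A)    # (D, N) discretized decay
        h = A_bar * h + (delta[:, None] * B[None, :]) * x[:, None]
        out[t] = h @ C                        # (D,) read out the state
    return out


def add_temporal_branch(feats, params):
    # Assumed wiring: backbone features plus the scan output (residual).
    return feats + selective_scan(feats, *params)
```

Because the branch only consumes a `(T, D)` feature sequence, the same sketch could sit on top of pooled CNN, Transformer, or Mamba frame features, which is one way to read "backbone-agnostic". The per-frame cost of the loop is O(D·N), independent of clip length.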