Brief · PulseAugur

TOOL · arXiv cs.CV English(EN) · 8h

On Aligning Hierarchical Standardized Embedding for Audio-visual Generalized Zero-shot Learning

Researchers have introduced Aligning Hierarchical Standardized Embedding (AHSE), a method for audio-visual generalized zero-shot learning. To address limitations of existing methods, AHSE standardizes audio-visual and textual embeddings and aligns them hierarchically. The approach aims to reduce distributional mismatches while preserving semantic and class relationships within a shared embedding space. In experiments on benchmark datasets, AHSE achieves competitive performance on zero-shot learning tasks.
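
The brief does not describe AHSE's actual formulation, so the following is only a minimal sketch of the general idea of standardizing embeddings from two sources before comparing them in a shared space. It assumes per-dimension z-scoring and cosine-similarity matching, neither of which is confirmed by the source; the function names, array shapes, and random data are hypothetical stand-ins, and the hierarchical alignment step is not modeled at all.

```python
import numpy as np

def standardize(x, eps=1e-8):
    """Z-score each embedding dimension across the batch (axis 0)."""
    mean = x.mean(axis=0, keepdims=True)
    std = x.std(axis=0, keepdims=True)
    return (x - mean) / (std + eps)

def cosine_similarity(a, b, eps=1e-8):
    """Pairwise cosine similarity between rows of a and rows of b."""
    a = a / (np.linalg.norm(a, axis=1, keepdims=True) + eps)
    b = b / (np.linalg.norm(b, axis=1, keepdims=True) + eps)
    return a @ b.T

rng = np.random.default_rng(0)

# Hypothetical data: 32 audio-visual embeddings and 10 class-text embeddings,
# drawn from deliberately mismatched distributions (different mean and scale).
audio_visual = rng.normal(loc=5.0, scale=3.0, size=(32, 16))
text = rng.normal(loc=-1.0, scale=0.2, size=(10, 16))

# Assumption: standardizing each set puts both on a comparable scale.
av_std = standardize(audio_visual)
text_std = standardize(text)

# Score every sample against every class embedding and pick the best match.
scores = cosine_similarity(av_std, text_std)  # shape (32, 10)
predicted = scores.argmax(axis=1)             # one class index per sample
```

After standardization, both sets of embeddings have roughly zero mean and unit variance per dimension, so the similarity scores are no longer dominated by the offset and scale differences between the two sources.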

IMPACT This research could lead to more robust and accurate classification systems that integrate multiple data modalities.

AHSE
Audio-visual Generalized Zero-shot Learning
VGGSound-GZSL
ActivityNet-GZSL
UCF-GZSL
Aligning Hierarchical Standardized Embedding