COCO
PulseAugur coverage of COCO — every cluster mentioning COCO across labs, papers, and developer communities, ranked by signal.
Sentiment data available for 2 days
-
VISTA system wins Ego4D challenge with object interaction anticipation
Researchers have developed VISTA, a novel system designed for anticipating human-object interactions in egocentric videos. VISTA integrates spatial object detection with temporal context from a frozen V-JEPA 2.1 model t…
-
Researchers unveil new stealthy backdoor attacks on AI models using diffusion and style features
Researchers have developed new methods for backdoor attacks on advanced AI models, specifically targeting Vision-Language Models (VLMs) and Diffusion Models (DMs). One approach, CBV, uses diffusion models to create natu…
-
FractalMamba++ scales vision models across resolutions using Hilbert curves
Researchers have introduced FractalMamba++, an enhanced vision backbone designed to improve the performance of Mamba-based models, particularly with high-resolution inputs. This new architecture leverages the geometric …
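Mamba-style backbones consume a 1-D token sequence, so a 2-D grid of image patches has to be serialized first. A Hilbert curve is one way to do that: consecutive positions on the curve are always spatially adjacent cells. The sketch below uses the classic iterative `d2xy` construction to reorder a square patch grid. It assumes a power-of-two grid side. The helper names `hilbert_d2xy` and `hilbert_flatten` are hypothetical, and this is not FractalMamba++'s actual code.

```python
def hilbert_d2xy(order: int, d: int) -> tuple[int, int]:
    """Map position d along a Hilbert curve to (x, y) on a 2**order x 2**order grid.

    Follows the classic iterative d2xy construction.
    """
    n = 1 << order
    x = y = 0
    t = d
    s = 1
    while s < n:
        rx = 1 & (t // 2)
        ry = 1 & (t ^ rx)
        # Rotate/reflect the current sub-square so the curve stays continuous.
        if ry == 0:
            if rx == 1:
                x = s - 1 - x
                y = s - 1 - y
            x, y = y, x
        x += s * rx
        y += s * ry
        t //= 4
        s *= 2
    return x, y


def hilbert_flatten(grid: list[list[object]]) -> list[object]:
    """Serialize a square patch grid (indexed grid[y][x]) in Hilbert order."""
    n = len(grid)
    order = n.bit_length() - 1
    if n == 0 or (1 << order) != n or any(len(row) != n for row in grid):
        raise ValueError("grid must be square with a power-of-two side length")
    coords = (hilbert_d2xy(order, d) for d in range(n * n))
    return [grid[y][x] for x, y in coords]


if __name__ == "__main__":
    # Label each patch of a 4x4 grid with its raster-scan index, then reorder it.
    grid = [[4 * y + x for x in range(4)] for y in range(4)]
    print(hilbert_flatten(grid))
```

Unlike a raster scan, which jumps from the end of one row to the start of the next, this order never separates neighbouring tokens by a long stride. That locality is the property a Hilbert-based serialization is meant to preserve.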
-
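Colinearity Decay trains Vision Transformers for better low-bit quantization
Researchers have developed a new training technique called Colinearity Decay (CD) that makes Vision Transformers (ViTs) easier to quantize to low bit widths. The method acts as a structural regularizer: it penalizes alignment within Transformer blocks to suppress harmful activation outliers, without changing the architecture or the task loss. CD aims to improve the accuracy of quantized models while maintaining or even improving full-precision performance, offering a path to efficient ViT deployment with no inference-time overhead.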
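The summary does not give the Colinearity Decay formula. The sketch below is therefore a purely illustrative stand-in for "a structural regularizer that penalizes alignment". It computes the mean squared cosine similarity over all pairs of rows of a weight matrix and adds that to the task loss with a weight `lam`. The penalty form, the choice of rows, `lam`, and every function name are assumptions. This is not the CD regularizer itself.

```python
import math
from itertools import combinations


def cosine(u: list[float], v: list[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    if nu == 0.0 or nv == 0.0:
        return 0.0
    return dot / (nu * nv)


def colinearity_penalty(rows: list[list[float]]) -> float:
    """HYPOTHETICAL stand-in: mean squared cosine similarity over all row pairs.

    0.0 when rows are mutually orthogonal, 1.0 when they are all (anti-)parallel.
    """
    pairs = list(combinations(rows, 2))
    if not pairs:
        return 0.0
    return sum(cosine(u, v) ** 2 for u, v in pairs) / len(pairs)


def regularized_loss(task_loss: float, weight_rows: list[list[float]], lam: float = 1e-2) -> float:
    """Task loss plus a weighted penalty; the task loss term itself is left unchanged."""
    return task_loss + lam * colinearity_penalty(weight_rows)
```

Keeping the penalty as a separate additive term mirrors the summary's point that the task loss is untouched. The regularizer only reshapes the weights during training, so nothing changes at inference time.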
-
New methods improve open-vocabulary object detection robustness and adaptation
Researchers have introduced several new methods to improve open-vocabulary object detection, a field that aims to identify arbitrary objects based on human prompts. One approach, EBOD, integrates a prompt-based detector…
-
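Hyp2Former uses hyperbolic embeddings for open-set panoptic segmentation
Researchers have developed Hyp2Former, a novel framework for open-set panoptic segmentation that exploits hierarchical semantic similarity in hyperbolic space. By encoding the relationships between categories, the approach helps the model distinguish unknown objects from known classes, even without explicit training on unknown object types. Empirical results on datasets such as MS COCO and Cityscapes show that Hyp2Former outperforms existing methods at identifying unknown objects while remaining robust on known classes.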
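The summary does not say which model of hyperbolic space Hyp2Former uses. The Poincaré ball is a common choice, and its geodesic distance grows rapidly near the boundary, which is what lets tree-like class hierarchies embed with low distortion. The sketch below computes that distance. It then applies a hypothetical open-set rule: assign the nearest class prototype, or return "unknown" if even the nearest one is farther than a threshold. The prototypes, the threshold, and the decision rule are assumptions for illustration, not the paper's method.

```python
import math


def _sq_norm(w: list[float]) -> float:
    return sum(x * x for x in w)


def poincare_distance(u: list[float], v: list[float]) -> float:
    """Geodesic distance between two points in the Poincare ball (curvature -1)."""
    nu, nv = _sq_norm(u), _sq_norm(v)
    if nu >= 1.0 or nv >= 1.0:
        raise ValueError("points must lie strictly inside the unit ball")
    diff = _sq_norm([a - b for a, b in zip(u, v)])
    return math.acosh(1.0 + 2.0 * diff / ((1.0 - nu) * (1.0 - nv)))


def classify_open_set(embedding: list[float], prototypes: dict[str, list[float]], threshold: float) -> str:
    """HYPOTHETICAL rule: nearest class prototype, or 'unknown' if it is too far."""
    label, dist = min(
        ((name, poincare_distance(embedding, p)) for name, p in prototypes.items()),
        key=lambda item: item[1],
    )
    return label if dist <= threshold else "unknown"
```

As a check, the distance from the origin to a point at Euclidean radius r is 2·artanh(r). For r = 0.5 that is ln 3.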
-
Object detection models show mixed robustness to quantization and input degradations
A new study investigates how post-training quantization (PTQ) affects the robustness of YOLO object detection models when faced with real-world input degradations like noise and blur. Researchers evaluated various preci…
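The summary is truncated before it names the precisions or PTQ schemes studied. To make "post-training quantization" concrete, the sketch below shows the simplest variant: symmetric, per-tensor, round-to-nearest uniform quantization. It includes a round-trip error check. Per-channel scales, calibration data, and activation quantization are all omitted, and the function names are hypothetical.

```python
def quantize_symmetric(values: list[float], num_bits: int = 8) -> tuple[list[int], float]:
    """Symmetric, per-tensor, round-to-nearest uniform quantization."""
    qmax = 2 ** (num_bits - 1) - 1  # e.g. 127 for 8 bits, 7 for 4 bits
    max_abs = max((abs(v) for v in values), default=0.0)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    q = [max(-qmax, min(qmax, round(v / scale))) for v in values]
    return q, scale


def dequantize(q: list[int], scale: float) -> list[float]:
    return [x * scale for x in q]


def max_roundtrip_error(values: list[float], num_bits: int) -> float:
    """Largest absolute error after quantizing and dequantizing."""
    q, scale = quantize_symmetric(values, num_bits)
    return max(abs(a - b) for a, b in zip(values, dequantize(q, scale)))


if __name__ == "__main__":
    weights = [0.82, -0.41, 0.05, -1.3, 0.66, 0.002]
    for bits in (8, 6, 4):
        print(bits, "bits -> max abs error", max_roundtrip_error(weights, bits))
```

Because the scale is set by the largest magnitude, the per-value error is bounded by half a quantization step. That step grows roughly 16x when going from 8 to 4 bits. Such reduced weight fidelity is one reason a quantized detector can respond differently to noisy or blurred inputs.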
-
GPT-4o and other multimodal models evaluated on computer vision tasks
A new paper evaluates how well multimodal foundation models, including GPT-4o and Gemini 1.5 Pro, perform on standard computer vision tasks. Researchers developed a prompt-chaining method to translate vision tasks into …
-
Flow Matching research advances efficiency, control, and applications
Recent research explores advancements in Flow Matching, a generative modeling technique. Several papers introduce new methods to improve its efficiency, controllability, and applicability to diverse data types. Innovati…
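The summary is truncated before describing the individual papers. For orientation, the sketch below shows the basic Flow Matching training target with a straight-line path between a noise sample x0 and a data sample x1. The path is x_t = (1 - t)·x0 + t·x1, and the regression target for the model's velocity is the constant x1 - x0. A fixed-step Euler integrator then turns a velocity field into a sampler. The `velocity_fn` interface and all names are assumptions, and none of the newer methods mentioned above are reproduced.

```python
from typing import Callable

Vector = list[float]
VelocityFn = Callable[[Vector, float], Vector]


def interpolate(x0: Vector, x1: Vector, t: float) -> Vector:
    """Point on the straight-line path x_t = (1 - t) * x0 + t * x1."""
    return [(1.0 - t) * a + t * b for a, b in zip(x0, x1)]


def target_velocity(x0: Vector, x1: Vector) -> Vector:
    """d x_t / dt for the straight-line path, which is constant: x1 - x0."""
    return [b - a for a, b in zip(x0, x1)]


def flow_matching_loss(velocity_fn: VelocityFn, x0: Vector, x1: Vector, t: float) -> float:
    """Mean squared error between the model velocity at (x_t, t) and the path velocity."""
    pred = velocity_fn(interpolate(x0, x1, t), t)
    target = target_velocity(x0, x1)
    return sum((p - u) ** 2 for p, u in zip(pred, target)) / len(target)


def euler_sample(velocity_fn: VelocityFn, x0: Vector, steps: int = 10) -> Vector:
    """Integrate dx/dt = v(x, t) from t = 0 to t = 1 with fixed-step Euler."""
    x, dt = list(x0), 1.0 / steps
    for i in range(steps):
        v = velocity_fn(x, i * dt)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x
```

For a single fixed pair (x0, x1), the ideal velocity field is the constant x1 - x0. Plugging that field in gives zero loss, and Euler integration lands exactly on x1. In practice a neural network is trained on many random pairs and times t, so it learns an average of these straight-line velocities.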
-
New dataset aids computer vision identification of parasitoid wasps
Researchers have introduced the Descriptor: Parasitoid Wasps and Associated Hymenoptera Dataset (DAPWH), a new image collection aimed at improving automated identification of crucial insect groups. The dataset comprises…
-
New DBAC metric measures and identifies bias amplification in image captions
Researchers have introduced a new metric called Directional Bias Amplification in Captioning (DBAC) to measure and identify how image captioning models worsen biases present in their training data. Unlike previous metri…
-
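Researchers propose fuzzy-logic knowledge discovery for robust image recognition
Researchers have developed a novel method that makes image recognition more robust by integrating domain knowledge into deep neural networks. The method introduces a Differentiable Knowledge Unit (DKU) that uses fuzzy logic and implication rules to modulate the classifier's logits and refine the class probabilities. The system automatically discovers implicit concepts from task supervision, learning the relationships between classes and those concepts without requiring explicit concept labels. Evaluations on PASCAL-VOC, COCO, and MedMNIST show gains in both accuracy and domain generalization.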
-
Researchers find single hub text exploits vulnerabilities in CLIP cross-modal encoders
Researchers have identified a vulnerability in cross-modal encoders like CLIP, which map text and images into a shared embedding space. They discovered that a single "hub text" can generate high similarity scores with n…
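To see why one "hub" text can match many unrelated images, it helps to look at similarity in a shared embedding space directly. The sketch below uses toy 2-D vectors standing in for CLIP embeddings; they are not real model outputs. It measures two things: each text's mean cosine similarity to a set of images, and how often each text is the top-1 nearest neighbour of an image. The latter is the k-occurrence count commonly used to quantify hubness. All names and numbers are hypothetical.

```python
import math
from collections import Counter


def cosine(u: list[float], v: list[float]) -> float:
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def mean_similarity(text_emb: list[float], image_embs: list[list[float]]) -> float:
    """Average cosine similarity of one text embedding to every image embedding."""
    return sum(cosine(text_emb, img) for img in image_embs) / len(image_embs)


def top1_occurrence(image_embs: list[list[float]], text_embs: dict[str, list[float]]) -> Counter:
    """For each image, find its most similar text; count how often each text wins."""
    counts = Counter({name: 0 for name in text_embs})
    for img in image_embs:
        best = max(text_embs, key=lambda name: cosine(img, text_embs[name]))
        counts[best] += 1
    return counts


if __name__ == "__main__":
    images = [[0.6, 0.5], [0.5, 0.6], [1.0, 0.0], [0.55, 0.55]]
    texts = {"cat": [1.0, 0.0], "dog": [0.0, 1.0], "hub": [0.7, 0.7]}
    print(top1_occurrence(images, texts))
```

In this toy setup the "hub" vector sits between the two clusters, so it becomes the nearest text for most images. A text whose occurrence count or mean similarity is far above its peers is the kind of outlier the summary describes.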
-
ViCrop-Det improves small-object detection with adaptive spatial routing
Researchers have introduced ViCrop-Det, a novel framework designed to improve small-object detection in images without requiring additional training. This method utilizes Spatial Attention Entropy (SAE) derived from a m…
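The summary is truncated before naming the model SAE is derived from or explaining how it drives routing. The sketch below therefore assumes SAE is the Shannon entropy of a non-negative 2-D attention map, normalized to [0, 1]. Low values mean attention is concentrated on a few cells, and high values mean it is spread out. A helper also returns the peak cell, the natural centre for a zoom-in crop. The normalization and both function names are hypothetical.

```python
import math


def spatial_attention_entropy(attn: list[list[float]]) -> float:
    """Normalized Shannon entropy of a 2-D attention map, in [0, 1].

    0.0 means all mass sits on one cell (focused); 1.0 means uniform (diffuse).
    Negative entries are clipped to zero before normalizing.
    """
    flat = [max(a, 0.0) for row in attn for a in row]
    total = sum(flat)
    if total <= 0.0:
        raise ValueError("attention map must have positive mass")
    if len(flat) == 1:
        return 0.0
    probs = [a / total for a in flat]
    h = -sum(p * math.log(p) for p in probs if p > 0.0)
    return h / math.log(len(flat))


def attention_peak(attn: list[list[float]]) -> tuple[int, int]:
    """(row, col) of the highest-attention cell."""
    cells = ((r, c) for r, row in enumerate(attn) for c in range(len(row)))
    return max(cells, key=lambda rc: attn[rc[0]][rc[1]])
```

Normalizing by log(number of cells) makes the score comparable across feature-map resolutions. That matters if a single threshold is used to decide whether to route an image to a cropped, higher-resolution pass.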
-
New metric T3S evaluates semantic similarity in low-level image processing
Researchers have introduced a new evaluation metric called Semantic Similarity Score (T3S) for low-level image processing tasks. This metric aims to assess whether the semantic content of an image is preserved after pro…
-
Diffusion models boost AI's vision for segmentation and anomaly detection
Researchers have developed DiCLIP, a new framework for weakly supervised semantic segmentation that enhances the capabilities of CLIP by integrating diffusion models. This approach addresses CLIP's limitations in dense …
-
New OVD method improves object detection with hierarchical consistency and unbiased objectness
Researchers have developed a new framework to improve open-vocabulary object detection (OVD), a technique that allows AI models to identify objects beyond their training data. The proposed method addresses inaccuracies …
-
New framework enhances federated cross-modal retrieval with missing modalities
Researchers have developed RCSR, a new framework designed to improve federated cross-modal retrieval, particularly when dealing with data heterogeneity and missing modalities across clients. The system utilizes a frozen…
-
HalalBench benchmark tackles OCR challenges for multilingual food packaging ingredient extraction
Researchers have introduced HalalBench, a new multilingual benchmark designed to evaluate Optical Character Recognition (OCR) performance specifically on food packaging ingredient labels. The benchmark addresses the uni…
-
BMD-45 dataset improves CCTV vehicle detection in developing cities
Researchers have introduced BMD-45, a new large-scale dataset designed to improve vehicle detection in urban traffic environments found in developing cities. This dataset contains over 45,000 images with 480,000 boundin…