Vision Language Models

PulseAugur coverage of Vision Language Models — every cluster mentioning Vision Language Models across labs, papers, and developer communities, ranked by signal.

Total · 30 days

Last 90 days: 30

Releases · 30 days

Last 90 days: 0

Papers · 30 days

Last 90 days: 29

Tier distribution · 90 days

research 15
tool 14
commentary 1

Sentiment · 30 days

6 days with sentiment data

Recent · Page 2 of 2 · 30 items total

RESEARCH · CL_09729 · Apr 29 · 17:52

ProcFunc library streamlines 3D generation and data creation in Python

A new Python library called ProcFunc has been developed for procedural 3D generation within Blender. This library offers a collection of user-friendly functions designed to simplify the creation, combination, and execut…
RESEARCH · CL_09107 · Apr 29 · 13:19

Stateful Transformers boost streaming inference; Intel releases AutoRound quantization toolkit

A new paper introduces a stateful transformer inference engine that significantly speeds up processing for streaming data by maintaining a persistent KV cache. This approach allows for query latency that is independent …
RESEARCH · CL_09710 · Apr 29 · 00:00

Apple researchers develop Direct Steering Optimization to mitigate AI bias

Researchers have developed Direct Steering Optimization (DSO), a novel method to mitigate bias in generative models like vision-language models (VLMs) and large language models (LLMs). DSO employs reinforcement learning…
RESEARCH · CL_09839 · Apr 28 · 22:15

VLMs struggle to interpret UI animations, new dataset reveals

Researchers have developed AniMINT, a new dataset comprising 300 annotated videos of UI animations, to evaluate how well Vision-Language Models (VLMs) understand dynamic interfaces. Current VLMs can detect basic motion …
RESEARCH · CL_06682 · Apr 28 · 04:00

New methods offer efficient data valuation for LLMs and VLMs

Two new research papers propose novel methods for data valuation in large language models (LLMs). The first, "For-Value," introduces an efficient forward-only framework that estimates data value using a single forward p…
RESEARCH · CL_06562 · Apr 28 · 04:00

GA2-CLIP paper introduces generic attribute anchors for VLM prompt tuning

Researchers have developed GA2-CLIP, a novel framework designed to enhance the generalization capabilities of Vision-Language Models (VLMs) in video tasks. This plug-and-play method addresses the issue of semantic space…
RESEARCH · CL_06515 · Apr 28 · 04:00

VLMs over-correct math OCR, hiding student errors; new metric PINK improves evaluation

Researchers have identified a significant issue in evaluating handwritten math OCR systems, particularly with Vision-Language Models (VLMs). These models often over-correct student errors instead of accurately transcrib…
RESEARCH · CL_05210 · Apr 27 · 04:00

New research explores GNN interpretability and multi-graph reasoning

Researchers are exploring new methods to enhance the interpretability and utility of Graph Neural Networks (GNNs). One paper investigates the critical role of node features in graph pooling, proposing that effective poo…
RESEARCH · CL_06215 · Apr 27 · 03:23

SMoES improves MoE-VLM efficiency and effectiveness with soft modality guidance

Researchers have introduced SMoES, a novel approach for guiding expert routing in Mixture-of-Experts (MoE) vision-language models (VLMs). This method utilizes dynamic soft modality scores to account for layer-dependent …
RESEARCH · CL_01274 · May 24 · 00:00

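Hugging Face launches advanced quantization techniques for efficient LLMs

Researchers are developing advanced quantization techniques to improve the efficiency of large language models (LLMs). New methods such as AutoRound, LATMiX, and GSQ aim to reduce model size and compute requirements, enabling deployment on less powerful hardware. These methods focus on optimizing how model weights and activations are represented at lower bit widths, and some have reached accuracy comparable to higher-precision models. Innovations include novel calibration strategies for post-training quantization and learnable affine transformations for improved robustness.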
