visual question answering

PulseAugur coverage of visual question answering: every cluster that mentions the topic across labs, papers, and developer communities, ranked by signal.

Total · 30d

13 over 90d

Releases · 30d

0 over 90d

Papers · 30d

13 over 90d

SENTIMENT · 30D

7 day(s) with sentiment data

RECENT · PAGE 1/1 · 13 TOTAL

TOOL · CL_106761 · Jun 20 · 09:49

ViRGo framework optimizes VLM performance with adaptive routing

Researchers have developed ViRGo, a novel framework designed to optimize the performance of Vision-Language Models (VLMs) by adaptively routing queries. ViRGo addresses the trade-off between resolution and context by es…

RESEARCH · CL_99963 · Jun 18 · 17:22

Quantum entropy estimation uses VQAs for small systems, CNNs for larger ones

Researchers have explored entropy estimation in multi-qutrit quantum systems using both variational quantum algorithms (VQAs) and classical convolutional neural networks (CNNs). For smaller systems (up to three qutrits)…

RESEARCH · CL_99696 · Jun 18 · 09:06

New AI framework improves cancer prognosis analysis using semantic anchors

Researchers have developed a new framework called Semantic-Anchored Evidential Fusion Survival (SAEFS) to improve the accuracy and reliability of whole-slide image analysis for cancer prognosis. SAEFS leverages Visual Q…

TOOL · CL_106618 · Jun 17 · 17:20

New protocol measures commonsense knowledge in VLA models

Researchers have developed Act2Answer, a new evaluation protocol designed to assess the commonsense and world knowledge retained by Vision-Language-Action (VLA) models after fine-tuning on robotics data. This protocol a…

TOOL · CL_93941 · Jun 16 · 04:00

New framework unifies segmentation and VQA for robotic surgery

Researchers have developed a novel framework that unifies pixel-level segmentation and visual question answering (VQA) for robotic surgery. This approach uses object tokens generated by a vision-language model (VLM) to …

RESEARCH · CL_93885 · Jun 16 · 04:00

Vision-language models lack agency and knowledge retention, new papers reveal

Two new research papers highlight limitations in current vision-language models (VLMs), particularly concerning their ability to retain knowledge after fine-tuning and their lack of "agency" in visual reasoning. The fir…

RESEARCH · CL_82085 · Jun 9 · 16:34

New framework models complex personalities in multimodal LLMs

Researchers have developed a new framework for conditioning and evaluating the personalities of multimodal large language models (MLLMs). Their experiments indicate that while personality induction can enhance image cap…

TOOL · CL_87109 · Jun 6 · 00:00

Robust-U1 framework enhances MLLMs' ability to recover corrupted visual content

Researchers have developed Robust-U1, a new framework designed to enhance the robustness of multimodal large language models (MLLMs) against visual corruptions. This framework enables MLLMs to self-recover corrupted vis…

RESEARCH · CL_65107 · May 30 · 00:00

New VQA benchmarks tackle memory, emotion, and interpretability

Researchers are developing new benchmarks and methods for advanced Visual Question Answering (VQA) tasks. One approach focuses on distilling answer-set programming rules from large language models to improve interpretab…

RESEARCH · CL_41927 · May 20 · 03:44

New VQA benchmarks and methods tackle knowledge, adaptation, and grounding

Researchers have introduced several new benchmarks and methods for Visual Question Answering (VQA) systems. HyLoVQA proposes a dynamic hypernetwork-generated low-rank adaptation technique for continual VQA, improving ad…

RESEARCH · CL_06542 · Apr 28 · 04:00

Researchers develop new methods for knowledge graph retrieval and completion

Researchers have developed new frameworks to enhance knowledge graph completion and visual question answering by integrating multimodal knowledge graphs with retrieval-augmented generation techniques. One approach, RADD…

RESEARCH · CL_06489 · Apr 28 · 04:00

HAC adapts CLIP to hyperbolic space for zero-shot VQA tasks

Researchers have introduced HAC, a novel framework that adapts pre-trained CLIP models to hyperbolic geometry for improved zero-shot Visual Question Answering (VQA). This parameter-efficient approach allows existing CLI…

RESEARCH · CL_06631 · Apr 28 · 01:57

New benchmarks SpecVQA and M3-VQA challenge multimodal LLMs in scientific and multi-hop reasoning

Researchers have introduced M$^3$-VQA, a new benchmark designed to evaluate multimodal large language models (MLLMs) on complex reasoning tasks involving multiple entities and multi-hop inference. The benchmark challeng…
