ENTITY Llava

PulseAugur coverage of LLaVA: every cluster mentioning LLaVA across labs, papers, and developer communities, ranked by signal.

Total · 30d: 29 (29 over 90d)
Releases · 30d: 0 (0 over 90d)
Papers · 30d: 24 (24 over 90d)

TIER MIX · 90D

research 11
tool 16
commentary 2

SENTIMENT · 30D

11 days with sentiment data

RECENT · PAGE 1/2 · 29 TOTAL

TOOL · CL_110058 · Jun 25 · 04:00

New dataset GroundSet boosts LLM spatial understanding in remote sensing

Researchers have developed GroundSet, a new large-scale dataset designed to improve the spatial understanding capabilities of multimodal large language models in remote sensing. The dataset includes 3.8 million annotate…

RESEARCH · CL_107767 · Jun 23 · 12:01

New 'Latent Bridge' enhances real-time AI agents for gaming

Researchers have developed a novel 'Latent Bridge' technique to improve real-time AI agents for tasks like gaming. This method couples a slow, reasoning-capable VLM with a fast, reactive VLM by projecting the slow model…

TOOL · CL_102257 · Jun 21 · 01:58

RTX 6000 Pro Users Seek Best Open-Source Image Vision Models

A user on Reddit is seeking recommendations for the best open-source image vision models that can run on an RTX 6000 Pro graphics card. They are looking to perform OCR and classification on historical documents and have…

TOOL · CL_100234 · Jun 19 · 04:00

New framework uses LLMs for enhanced fashion image retrieval

Researchers have developed a new framework for fashion image retrieval that leverages multi-modal large language models (LLMs) and a two-stage fine-tuning strategy. This approach integrates models like LLaVA to generate…

TOOL · CL_97663 · Jun 17 · 04:45

New SPARE method slashes VLM visual tokens with minimal performance loss

Researchers have developed SPARE, a novel method for reducing the computational load of Vision Language Models (VLMs) by pruning visual tokens. Unlike previous diversity-maximizing strategies that ignore token magnitude…

TOOL · CL_93710 · Jun 16 · 04:00

HorusEye framework uses language as dynamic attention for emergency visual analysis

A new research paper introduces HorusEye, a framework designed for emergency visual analysis that treats language as dynamic attention. The study benchmarks various vision-language models (VLMs) like Gemini, Qwen2-VL, B…

RESEARCH · CL_93456 · Jun 16 · 04:00

New methods optimize LLM fine-tuning for efficiency and data quality · 2 sources tracked

Two research papers introduce novel methods for optimizing the supervised fine-tuning (SFT) of large language models (LLMs). The first, "Online Dynamic Batching" (ODB), addresses the challenge of variable sample process…

TOOL · CL_93358 · Jun 16 · 04:00

New CSAE Method Unlocks Hierarchical Visual Concepts in LLMs

Researchers have developed cascaded sparse autoencoders (CSAEs) to better interpret the visual representations within multimodal large language models (MLLMs). Unlike previous methods that produced flat feature dictiona…

TOOL · CL_84964 · Jun 11 · 04:00

New AI attack uses text-to-image models to impersonate faces

Researchers have developed a new adversarial attack framework called Adv-TGD, which uses text-guided diffusion models to create realistic faces that can impersonate specific individuals and fool facial recognition syste…

TOOL · CL_83293 · Jun 10 · 12:33

Developer seeks free vision API for AI image enhancement project

A developer is seeking a free vision API for a project that uses AI to enhance user-drawn images. The application exports a canvas drawing as a PNG, sends it with a text prompt to a vision model, and then uses the model…

TOOL · CL_77425 · Jun 8 · 04:00

AI assistant AIDEN aids visually impaired with haptic guidance

Researchers have developed AIDEN, an AI assistant designed to help visually impaired individuals with tasks like object identification, text reading, and navigation. Unlike audio-based assistants that can cause overload…

RESEARCH · CL_70477 · Jun 3 · 13:38

New adapter enables text integration in tabular foundation models

Researchers have developed a new method to integrate text data into tabular foundation models like TabPFN. The approach uses a lightweight "TabPFN Text Adapter" to map text embeddings directly into TabPFN's embedding sp…

SIGNIFICANT · CL_62104 · May 31 · 08:14

SenseTime's 8B model redefines open-source image generation

SenseTime has released SenseNova U1, an 8B parameter open-source model that redefines image generation capabilities by removing the VAE component. This new architecture, called NEO-unify, enables end-to-end modeling of …

RESEARCH · CL_53464 · May 26 · 12:31

UniNote model enhances industrial item-to-item retrieval with unified embedding

Researchers have developed UniNote, a unified embedding model designed to improve item-to-item retrieval in industrial applications. This model addresses challenges in balancing content representation with fine-grained …

RESEARCH · CL_48288 · May 22 · 07:24

New dataset and framework tackle abstract hazard detection

Researchers have introduced the CompliVision dataset, a novel resource for general hazard detection designed to overcome limitations in current systems. This dataset decouples hazard concepts from image examples by usin…

RESEARCH · CL_33607 · May 15 · 18:01

Vector RAG vs. LLM Wiki: Study reveals trade-offs in research synthesis

A new research paper compares Vector Retrieval-Augmented Generation (RAG) against an LLM-compiled wiki for answering questions over a small corpus of 24 research papers. While the wiki excelled at synthesizing informati…

TOOL · CL_32452 · May 15 · 01:31

Developer tool extracts code from videos using local AI

A developer has created a local tool called videocode that extracts runnable code from video tutorials. The tool utilizes scene detection, audio transcription via Whisper, and vision models like LLaVA and Llama3.2-visio…

TOOL · CL_27986 · May 11 · 16:05

LLVMs applied to SAR imagery for military target recognition

Researchers have developed a new benchmark and training methodology for applying large language-vision models (LLVMs) to automatic target recognition (ATR) using synthetic aperture radar (SAR) imagery. The study leverag…

TOOL · CL_27987 · May 11 · 16:00

New MPerS method uses MLLMs for remote sensing scene segmentation

Researchers have developed MPerS, a novel approach for remote sensing scene segmentation that leverages multimodal large language models (MLLMs). This method generates high-quality captions for remote sensing images usi…

TOOL · CL_15790 · May 5 · 04:00

BareBones benchmark reveals Vision-Language Models suffer texture bias cliff

Researchers have introduced BareBones, a new benchmark designed to test the geometric comprehension abilities of Vision-Language Models (VLMs). The benchmark uses pixel-level silhouettes to evaluate if VLMs can understa…