Researchers have developed LightSTAR, a framework for efficient visual document retrieval. It targets the computational cost of current methods, which often rely on compute-intensive Multi-modal Large Language Models (MLLMs). LightSTAR first runs an LLM-free visual selection stage that quickly narrows the pool of pages using content-grounded query encoding and LLM-free visual embeddings. A vision-adaptive semantic refinement stage then performs fine-grained matching on the selected candidates, combining textual and layout cues for improved accuracy. According to the reported experiments, LightSTAR significantly reduces latency while maintaining state-of-the-art retrieval performance.
IMPACT Offers a more efficient alternative to LLM-based methods for visual document retrieval, potentially speeding up research and information access.
RANK_REASON This is a research paper detailing a new framework for visual document retrieval.
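The select-then-refine pattern described in the summary can be illustrated with a minimal Python sketch. This is not LightSTAR's implementation: the function names (`select_candidates`, `refine`), the toy 2-D vectors, and the scoring rules are all assumptions made for illustration. In particular, stage 2 uses a generic token-level max-similarity score as a stand-in for the paper's vision-adaptive semantic refinement, whose details the summary does not give.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def select_candidates(query_vec, page_vecs, k):
    """Stage 1 (hypothetical): cheap single-vector ranking over every page."""
    order = sorted(
        range(len(page_vecs)),
        key=lambda i: cosine(query_vec, page_vecs[i]),
        reverse=True,
    )
    return order[:k]


def refine(query_tokens, page_tokens, candidates):
    """Stage 2 (hypothetical stand-in): costlier token-level matching,
    run only on the pages that survived stage 1."""

    def max_sim(q_tokens, p_tokens):
        # For each query token, take its best match among the page tokens.
        return sum(max(cosine(q, p) for p in p_tokens) for q in q_tokens)

    return sorted(
        candidates,
        key=lambda i: max_sim(query_tokens, page_tokens[i]),
        reverse=True,
    )


# Toy data (made up): one coarse vector and a few token vectors per page.
page_vecs = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.7, 0.7]]
page_tokens = [
    [[1.0, 0.0], [0.0, 1.0]],
    [[1.0, 0.0], [1.0, 0.1]],
    [[0.0, 1.0], [0.1, 1.0]],
    [[0.7, 0.7], [0.6, 0.8]],
]
query_vec = [1.0, 0.1]
query_tokens = [[1.0, 0.0], [0.0, 1.0]]

shortlist = select_candidates(query_vec, page_vecs, k=2)
ranking = refine(query_tokens, page_tokens, shortlist)
print("shortlist:", shortlist)
print("refined ranking:", ranking)
```

The point of the split is cost: the inexpensive score touches every page, while the expensive score touches only the `k` shortlisted pages, so total work grows with `k` rather than with the size of the corpus.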
- arXiv
- LightSTAR
- LLM-free visual embeddings
- LLM-free Visual Selection
- Multi-modal Large Language Models
- Vision-adaptive Semantic Refinement
AI-generated summary · Google Gemini · from 1 source.