Retrieval-as-a-Service: A System-Oriented Analysis of Industrial Retrieval Pipelines in Web Systems
A new paper analyzes the architecture and deployment of industrial retrieval pipelines, focusing on their implementation as a Retrieval-as-a-Service (RaaS) layer. It highlights how production constraints such as latency, scalability, and resource limitations influence system design. The paper proposes a unified RaaS pipeline abstraction and examines the integration of Large Language Model (LLM)-based retrieval mechanisms and their impact on performance and overhead.