New Layer-wise Token Compression boosts document reranking speed

By PulseAugur Editorial · [1 source] · 2026-05-20 03:52

Researchers have developed Layer-wise Token Compression (LTC), a method for improving the efficiency of transformer-based document reranking models used in information retrieval. Unlike previous token compression techniques, which applied only to the initial embedding layer, LTC adapts token pooling to intermediate transformer layers. The approach increases inference throughput (queries per second) by up to 25% for passage ranking and 116% for document ranking while maintaining ranking quality. The method also adapts to long-context listwise reranking and may act as a beneficial regularizer for long-document tasks.
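
To make the idea of pooling tokens at an intermediate layer concrete, here is a minimal pure-Python sketch. It is not the paper's implementation: the summary does not say which pooling operator, window size, or layer LTC uses, so mean pooling over fixed windows of adjacent tokens is an assumption, and the names mean_pool_tokens, run_encoder, identity_layer, pool_at, and window are hypothetical.

```python
from typing import Callable, List

Vector = List[float]
Layer = Callable[[List[Vector]], List[Vector]]


def mean_pool_tokens(hidden: List[Vector], window: int) -> List[Vector]:
    """Average each run of `window` adjacent token vectors into one vector.

    Assumption: fixed-window mean pooling; the paper may pool differently.
    """
    if window < 1:
        raise ValueError("window must be >= 1")
    pooled = []
    for start in range(0, len(hidden), window):
        group = hidden[start:start + window]  # last group may be shorter
        dim = len(group[0])
        pooled.append(
            [sum(vec[d] for vec in group) / len(group) for d in range(dim)]
        )
    return pooled


def run_encoder(hidden: List[Vector], layers: List[Layer],
                pool_at: int, window: int) -> List[Vector]:
    """Apply layers in order, pooling once just before layer `pool_at`."""
    for index, layer in enumerate(layers):
        if index == pool_at:
            # Every later layer now processes a shorter sequence.
            hidden = mean_pool_tokens(hidden, window)
        hidden = layer(hidden)
    return hidden


def identity_layer(hidden: List[Vector]) -> List[Vector]:
    """Stand-in for a real transformer layer (hypothetical placeholder)."""
    return hidden


# Five 2-dimensional token vectors, pooled in pairs before the third layer.
tokens = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0], [9.0, 10.0]]
out = run_encoder(tokens, [identity_layer] * 4, pool_at=2, window=2)
print(len(tokens), "->", len(out))
```

In this sketch the layers before pool_at see the full sequence and the layers after it see roughly 1/window as many tokens, which is where a throughput gain of the kind reported would come from. How much quality is retained depends on choices the summary does not describe.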

IMPACT Enhances efficiency of information retrieval systems, potentially leading to faster search results and better handling of long documents.

RANK_REASON The cluster contains an academic paper detailing a new method for improving AI model efficiency.

paper
infra

AI-generated summary · Google Gemini · from 1 source.

COVERAGE [1]

arXiv cs.IR (Information Retrieval) TIER_1 English(EN) · Ivano Lauriola · 2026-05-20 03:52

Layer-wise Token Compression for Efficient Document Reranking

Transformer-based document cross-encoder rerankers are a central component of modern information retrieval systems. Despite their success, these models suffer from high computational costs due to processing long query-document sequences at inference time. A known approach to impr…
