Brief · PulseAugur

TOOL · arXiv cs.IR (Information Retrieval) · English (EN) · 1w

Layer-wise Token Compression for Efficient Document Reranking

Researchers have developed Layer-wise Token Compression (LTC), a method for improving the efficiency of the transformer-based document reranking models used in information retrieval. Whereas previous token compression techniques were applied only at the initial embedding layer, LTC applies token pooling at intermediate transformer layers, so later layers process a shorter sequence. The reported speedups are significant: inference queries per second increase by up to 25% for passage ranking and 116% for document ranking, while ranking quality is maintained. The method also adapts to long-context listwise reranking and may even act as a beneficial regularizer for long-document tasks.
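To make the idea of pooling tokens at an intermediate layer concrete, here is a minimal sketch. It is not the authors' implementation: the brief does not state the pooling operator, the window size, or which layers are compressed. Mean pooling over fixed windows of adjacent tokens, the names `mean_pool_tokens` and `run_with_compression`, and the identity "layers" are all assumptions chosen for illustration.

```python
from typing import Callable, List

Vector = List[float]
Layer = Callable[[List[Vector]], List[Vector]]


def mean_pool_tokens(hidden: List[Vector], window: int) -> List[Vector]:
    """Average each run of `window` adjacent token vectors into one vector.

    Assumption: fixed-window mean pooling; the paper may pool differently.
    """
    if window < 1:
        raise ValueError("window must be >= 1")
    pooled: List[Vector] = []
    for start in range(0, len(hidden), window):
        group = hidden[start:start + window]  # last group may be shorter
        dim = len(group[0])
        pooled.append(
            [sum(vec[d] for vec in group) / len(group) for d in range(dim)]
        )
    return pooled


def run_with_compression(
    hidden: List[Vector],
    layers: List[Layer],
    compress_after: int,
    window: int,
) -> List[Vector]:
    """Apply layers in order, pooling once after layer index `compress_after`.

    Layers after that point see a shorter sequence, which is where the
    compute saving would come from.
    """
    for index, layer in enumerate(layers):
        hidden = layer(hidden)
        if index == compress_after:
            hidden = mean_pool_tokens(hidden, window)
    return hidden


def identity(hidden: List[Vector]) -> List[Vector]:
    """Stand-in for a transformer layer (hypothetical, does nothing)."""
    return hidden


# Toy input: five 2-dimensional token vectors.
tokens = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0], [9.0, 10.0]]
compressed = run_with_compression(
    tokens, [identity, identity, identity], compress_after=0, window=2
)
print(len(tokens), "->", len(compressed))
```

In this toy setup the two layers after the pooling step operate on three vectors instead of five. Because self-attention cost grows with sequence length, shortening the sequence partway through the stack is the kind of saving the brief attributes to LTC.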

IMPACT Enhances efficiency of information retrieval systems, potentially leading to faster search results and better handling of long documents.

MS MARCO
Shengyao Zhuang
Layer-wise Token Compression