H$^{2}$MT: Semantic Hierarchy-Aware Hierarchical Memory Transformer
Researchers have proposed H$^{2}$MT, a Transformer-based model designed to handle long text inputs more efficiently. The model builds a semantic hierarchy of the input offline and uses it to route queries at inference time. By pruning irrelevant content early, H$^{2}$MT aims to reduce computation and latency compared with existing approaches such as prompt compression and retrieval-augmented generation.
AI IMPACT: This architecture could enable more efficient processing of long documents by LLMs, which may improve performance on tasks that require extensive context.
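
To make the routing-and-pruning idea concrete, here is a minimal sketch in Python. It is not the H$^{2}$MT implementation, which this summary does not describe in detail. It assumes the offline hierarchy is a tree whose nodes carry a summary vector, and that routing keeps only the best-matching children at each level by cosine similarity. The names `Node`, `route`, `keep`, and the toy two-dimensional vectors are all hypothetical.

```python
from dataclasses import dataclass, field
from math import sqrt


@dataclass
class Node:
    """One node of a hypothetical offline semantic hierarchy."""
    label: str
    vector: list[float]                      # assumed summary embedding
    children: list["Node"] = field(default_factory=list)


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = sqrt(sum(x * x for x in a))
    nb = sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def route(root: Node, query: list[float], keep: int = 1):
    """Descend the tree level by level, keeping the `keep` best children
    of each surviving node and pruning the rest.

    Returns the leaves that were reached and the number of nodes scored.
    """
    frontier, leaves, scored = [root], [], 0
    while frontier:
        next_frontier = []
        for node in frontier:
            if not node.children:            # leaf: a chunk worth reading
                leaves.append(node)
                continue
            scored += len(node.children)
            ranked = sorted(
                node.children,
                key=lambda c: cosine(c.vector, query),
                reverse=True,
            )
            next_frontier.extend(ranked[:keep])  # everything else is pruned
        frontier = next_frontier
    return leaves, scored


def build_demo_tree() -> Node:
    """Toy hierarchy with two topics and two chunks per topic."""
    finance = Node("finance", [1.0, 0.0], [
        Node("revenue", [1.0, 0.1]),
        Node("costs", [0.8, -0.6]),
    ])
    biology = Node("biology", [0.0, 1.0], [
        Node("cells", [0.1, 1.0]),
        Node("genes", [-0.5, 0.8]),
    ])
    return Node("root", [0.0, 0.0], [finance, biology])


if __name__ == "__main__":
    leaves, scored = route(build_demo_tree(), [0.9, 0.1], keep=1)
    print([leaf.label for leaf in leaves], scored)
```

In this toy tree, a finance-like query reaches a single leaf after scoring four of the six non-root nodes, because the whole "biology" subtree is dropped at the first level. Raising `keep` trades more computation for a lower risk of pruning a relevant branch.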