WikiText-2

PulseAugur coverage of WikiText-2: every cluster mentioning WikiText-2 across labs, papers, and developer communities, ranked by signal.

Total · 30d: 10 (10 over 90d)
Releases · 30d: 0 (0 over 90d)
Papers · 30d: 9 (9 over 90d)
SENTIMENT · 30D: 3 days with sentiment data

RECENT · PAGE 1/1 · 10 TOTAL
  1. TOOL · CL_93123

    CONCORD framework enhances device-cloud RAG with asynchronous sparse aggregation

    Researchers have introduced CONCORD, a new framework designed to optimize retrieval-augmented generation (RAG) in a device-cloud collaborative setting where private documents are kept on local devices and public knowled…

  2. RESEARCH · CL_93580

    New LiFT Framework Uses Linear Programming to Control Transformer Overfitting

    Researchers have introduced LiFT, a novel framework for fine-tuning transformer models that utilizes linear programming to control overfitting. This method formulates fine-tuning as a bilevel optimization problem, joint…

  3. RESEARCH · CL_79133

    Chiaroscuro Attention optimizes transformer compute with dynamic token routing

    Researchers have developed CHIAR-Former, a novel 4-layer transformer model that optimizes compute usage by dynamically routing tokens. Instead of applying self-attention uniformly, CHIAR-Former analyzes token spectral e…

  4. RESEARCH · CL_53609

    Kan Extension Transformers unify attention, diffusion, and self-conditioning

    Researchers have introduced Kan Extension Transformers (KETs), a new framework that unifies various Transformer implementations under a categorical lens. KETs view Transformer layers as weighted structured extension ope…

  5. TOOL · CL_39127

    Llama 3.1 8B benchmark reveals memory bandwidth bottleneck on Apple M4

    A benchmark of Llama 3.1 8B on an Apple M4 Mac Mini with 16GB unified memory revealed that the Q8_0 quantization, despite fitting entirely in memory, suffers from slow token generation due to memory bandwidth limitation…

  6. RESEARCH · CL_36932

    New ScaleSearch method boosts generative model efficiency via optimized quantization

    Researchers have developed a new method called ScaleSearch to improve the efficiency of generative models through quantization. This technique optimizes the selection of scale factors in Block Floating Point (BFP) forma…

  7. TOOL · CL_28353

    New BCJR-QAT method pushes LLM quantization to 2 bits per weight

    Researchers have developed BCJR-QAT, a novel method for quantizing large language models to 2 bits per weight, a significant advancement beyond current post-training quantization techniques. This new approach uses a dif…

  8. RESEARCH · CL_21794

    New parameter E predicts Mixture-of-Experts model health, preventing dead experts

    Researchers have introduced a new dimensionless control parameter, E = T*H/(O+B), to predict the health of expert ecologies in Mixture-of-Experts (MoE) models. This parameter, derived from four hyperparameters, can prev…

  9. TOOL · CL_20375

    New MetaAdamW optimizer uses self-attention for adaptive learning rates

    Researchers have developed MetaAdamW, a novel optimizer that enhances adaptive learning rates and weight decay by employing a self-attention mechanism. This Transformer-based approach dynamically adjusts hyperparameters…

  10. RESEARCH · CL_10083

    Associative-State Universal Transformers improve parameter efficiency with sparse retrieval

    Researchers have developed UniMatrix, a novel Universal Transformer architecture that integrates structured recurrence with sparse retrieval mechanisms. While initial versions showed parameter efficiency and competitive…