
WikiText-103

PulseAugur coverage of WikiText-103 — every cluster mentioning WikiText-103 across labs, papers, and developer communities, ranked by signal.

Total · 30d: 15 (15 over 90d)
Releases · 30d: 0 (0 over 90d)
Papers · 30d: 15 (15 over 90d)
SENTIMENT · 30D: 6 days with sentiment data

RECENT · PAGE 1/1 · 15 TOTAL
  1. TOOL · CL_111778

    New tPC-RTRL method learns long-range dependencies in recurrent systems

    Researchers have developed a novel method called Temporal Predictive Coding combined with Real-Time Recurrent Learning (tPC-RTRL) to enhance the learning capabilities of recurrent neural networks. This approach addresse…

  2. TOOL · CL_98119

    Gaussian Mixture Attention offers linear-time sequence mixing

    Researchers have introduced Gaussian Mixture Attention (GMA), a novel sequence mixing technique designed to overcome the quadratic scaling bottleneck of standard Transformer attention. GMA replaces explicit token-to-tok…

  3. TOOL · CL_93842

    New IGLU activation function offers improved gradient flow

    Researchers have introduced IGLU, a novel parametric activation function for deep neural networks designed to improve gradient flow and optimization stability. Derived from a mixture of GELU gates under a half-normal di…

  4. TOOL · CL_93350

    New Hybrid Architecture Boosts Long-Context Language Model Efficiency

    Researchers have introduced a Parallel Hybrid Architecture (PHA) that combines Gated State Spaces (GSS), Grouped Query Attention (GQA), and Feed-Forward Networks (FFNs) to improve long-context language modeling. This ar…

  5. RESEARCH · CL_90780

    New RAG and Long-Context Models Leverage Knowledge Graphs

    Two new research papers introduce advanced methods for improving retrieval-augmented generation (RAG) and long-context language modeling. The first paper, "A Unified Framework for Context-Aware and Relation-Aware Graph …

  6. TOOL · CL_87135

    LongSpike: New SNN Framework Enhances Long Sequence Learning

    Researchers have introduced LongSpike, a new Spiking Neural Network (SNN) framework that utilizes fractional-order State-Space Modeling (f-SSM) to enhance the learning of long sequences. This approach overcomes the limi…

  7. RESEARCH · CL_79133

    Chiaroscuro Attention optimizes transformer compute with dynamic token routing

    Researchers have developed CHIAR-Former, a novel 4-layer transformer model that optimizes compute usage by dynamically routing tokens. Instead of applying self-attention uniformly, CHIAR-Former analyzes token spectral e…

  8. RESEARCH · CL_53609

    Kan Extension Transformers unify attention, diffusion, and self-conditioning

    Researchers have introduced Kan Extension Transformers (KETs), a new framework that unifies various Transformer implementations under a categorical lens. KETs view Transformer layers as weighted structured extension ope…

  9. TOOL · CL_51237

    Successor representations reveal emergent word class structures in language models

    Researchers have applied successor representations (SRs), a principle from reinforcement learning, to natural language processing. By training a neural network on WikiText-103 to predict future word distributions across…

  10. RESEARCH · CL_21794

    New parameter E predicts Mixture-of-Experts model health, preventing dead experts

    Researchers have introduced a new dimensionless control parameter, E = T*H/(O+B), to predict the health of expert ecologies in Mixture-of-Experts (MoE) models. This parameter, derived from four hyperparameters, can prev…

  11. TOOL · CL_18622

    New framework uses masked language models for efficient wireless token communication

    Researchers have developed a novel context-aware wireless token communication framework that utilizes a masked language model (MLM) to improve transmission efficiency. This system enables robust token inference over noi…

  12. RESEARCH · CL_20402

    Jordan-RoPE: Non-Semisimple Relative Positional Encoding via Complex Jordan Blocks

    Researchers have introduced Jordan-RoPE, a novel relative positional encoding method for transformer models that utilizes complex Jordan blocks. This approach generates oscillatory-polynomial features, enabling a distan…

  13. RESEARCH · CL_15913

    Researchers explore weight decay, in-context learning, and acceleration for Transformer models

    Researchers have developed several new methods to improve the efficiency and theoretical understanding of Transformer models. One paper provides a functional-analytic characterization of weight decay, demonstrating its …

  14. RESEARCH · CL_08625

    Phase-Associative Memory: Sequence Modeling in Complex Hilbert Space

    Researchers have introduced a novel complex-valued sequence model called Phase-Associative Memory (PAM) that uses a Hilbert space formalism to better capture the indeterminate nature of meaning in semantic expressions. …

  15. RESEARCH · CL_06744

    AutoCompress method isolates critical transformer layers for efficient compression

    Researchers have developed AutoCompress, a novel method for compressing transformer models by isolating and preserving the critical first layer (Layer 0). This approach, termed Critical Layer Isolation (CLI), showed tha…