ENTITY Gelu

PulseAugur coverage of Gelu: every cluster mentioning Gelu across labs, papers, and developer communities, ranked by signal.

Total · 30d: 5 (22 over 90d)

Releases · 30d: 0 (0 over 90d)

Papers · 30d: 4 (21 over 90d)

SENTIMENT · 30D: 5 days with sentiment data

RECENT · PAGE 1/2 · 22 TOTAL

RESEARCH · CL_135430 · Jul 10 · 04:00

Tessera system unlocks heterogeneous GPUs for AI workloads

A new system called Tessera has been developed to improve the performance and cost-efficiency of running large AI models on heterogeneous GPU clusters. Unlike previous methods that operated at a coarse granularity, Tess…
RESEARCH · CL_139244 · Jul 9 · 21:15

New method trains transformers for enhanced legibility and editability

Researchers have developed a method to train more legible transformer models by incorporating a per-channel variance floor as a loss metric. This approach encourages the model to use crisp, contextual detectors rather t…
TOOL · CL_129269 · Jul 7 · 04:00

New analysis unifies gradient descent convergence for deep neural networks

Researchers have developed a unified convergence analysis for various gradient descent optimization methods used in training deep neural networks. This new analysis applies to a broad range of optimizers, including Adam…
RESEARCH · CL_128380 · Jul 4 · 02:34

New structural interpretation of GELU and other activation functions proposed

Researchers have proposed a new structural interpretation of activation functions like GELU, ReLU, SiLU/Swish, and hard swish. This work views GELU not just as a stochastic gate output, but through a Gaussian complement…
RESEARCH · CL_119621 · Jun 30 · 15:46

New NC-FFN architecture enhances transformer interpretability and efficiency

Researchers have developed a novel parameter-neutral replacement for transformer feed-forward networks, termed NC-FFN, which utilizes explicit fuzzy set operations. This new architecture demonstrates strong parameter ef…
RESEARCH · CL_107865 · Jun 22 · 21:04

DREG regularization method shows superior accuracy in deep learning

Researchers have introduced DREG, a layer-wise Jacobian regularization technique that functions as a general-purpose penalty for neural networks. In a large-scale empirical study, DREG demonstrated superior accuracy com…
RESEARCH · CL_100090 · Jun 19 · 04:00

New research probes Transformer energy use, learned linearity, and training dynamics

Recent research explores the intricacies of Transformer models, focusing on their energy consumption, internal linear properties, and training dynamics. One paper introduces a scaling model to predict energy usage durin…
TOOL · CL_93842 · Jun 16 · 04:00

New IGLU activation function offers improved gradient flow

Researchers have introduced IGLU, a novel parametric activation function for deep neural networks designed to improve gradient flow and optimization stability. Derived from a mixture of GELU gates under a half-normal di…
RESEARCH · CL_93236 · Jun 16 · 04:00

New neural network architectures tackle complex scientific computing problems · 8 sources tracked

Researchers are developing novel neural network architectures to solve complex partial differential equations (PDEs) and model dynamical systems. These include structure-oriented randomized neural networks (SO-RaNN) for…
RESEARCH · CL_90920 · Jun 12 · 08:43

Adam vs. SGD: No single factor explains performance gap, study finds

A new research paper explores the performance gap between the Adam and SGD optimization algorithms, finding that no single factor consistently explains the difference. The study indicates that the gap arises from comple…
TOOL · CL_86852 · Jun 12 · 04:00

Apple M4 Max GPU's fp8 tensor compute path is emulated, not hardware-accelerated

Researchers have reverse-engineered the Metal 4.1 tensor compute path on Apple's M4 Max GPU, revealing that the fp8 matmul2d operation is emulated rather than hardware-accelerated. This means the operation runs on the G…
TOOL · CL_58915 · May 29 · 04:00

New algorithm offers robust learning for nonlinear AI models

Researchers have developed a novel algorithm for robustly learning Gaussian Single Index Models (SIMs) even when faced with heavy-tailed noise and adversarial corruption. This new method provides the first robust recove…
RESEARCH · CL_56422 · May 27 · 16:30

Paper analyzes floating-point neural network expressivity

Researchers have published a paper exploring the expressive power of neural networks operating with floating-point arithmetic, moving beyond theoretical models that assume exact real numbers. The study introduces a fram…
RESEARCH · CL_53504 · May 26 · 07:30

New MoA FFN design enhances LLM expressivity and scaling

Researchers have introduced a novel feedforward network (FFN) design called Mixture of Activations (MoA) for large language models (LLMs). MoA utilizes token-adaptive activation mixing, allowing different activation fun…
TOOL · CL_50240 · May 25 · 23:01

Activation functions enable neural networks to model complex, non-linear patterns

Neural networks rely on activation functions to introduce non-linearity, enabling them to model complex patterns beyond simple linear relationships. Without these functions, even deep networks would collapse into equiva…
TOOL · CL_45331 · May 22 · 23:10

Residual connections enable deeper LLM training by bypassing layers

This article explains residual connections, a key component in Transformer architectures essential for training deep neural networks like Large Language Models (LLMs). Residual connections help overcome the vanishing gr…
TOOL · CL_45000 · May 22 · 04:00

Neural network weight drift identified as a training dynamic issue

Researchers have identified a phenomenon called "weight drift" in neural networks, where optimization processes inadvertently push weights towards negative values. This drift, independent of the training data, occurs wi…
TOOL · CL_43959 · May 21 · 13:11

New method secures embedded neural networks against timing attacks

Researchers have developed a new methodology for implementing activation functions in embedded neural networks that prevents information leakage through timing side channels. This approach ensures consistent execution t…
TOOL · CL_41870 · May 20 · 07:29

Vision models ditch activations for polynomial alternatives

Researchers have developed new activation-free backbone architectures for vision models, utilizing polynomial functions instead of traditional pointwise nonlinearities like ReLU or GELU. These novel modules, integrated …
RESEARCH · CL_18833 · May 5 · 04:00

Neural networks achieve super-fast convergence and represent complex functions with floating-point arithmetic

Two new arXiv papers explore theoretical aspects of neural network convergence and representation capabilities. The first paper demonstrates that neural network classifiers can achieve super-fast convergence rates under…