Brief

last 24h

[4/4] 224 sources

Multi-source AI news clustered, deduplicated, and scored 0–100 across authority, cluster strength, headline signal, and time decay.

TOOL · arXiv cs.LG English(EN) · 1mo

On the Convergence Behavior of Preconditioned Gradient Descent Toward the Rich Learning Regime

This paper investigates how preconditioned gradient descent (PGD) methods such as Gauss-Newton affect spectral bias and grokking in neural networks. The authors propose that PGD can mitigate spectral bias, the tendency of networks to learn low frequencies first, which can hinder the capture of fine-scale structure. They also suggest that PGD can shorten the delay associated with grokking, a delayed-generalization effect hypothesized to occur during the transition from the Neural Tangent Kernel (NTK) regime to the feature-rich learning regime. Experimental results support the view that grokking reflects this transition, with PGD enabling more uniform exploration of the parameter space.

IMPACT Deepens understanding of neural network training dynamics, potentially leading to more efficient learning algorithms for complex tasks.
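A minimal sketch of the preconditioning idea summarized above, not the paper's experiments: on a toy diagonal quadratic (an assumption made here for illustration), plain gradient descent shrinks the error in each direction at a rate set by that direction's curvature, so the low-curvature direction lags. Dividing the gradient by the curvature, which is the exact inverse Hessian for this toy loss, makes every direction converge at the same rate. The names `descend`, `curvature`, and `target` are hypothetical.

```python
def gradient(w, target, curvature):
    """Gradient of 0.5 * sum_i curvature[i] * (w[i] - target[i])**2."""
    return [c * (wi - ti) for c, wi, ti in zip(curvature, w, target)]


def descend(target, curvature, lr, steps, precondition):
    """Run (optionally preconditioned) gradient descent from the origin."""
    w = [0.0] * len(target)
    for _ in range(steps):
        g = gradient(w, target, curvature)
        if precondition:
            # Toy preconditioner: the inverse of the diagonal Hessian.
            g = [gi / c for gi, c in zip(g, curvature)]
        w = [wi - lr * gi for wi, gi in zip(w, g)]
    return w


target = [1.0, 1.0]
curvature = [1.0, 0.01]  # second direction is 100x flatter (assumed values)

plain = descend(target, curvature, lr=0.5, steps=20, precondition=False)
pgd = descend(target, curvature, lr=0.5, steps=20, precondition=True)

plain_err = [abs(wi - ti) for wi, ti in zip(plain, target)]
pgd_err = [abs(wi - ti) for wi, ti in zip(pgd, target)]

print("plain GD error per direction:", plain_err)
print("preconditioned error per direction:", pgd_err)
```

After 20 steps the plain run has barely moved along the flat direction, while the preconditioned run reduces both errors by the same factor. For a neural network the curvature is not diagonal or constant, so this only illustrates why preconditioning can even out convergence across directions.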
RESEARCH · arXiv cs.LG English(EN) · 1mo · [2 sources]

An adaptive wavelet-based PINN for problems with localized high-magnitude source

Researchers have developed an adaptive wavelet-based physics-informed neural network (AW-PINN) for solving differential equations, particularly those with localized high-magnitude source terms. The framework dynamically adjusts its wavelet basis functions to manage extreme loss imbalances and avoid the spectral bias inherent in standard neural networks. AW-PINN accelerates training by not relying on automatic differentiation and is reported to outperform existing approaches on several challenging partial differential equations.

IMPACT Introduces a novel neural network architecture for improved differential equation solving, potentially impacting scientific simulation and modeling.
RESEARCH · arXiv cs.CL English(EN) · 1mo · [4 sources]

SpectralLoRA: Is Low-Frequency Structure Sufficient for LoRA Adaptation? A Spectral Analysis of Weight Updates

Three new research papers explore ways to optimize LoRA fine-tuning for large language models. The first proposes lowering the LoRA rank to 1 for binary classification tasks and reports performance competitive with higher ranks. The second introduces a Fisher-guided framework that uses data-aware sensitivity to select LoRA subspaces, improving downstream performance. The third analyzes the spectral structure of LoRA weight updates, finds that low-frequency components dominate, and suggests spectral sparsity as a design principle for parameter-efficient fine-tuning.

IMPACT These studies offer potential methods to significantly reduce the computational cost and improve the efficiency of fine-tuning large language models.
- LoRA
- SGD
- BERT
- LLM
- MNLI
- RoBERTa
- GLUE
- CoLA
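For context on the rank discussion in the summary above, here is a minimal sketch of the standard LoRA parameterization, in which the weight update is the product of two low-rank factors, ΔW = B·A. It is not code from any of the three papers. The layer size of 768 and the helper names `lora_param_count` and `lora_delta` are illustrative assumptions.

```python
def lora_param_count(d_out, d_in, rank):
    """Trainable parameters in B (d_out x rank) plus A (rank x d_in)."""
    return rank * (d_out + d_in)


def lora_delta(B, A):
    """Dense weight update B @ A, with matrices given as nested lists."""
    rank = len(A)
    d_in = len(A[0])
    return [
        [sum(B[i][k] * A[k][j] for k in range(rank)) for j in range(d_in)]
        for i in range(len(B))
    ]


d_out, d_in = 768, 768  # assumed layer size, for illustration only
full = d_out * d_in
for rank in (1, 8):
    n = lora_param_count(d_out, d_in, rank)
    print(f"rank {rank}: {n} trainable parameters vs {full} for the full matrix")

# A rank-1 update is an outer product of one column and one row.
B = [[1.0], [2.0]]
A = [[3.0, 4.0, 5.0]]
delta = lora_delta(B, A)
print(delta)
```

The count grows linearly with the rank, which is why the choice of rank is the main lever on adapter size.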
RESEARCH · arXiv stat.ML English(EN) · 1mo

Beyond ReLU: How Activations Affect Neural Kernels and Random Wide Networks

A new paper explores the theory of neural network kernels for activation functions beyond the standard ReLU. The authors characterize the reproducing kernel Hilbert spaces (RKHS) for a range of non-smooth activation functions, extending existing theory to functions such as SELU, ELU, and LeakyReLU. They find that many common activations yield equivalent RKHSs across network depths, whereas polynomial activations yield depth-dependent RKHSs. The study also provides insight into the smoothness of Neural Network Gaussian Process (NNGP) sample paths in infinitely wide networks.

IMPACT Extends theoretical understanding of neural network behavior, potentially informing future model architectures and training strategies.
- ReLU
- ELU
- RKHS
- LeakyReLU
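As background for the summary above, the baseline ReLU case has a well-known closed form: for a single hidden layer with standard normal weights and no bias, the NNGP kernel E[relu(w·x) relu(w·y)] equals ‖x‖‖y‖(sin θ + (π − θ) cos θ) / (2π), where θ is the angle between x and y. The sketch below evaluates that formula. It is not taken from the paper, and the function name `relu_nngp_kernel` and the weight scaling are assumptions made here.

```python
import math


def relu_nngp_kernel(x, y):
    """E[relu(w.x) * relu(w.y)] for w ~ N(0, I), one hidden layer, no bias."""
    norm_x = math.sqrt(sum(v * v for v in x))
    norm_y = math.sqrt(sum(v * v for v in y))
    if norm_x == 0.0 or norm_y == 0.0:
        return 0.0
    cos_theta = sum(a * b for a, b in zip(x, y)) / (norm_x * norm_y)
    # Clamp to guard against rounding slightly outside [-1, 1].
    cos_theta = max(-1.0, min(1.0, cos_theta))
    theta = math.acos(cos_theta)
    angular = math.sin(theta) + (math.pi - theta) * cos_theta
    return norm_x * norm_y * angular / (2.0 * math.pi)


same = relu_nngp_kernel([3.0, 4.0], [3.0, 4.0])          # equals ||x||^2 / 2
orthogonal = relu_nngp_kernel([1.0, 0.0], [0.0, 1.0])    # equals 1 / (2*pi)
opposite = relu_nngp_kernel([1.0, 0.0], [-1.0, 0.0])     # approximately 0
print(same, orthogonal, opposite)
```

Other activations replace this angular function with a different one, and the smoothness of that function at θ = 0 is the kind of property that RKHS comparisons between activations typically turn on.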
