Brief

[12/12] 221 sources

Multi-source AI news clustered, deduplicated, and scored 0–100 across authority, cluster strength, headline signal, and time decay.

TOOL · arXiv stat.ML English(EN) · 6d

Inducing Spatial Locality in Vision Transformers through the Training Protocol

Researchers have found that the training protocol alone can encourage spatial locality in Vision Transformers (ViTs). When ViTs were trained with a 'Modern' protocol that combines data augmentations such as CutMix and ColorJitter with label smoothing, their early layers showed more concentrated attention patterns. An ablation study identified CutMix as the primary driver of this effect, significantly reducing the Mean Attention Distance relative to the baselines.

IMPACT Training choices such as CutMix augmentation may improve the efficiency and interpretability of Vision Transformers by promoting localized attention.
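
Mean Attention Distance (MAD) is commonly computed as the attention-weighted pixel distance between each query patch and the patches it attends to, averaged over queries. A minimal NumPy sketch, assuming the attention weights of one layer are available as an array of shape (heads, N, N) over patch tokens only (CLS token dropped) laid out on a square grid; the function name and the default 16-pixel patch size are illustrative assumptions, not details from the paper:

```python
import numpy as np


def mean_attention_distance(attn, grid_size, patch_size=16):
    """Per-head MAD in pixels.

    attn: array of shape (heads, N, N), N = grid_size**2, rows summing to 1.
    """
    # Pixel coordinates of each patch on a grid_size x grid_size layout.
    ys, xs = np.meshgrid(np.arange(grid_size), np.arange(grid_size), indexing="ij")
    coords = np.stack([ys.ravel(), xs.ravel()], axis=1) * patch_size
    # Pairwise Euclidean distances between patches, shape (N, N).
    dist = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    # Attention-weighted distance for every query, then averaged over queries.
    return (attn * dist[None]).sum(axis=-1).mean(axis=-1)
```

Lower values mean more local attention, so a protocol that induces locality should push early-layer MAD down.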
RESEARCH · arXiv stat.ML English(EN) · 1w · [2 sources]

Federated Martingale Posterior Sampling

Researchers have introduced Federated Martingale Posterior (FMP) sampling, a novel protocol for federated Bayesian neural networks. The method sidesteps the difficulty of specifying priors for large models by sampling from a predictive distribution and refitting the model. FMP sampling lets clients upload data embeddings so that the server can run the predictive sampler centrally, without clients sharing their local datasets. Experiments on standard datasets show that FMP closely matches centralized performance and offers better calibration than existing consensus methods.

IMPACT Introduces a more efficient and calibrated approach for training Bayesian neural networks in federated settings, potentially improving privacy and accuracy.
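
The core idea of a martingale posterior (impute future data from a predictive distribution, then compute the quantity of interest on the completed dataset) can be illustrated without any federation. The sketch below is a generic textbook instance, not FMP: it assumes the simplest predictive, a Pólya urn that redraws uniformly from observed and imputed points, and estimates a posterior over the mean of a 1-D sample. Function and parameter names are hypothetical; in a federated setting the `data` array could stand in for embeddings pooled on the server, which is an assumption.

```python
import numpy as np


def polya_urn_martingale_posterior(data, n_forward=500, n_samples=200, seed=None):
    """Martingale-posterior samples of the mean via predictive resampling.

    Each forward step imputes one new observation by drawing uniformly from all
    observed and previously imputed points (the Polya-urn / empirical predictive);
    the statistic of interest (here, the mean) is computed on the completed data.
    """
    rng = np.random.default_rng(seed)
    data = np.asarray(data, dtype=float)
    n = data.size
    samples = np.empty(n_samples)
    for s in range(n_samples):
        pool = np.empty(n + n_forward)
        pool[:n] = data
        for t in range(n, n + n_forward):
            pool[t] = pool[rng.integers(t)]  # uniform over the t points seen so far
        samples[s] = pool.mean()
    return samples
```

The spread of the returned draws reflects posterior uncertainty about the mean; no prior was specified, only a predictive rule.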
TOOL · arXiv cs.AI English(EN) · 3d

FAIR-Pruner: A Flexible Framework for Automatic Layer-Wise Pruning via Tolerance of Difference

Researchers have developed FAIR-Pruner, a new framework designed for automatic, layer-wise structured pruning of deep neural networks. This method adaptively allocates sparsity across network layers by using both removal-oriented and protection-oriented signals. Experiments across various datasets and model architectures, including vision models and a Qwen1.5-MoE model, demonstrate that FAIR-Pruner achieves strong accuracy-compression trade-offs. The framework is available as an open-source package.

IMPACT Enables more efficient deployment of large neural networks by improving compression techniques.
- ImageNet
- CIFAR-10
- CIFAR-100
- DenseNet
- ResNet
- SVHN
- ConvNeXt
- FAIR-Pruner
- Qwen1.5-MoE-A2.7B-Chat
- Chengyao Yu
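
To make "adaptive layer-wise sparsity allocation" concrete, the sketch below uses a deliberately simple stand-in heuristic, not FAIR-Pruner's removal- and protection-oriented signals: each output filter is scored by its L1 norm normalised within its layer, filters are ranked globally, and the lowest-ranked ones are removed, so each layer ends up with a different pruning ratio. Protecting each layer's strongest filter is an added assumption to keep every layer non-empty. All names are hypothetical.

```python
import numpy as np


def allocate_filter_pruning(layer_weights, global_ratio):
    """Return, per layer, the indices of output filters to keep.

    layer_weights: list of conv weights shaped (out_channels, in_channels, kh, kw).
    global_ratio: fraction of all filters (across layers) to remove.
    """
    candidates = []
    total = 0
    for li, w in enumerate(layer_weights):
        norms = np.abs(w).reshape(w.shape[0], -1).sum(axis=1)  # L1 norm per filter
        rel = norms / norms.max()  # normalise so layers are comparable
        best = int(np.argmax(norms))  # protect the strongest filter in each layer
        candidates.extend((float(r), li, fi) for fi, r in enumerate(rel) if fi != best)
        total += w.shape[0]
    n_prune = min(int(global_ratio * total), len(candidates))
    pruned = {(li, fi) for _, li, fi in sorted(candidates)[:n_prune]}
    return [
        [fi for fi in range(w.shape[0]) if (li, fi) not in pruned]
        for li, w in enumerate(layer_weights)
    ]
```

Because the ranking is global, a layer with many weak filters loses more of them than a layer whose filters are uniformly strong.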
TOOL · arXiv cs.LG English(EN) · 3d

AutoMCU: Feasibility-First MCU Neural Network Customization via LLM-based Multi-Agent Systems

Researchers have developed AutoMCU, a novel system that leverages LLM-based multi-agent approaches to customize neural networks for microcontroller units (MCUs). This method prioritizes feasibility by integrating vendor toolchain feedback early in the design process, significantly reducing the search cost and time compared to traditional hardware-aware neural architecture search methods. AutoMCU has demonstrated competitive accuracy on benchmark datasets and successful deployment on STM32 microcontrollers, making edge intelligence more accessible.

IMPACT Automates neural network deployment on resource-constrained MCUs, enabling more edge AI applications.
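
"Feasibility-first" can be read as: reject candidates that cannot fit on the device before spending any evaluation budget on them. The sketch below assumes a crude cost model in place of vendor toolchain feedback: int8 weights count toward flash, and peak SRAM is the largest input-plus-output activation of any single layer. The `Layer` type, budgets, and function names are hypothetical illustrations, not AutoMCU's interface.

```python
from dataclasses import dataclass


@dataclass
class Layer:
    params: int   # number of weights, assumed stored as int8 (1 byte each)
    in_act: int   # input activation elements, assumed int8
    out_act: int  # output activation elements, assumed int8


def estimate_cost(layers):
    """Return (flash_bytes, peak_sram_bytes) under the crude int8 cost model."""
    flash = sum(layer.params for layer in layers)
    peak_sram = max(layer.in_act + layer.out_act for layer in layers)
    return flash, peak_sram


def is_feasible(layers, flash_budget, sram_budget):
    flash, sram = estimate_cost(layers)
    return flash <= flash_budget and sram <= sram_budget


def feasibility_first_search(candidates, flash_budget, sram_budget, evaluate):
    """Filter by feasibility first; only feasible candidates are ever evaluated."""
    feasible = [c for c in candidates if is_feasible(c, flash_budget, sram_budget)]
    if not feasible:
        return None
    return max(feasible, key=evaluate)
```

The saving comes from ordering: the cheap feasibility check runs on every candidate, while the expensive `evaluate` (training and measuring accuracy) runs only on the survivors.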
TOOL · arXiv cs.LG English(EN) · 3d

How Sparsity Allocation Shapes Label-Free Post-Pruning Recoverability

A new research paper investigates how the allocation of sparsity in neural networks impacts their ability to recover accuracy after pruning, especially when labeled retraining data is unavailable. The study compares different sparsity allocation methods like ERK and LAMP across various datasets and architectures, finding that the choice of allocation significantly affects post-repair accuracy. Researchers identified a critical transition regime where standard repair methods begin to fail, highlighting the need to jointly consider pruning allocation and repair strategies.

IMPACT Investigates methods to maintain neural network performance after aggressive pruning, crucial for efficient deployment in resource-constrained environments.
- CIFAR-10
- CIFAR-100
- ResNet-50
- Imagenette
- ResNet-34
- ResNet-18
- ImageNet-100
- DenseNet-121
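
LAMP, one of the allocation methods compared, assigns each weight the score w_u² / Σ_{v ≥ u} w_v², where weights within a layer are sorted by increasing magnitude; pruning by a single global threshold on these scores then yields a per-layer sparsity allocation. A minimal NumPy sketch of that scoring rule (function names are illustrative; this is not the paper's code, and label-free repair is out of scope):

```python
import numpy as np


def lamp_scores(w):
    """LAMP score for every weight of one layer (same shape as w)."""
    flat = np.asarray(w, dtype=float).ravel() ** 2
    order = np.argsort(flat)  # ascending magnitude
    sorted_sq = flat[order]
    # Suffix sums: total squared magnitude at or above each sorted position.
    suffix = np.cumsum(sorted_sq[::-1])[::-1]
    suffix = np.maximum(suffix, np.finfo(float).tiny)  # guard all-zero layers
    scores = np.empty_like(flat)
    scores[order] = sorted_sq / suffix
    return scores.reshape(np.shape(w))


def lamp_global_masks(layers, sparsity):
    """Boolean keep-masks from one global threshold on LAMP scores.

    Ties at the threshold are all pruned, so slightly more than the target may go.
    """
    scores = [lamp_scores(w) for w in layers]
    all_scores = np.concatenate([s.ravel() for s in scores])
    k = int(sparsity * all_scores.size)
    if k == 0:
        return [np.ones(np.shape(w), dtype=bool) for w in layers]
    threshold = np.sort(all_scores)[k - 1]
    return [s > threshold for s in scores]
```

The largest weight in every layer scores exactly 1, which is why LAMP rarely collapses a whole layer at moderate sparsity.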
TOOL · arXiv cs.LG English(EN) · 3d

Attacking the Spike: On the Transferability and Security of Spiking Neural Networks to Adversarial Examples

Researchers have developed a new adversarial attack, Mixed Dynamic Spiking Estimation (MDSE), aimed specifically at Spiking Neural Networks (SNNs). The work shows that the effectiveness of white-box adversarial attacks on SNNs depends heavily on the choice of surrogate gradient estimator. MDSE exploits multiple surrogate gradient estimators simultaneously, generating adversarial examples that fool both SNNs and non-SNN models such as Vision Transformers and CNNs.

IMPACT Introduces a novel attack that can fool both SNNs and traditional neural networks, highlighting security vulnerabilities in energy-efficient AI models.
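
A spiking neuron fires via a step function, whose true derivative is zero almost everywhere, so white-box attacks back-propagate through a surrogate derivative instead. The sketch below shows three common surrogates (sigmoid, arctan, rectangular), each integrating to 1 like the derivative of a step, and a mixture of them. Combining them as a fixed convex combination is an assumption for illustration; the summary does not say how MDSE mixes estimators, and all parameter values are hypothetical.

```python
import numpy as np


def sigmoid_surrogate(x, alpha=4.0):
    """Derivative of sigmoid(alpha * x); x is membrane potential minus threshold."""
    s = 1.0 / (1.0 + np.exp(-alpha * x))
    return alpha * s * (1.0 - s)


def atan_surrogate(x, alpha=2.0):
    """Derivative of (1/pi) * arctan(pi/2 * alpha * x) + 1/2."""
    return (alpha / 2.0) / (1.0 + (np.pi / 2.0 * alpha * x) ** 2)


def rect_surrogate(x, width=1.0):
    """Box of the given width and height 1/width, centred at the threshold."""
    return (np.abs(x) < width / 2.0).astype(float) / width


def mixed_surrogate(x, weights):
    """Convex combination of the three surrogates (weights are normalised)."""
    estimators = [sigmoid_surrogate, atan_surrogate, rect_surrogate]
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()
    return sum(wi * f(x) for wi, f in zip(w, estimators))
```

In an attack, this function would replace the spike's derivative in the backward pass; an adversarial example that works under several surrogates at once is less tied to any single gradient approximation.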
TOOL · arXiv cs.CV English(EN) · 5d

Early High-Frequency Injection for Geometry-Sensitive OOD Detection

Researchers have developed a method called Early High-Frequency Injection (EIHF) to improve out-of-distribution (OOD) detection in computer vision models. EIHF injects high-frequency information into the input before the first convolution layer, without altering the training objective. This helps the model separate in-distribution from out-of-distribution data, particularly in geometry-sensitive settings, by reshaping feature geometry and reducing the overlap between the two groups' scores. Experiments on CIFAR-100 and ImageNet-100 showed promising results, including improved false positive rate (FPR) and area under the receiver operating characteristic curve (AUROC).

IMPACT Improves the robustness of computer vision models to unseen data, potentially enhancing reliability in real-world applications.
- CIFAR-100
- ImageNet-100
- Places
- EIHF
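
The two reported metrics can be computed directly from detector scores. The sketch assumes higher scores mean "more in-distribution" and uses FPR at 95% TPR, a common OOD convention; the summary does not state which operating point the paper uses. AUROC is computed as the probability that a random in-distribution sample outscores a random OOD sample, with ties counted as one half. Function names are illustrative.

```python
import numpy as np


def fpr_at_95_tpr(id_scores, ood_scores):
    """Fraction of OOD samples accepted when the threshold keeps 95% of ID samples."""
    threshold = np.percentile(np.asarray(id_scores, dtype=float), 5)
    return float(np.mean(np.asarray(ood_scores, dtype=float) >= threshold))


def auroc(id_scores, ood_scores):
    """P(ID score > OOD score) over all pairs, ties counted as 1/2."""
    id_s = np.asarray(id_scores, dtype=float)[:, None]
    ood_s = np.asarray(ood_scores, dtype=float)[None, :]
    return float(np.mean((id_s > ood_s) + 0.5 * (id_s == ood_s)))
```

Reducing the overlap between the two score distributions, as EIHF is reported to do, lowers the first metric and raises the second.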
TOOL · arXiv cs.AI English(EN) · 6d

Beyond Isotropy in JEPAs: Hamiltonian Geometry and Symplectic Prediction

Researchers have introduced HamJEPA, a novel approach to Joint Embedding Predictive Architectures (JEPAs) that moves beyond isotropic regularization. This new method encodes views as phase-space states and uses a learned Hamiltonian leapfrog map for cross-view prediction. Experiments on CIFAR-100 and ImageNet-100 show significant improvements in kNN and linear probe accuracy compared to existing methods like SIGReg.

IMPACT Introduces a new self-supervised representation-learning method that improves kNN and linear-probe accuracy on downstream evaluations.
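
A leapfrog (Störmer-Verlet) map advances a phase-space state (q, p) under H(q, p) = |p|²/2 + V(q) with a half momentum kick, a full position drift, and another half kick; the resulting map is symplectic and time-reversible. In HamJEPA the Hamiltonian is learned; the sketch below assumes a fixed, hand-written potential gradient purely to show the integrator's structure, and the function name is illustrative.

```python
import numpy as np


def leapfrog(q, p, grad_V, step, n_steps):
    """Leapfrog integration of H(q, p) = |p|^2 / 2 + V(q)."""
    q = np.array(q, dtype=float)
    p = np.array(p, dtype=float)
    for _ in range(n_steps):
        p = p - 0.5 * step * grad_V(q)  # half kick
        q = q + step * p                # drift
        p = p - 0.5 * step * grad_V(q)  # half kick
    return q, p


# Example: harmonic oscillator, V(q) = |q|^2 / 2, so grad_V(q) = q.
q1, p1 = leapfrog([1.0], [0.0], lambda q: q, step=0.1, n_steps=1000)
```

Energy stays close to its initial value over long rollouts, and running the map from (q1, -p1) returns to the starting state, properties a symplectic predictor carries over to embedding space.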
RESEARCH · arXiv stat.ML English(EN) · 1w · [2 sources]

A Fourier perspective on the learning dynamics of neural networks: from sample complexities to mechanistic insights

Researchers have explored the learning dynamics of neural networks through a Fourier perspective, focusing on how they learn simpler features before more complex ones. Their work introduces a synthetic data model for translation-invariant inputs, demonstrating that while phase information alone is difficult for SGD to learn, power-law spectra can significantly accelerate this process. This approach provides mechanistic insights into the efficient learning of natural image distributions by deep neural networks.

IMPACT Provides mechanistic insights into how neural networks learn complex image distributions, potentially informing future model architectures and training strategies.
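
Translation-invariant data with a power-law spectrum can be generated by fixing Fourier amplitudes to decay as a power of frequency and drawing phases uniformly at random, which makes the signal's distribution shift-invariant. The 1-D sketch below is a generic construction under that assumption; the paper's actual data model may differ, and the function name is illustrative.

```python
import numpy as np


def power_law_signal(n, exponent, seed=None):
    """Real 1-D signal with power spectrum ~ k^(-exponent) and random phases."""
    rng = np.random.default_rng(seed)
    k = np.arange(1, n // 2 + 1)               # positive frequencies (DC left at 0)
    amplitude = k ** (-exponent / 2.0)         # power = amplitude**2 ~ k^(-exponent)
    phase = rng.uniform(0.0, 2.0 * np.pi, size=k.size)
    spectrum = np.zeros(n // 2 + 1, dtype=complex)
    spectrum[1:] = amplitude * np.exp(1j * phase)
    return np.fft.irfft(spectrum, n=n)
```

Setting `exponent=0` gives a flat spectrum, so all structure lives in the phases; per the summary, that phase-only case is the one SGD finds hard, while a decaying power law speeds learning.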
TOOL · arXiv cs.NE (Neural & Evolutionary) English(EN) · 4d

SpikingMoE: SDPrompt-Guided Dynamic Expert Fusion in Spiking Neural Networks

Researchers have introduced SpikingMoE, a novel framework that combines Spiking Neural Networks (SNNs) with a Mixture-of-Experts (MoE) architecture. It uses a spike-driven prompt (SDPrompt) for biologically plausible, input-dependent routing of information to different expert modules. Designed for neuromorphic hardware, SpikingMoE aims to improve energy efficiency in visual recognition while maintaining competitive performance, achieving high accuracy on CIFAR-10 and CIFAR-100.

IMPACT Introduces a new architecture for energy-efficient visual recognition on neuromorphic hardware, potentially impacting specialized AI applications.
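
As a rough picture of spike-driven routing, the sketch below feeds per-expert gate currents into leaky integrate-and-fire (LIF) neurons over T timesteps and sends the input to the expert whose gate neuron spikes most (top-1 routing). This is a hypothetical stand-in: how SDPrompt actually produces gate signals, the LIF constants, and the top-1 rule are all assumptions, not details from the paper.

```python
import numpy as np


def lif_spike_counts(inputs, tau=2.0, threshold=1.0):
    """Leaky integrate-and-fire neurons.

    inputs: array of shape (T, n_neurons) with input current per timestep.
    Returns the number of spikes each neuron emitted.
    """
    v = np.zeros(inputs.shape[1])
    counts = np.zeros(inputs.shape[1], dtype=int)
    for x_t in inputs:
        v = v + (x_t - v) / tau          # leaky integration toward the input
        spikes = v >= threshold
        counts += spikes
        v = np.where(spikes, 0.0, v)     # hard reset after a spike
    return counts


def route_top1(gate_currents, experts, x, tau=2.0, threshold=1.0):
    """Send x to the expert whose gate neuron fired most; return (index, output)."""
    counts = lif_spike_counts(gate_currents, tau=tau, threshold=threshold)
    chosen = int(np.argmax(counts))
    return chosen, experts[chosen](x)
```

Because routing depends only on spike counts, the gate itself stays event-driven, which is the property that makes MoE routing attractive on neuromorphic hardware.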
RESEARCH · Hugging Face Daily Papers English(EN) · 1w · [2 sources]

Decoupled Conformal Optimisation: Efficient Prediction Sets via Independent Tuning and Calibration

Two research papers introduce new approaches to conformal prediction, a method for quantifying uncertainty in machine learning models. The first, "Decoupled Conformal Optimisation," proposes a train-tune-calibrate framework that uses independent data splits for structural selection and for final calibration, yielding smaller prediction sets and narrower intervals on various benchmarks. The second, "Decomposition-Based Modular Conformal Prediction," extends conformal prediction to two-stage modeling, attributing uncertainty to specific pipeline stages and offering diagnostic advantages over standard methods.

IMPACT These new conformal prediction techniques offer improved uncertainty quantification and diagnostic capabilities for machine learning models.
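
The train-tune-calibrate idea can be sketched with standard split conformal regression: choose among candidate models on a tuning split, then compute the conformal quantile on a separate calibration split, so the selection step cannot bias coverage. The absolute-residual score and the "smallest width" selection rule below are illustrative assumptions rather than the paper's procedure; the finite-sample quantile rank ⌈(n+1)(1−α)⌉ is the standard split-conformal choice.

```python
import numpy as np


def conformal_quantile(scores, alpha):
    """Finite-sample-corrected (1 - alpha) quantile of calibration scores."""
    scores = np.sort(np.asarray(scores, dtype=float))
    n = scores.size
    k = int(np.ceil((n + 1) * (1 - alpha)))  # rank of the order statistic
    return np.inf if k > n else scores[k - 1]


def split_conformal_interval(predict, X_cal, y_cal, X_new, alpha=0.1):
    residuals = np.abs(y_cal - predict(X_cal))  # nonconformity scores
    q = conformal_quantile(residuals, alpha)
    pred = predict(X_new)
    return pred - q, pred + q


def tune_then_calibrate(candidates, X_tune, y_tune, X_cal, y_cal, X_new, alpha=0.1):
    # Select the candidate with the narrowest interval using the tuning split only.
    widths = [2 * conformal_quantile(np.abs(y_tune - f(X_tune)), alpha) for f in candidates]
    best = candidates[int(np.argmin(widths))]
    # Calibrate the selected model on a disjoint split to keep coverage valid.
    return split_conformal_interval(best, X_cal, y_cal, X_new, alpha)
```

Reusing the tuning split for calibration would make the chosen model look better than it is and could undercover; the extra split is the price of valid coverage.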
TOOL · arXiv cs.NE (Neural & Evolutionary) English(EN) · 1w

Towards Code-Oriented LM Embeddings for Surrogate-Assisted Neural Architecture Search

Researchers have developed a novel method called Code-Oriented LM Embeddings (COLE) to improve Neural Architecture Search (NAS). This technique uses off-the-shelf language models to generate embeddings from code representations of neural architectures, bypassing the need for expensive fine-tuning or complex feature engineering. Experiments on NAS-Bench-201 and einspace demonstrated that COLE embeddings outperform other text-based encodings and significantly reduce the evaluation budget required to find high-performing architectures.

IMPACT Introduces a more efficient method for designing neural networks, potentially accelerating AI model development.
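
Surrogate-assisted NAS embeds each candidate architecture, predicts its performance from already-evaluated neighbours, and spends the real evaluation budget only on the most promising predictions. In the sketch below the embedding is a hashed bag of character trigrams, a deliberately simple stand-in for an off-the-shelf language-model embedding, and the surrogate is similarity-weighted k-nearest-neighbour regression; both choices and all names are assumptions, not COLE's implementation.

```python
import zlib

import numpy as np


def embed_code(src, dim=256):
    """Stand-in for an LM embedding: hashed character-trigram counts, L2-normalised."""
    v = np.zeros(dim)
    for i in range(len(src) - 2):
        v[zlib.crc32(src[i:i + 3].encode()) % dim] += 1.0  # deterministic hash
    n = np.linalg.norm(v)
    return v / n if n > 0 else v


def knn_surrogate(train_codes, train_scores, query_codes, k=3):
    """Predict scores as the similarity-weighted mean of the k most similar evaluated archs."""
    E = np.stack([embed_code(c) for c in train_codes])
    Q = np.stack([embed_code(c) for c in query_codes])
    sims = Q @ E.T                                   # cosine similarity (unit vectors)
    idx = np.argsort(-sims, axis=1)[:, :k]
    w = np.take_along_axis(sims, idx, axis=1) + 1e-9
    return (w * np.asarray(train_scores, dtype=float)[idx]).sum(axis=1) / w.sum(axis=1)
```

A search loop would call `knn_surrogate` on the unevaluated pool, evaluate only the top-ranked candidate, add its measured score to the training set, and repeat; better embeddings make the surrogate's ranking more reliable and shrink the number of real evaluations.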