Brief

last 24h

[8/8] 221 sources

Multi-source AI news clustered, deduplicated, and scored 0–100 across authority, cluster strength, headline signal, and time decay.
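
The brief does not publish its scoring formula, so the following is only a minimal sketch of how four such signals could be folded into a 0–100 score. The weights, the 48-hour half-life, the choice to apply time decay as a multiplier, and the name `score_story` are all assumptions made for illustration.

```python
# Hypothetical weights: the brief does not state how its signals are combined.
WEIGHTS = {"authority": 0.5, "cluster_strength": 0.25, "headline_signal": 0.25}
HALF_LIFE_HOURS = 48.0  # assumed half-life for the time-decay factor


def score_story(authority, cluster_strength, headline_signal, age_hours):
    """Combine three signals in [0, 1] and decay the result by age.

    Returns a score between 0 and 100, rounded to one decimal place.
    """
    for value in (authority, cluster_strength, headline_signal):
        if not 0.0 <= value <= 1.0:
            raise ValueError("signals must lie in [0, 1]")
    if age_hours < 0:
        raise ValueError("age_hours must be non-negative")

    # Weighted sum of the three content signals (weights sum to 1).
    base = (
        WEIGHTS["authority"] * authority
        + WEIGHTS["cluster_strength"] * cluster_strength
        + WEIGHTS["headline_signal"] * headline_signal
    )
    # Exponential decay: the score halves every HALF_LIFE_HOURS.
    decay = 0.5 ** (age_hours / HALF_LIFE_HOURS)
    return round(100.0 * base * decay, 1)
```

Under these assumed settings, a story with all signals at their maximum scores 100 when fresh and 50 after 48 hours.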

TOOL · arXiv cs.AI English(EN) · 4d

FAIR-Pruner: A Flexible Framework for Automatic Layer-Wise Pruning via Tolerance of Difference

Researchers have developed FAIR-Pruner, a framework for automatic, layer-wise structured pruning of deep neural networks. It allocates sparsity adaptively across layers using both removal-oriented and protection-oriented signals. Experiments on several datasets and architectures, including vision models and a Qwen1.5-MoE model, show strong accuracy-compression trade-offs. The framework is available as an open-source package.

IMPACT Enables more efficient deployment of large neural networks by improving compression techniques.
- ImageNet
- CIFAR-10
- CIFAR-100
- DenseNet
- ResNet
- SVHN
- ConvNeXt
- FAIR-Pruner
- Qwen1.5-MoE-A2.7B-Chat
- Chengyao Yu

TOOL · arXiv cs.AI English(EN) · 4d

A Comprehensive Comparison of Deep Learning Architectures for COVID-19 Classification on CT & X-ray Imagery

Researchers have compared deep learning architectures for classifying COVID-19 from CT and X-ray lung imagery. The study used pre-trained models including VGG, DenseNet, ResNet, MobileNet, Xception, EfficientNet, and NASNet. ResNet and VGG architectures reached accuracies between 95% and 98% in distinguishing COVID-19-positive cases from healthy lungs, exceeding results reported in prior literature.

IMPACT Demonstrates high accuracy of deep learning models in medical image analysis, potentially improving diagnostic speed and accuracy for infectious diseases.
- COVID-19
- CT
- DenseNet
- ResNet
- EfficientNet
- MobileNet
- Xception
- X-ray
- NASNet

TOOL · arXiv cs.LG English(EN) · 4d

Bug or Feature²: Weight Drift, Activation Sparsity and Spikes

Researchers have identified a phenomenon called "weight drift" in neural networks, in which optimization inadvertently pushes weights toward negative values. The drift is independent of the training data and occurs with standard loss functions and common activation functions such as ReLU and GELU. The study shows that it can cause significant activation sparsity, potentially affecting model accuracy, and can amplify activation spikes in transformer layers.
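
The link between a negative shift and activation sparsity can be illustrated with a toy model. This sketch is not the paper's experiment: it assumes standard-normal pre-activations and represents drift as a uniform negative offset (`bias`), then counts how many ReLU outputs are exactly zero. The function name and parameters are hypothetical.

```python
import random


def relu(x):
    """Rectified linear unit: passes positive values, zeroes the rest."""
    return max(0.0, x)


def activation_sparsity(bias, n_units=2000, seed=0):
    """Fraction of ReLU outputs equal to zero.

    Pre-activations are drawn from a standard normal distribution and
    shifted by `bias`; a negative `bias` stands in for the effect of
    weights drifting toward negative values (an assumption of this toy).
    """
    rng = random.Random(seed)  # fixed seed so every call sees the same draws
    zeros = 0
    for _ in range(n_units):
        pre_activation = rng.gauss(0.0, 1.0) + bias
        if relu(pre_activation) == 0.0:
            zeros += 1
    return zeros / n_units


for shift in (0.0, -1.0, -2.0):
    print(f"shift {shift:+.1f}: sparsity {activation_sparsity(shift):.3f}")
```

With no shift, roughly half of the units are inactive; as the shift becomes more negative, the inactive fraction approaches one.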

IMPACT Identifies a fundamental training dynamic that could impact model performance and efficiency across various architectures.
- ReLU
- ViT
- GELU
- ResNet
- MP-SENe
- GPT-nano
- Egor Shvetsov

RESEARCH · arXiv cs.AI English(EN) · 4d · [2 sources]

MambaGaze: Bidirectional Mamba with Explicit Missing Data Modeling for Cognitive Load Assessment from Eye-Gaze Tracking Data

Researchers have developed MambaGaze, a framework for assessing cognitive load from eye-gaze tracking data. It uses bidirectional Mamba-2 to model long-range temporal dependencies efficiently and an XMD encoding method to handle missing data explicitly, such as gaps caused by blinks. MambaGaze outperformed existing models on benchmark datasets and is feasible for real-time deployment on edge devices such as NVIDIA Jetson platforms.

IMPACT Introduces a novel approach for real-time cognitive load assessment, potentially enabling more responsive human-AI interaction in safety-critical systems.
- NVIDIA Jetson
- Transformer
- CNN
- Mamba-2
- CLARE
- Amir Mousavi Seyed
- MambaGaze
- CL-Drive
- ResNet

TOOL · arXiv cs.AI English(EN) · 6d

StableGrad: Backward Scale Control without Batch Normalization

Researchers have introduced StableGrad, an optimizer-level mechanism for controlling the scale of activations and gradients in deep neural networks. It aims to prevent training instability without batch normalization, which can be problematic for applications such as Physics-Informed Neural Networks (PINNs). StableGrad adjusts weight-gradient imbalances after backpropagation but before the optimizer update, so the network's forward pass and physical residual accuracy are preserved. Evaluations on deep PINNs and on standard architectures such as ResNet and EfficientNet showed improved accuracy and more stable optimization, even with batch normalization removed.

IMPACT Offers a new technique to stabilize deep neural network training, particularly beneficial for physics-informed models where standard normalization methods are unsuitable.

RESEARCH · arXiv cs.CV English(EN) · 3d · [2 sources]

Recursive Block-Diagonal Coupling for Resource-Efficient Training of Vision Models

Researchers have developed a training protocol called RBDC that makes training large vision models more resource-efficient. The method recursively couples independently trained, narrower models in a parameter-free, block-diagonal manner. Evaluations on ImageNet with Vision Transformers and ResNets showed a 30% reduction in FLOPs at comparable accuracy, and better performance at equal training FLOPs than existing growth methods. RBDC-trained models were also more useful as backbones for downstream tasks such as object detection and instance segmentation.
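
The core identity behind block-diagonal coupling can be shown on a single linear layer. The sketch below is an illustration under stated assumptions, not the RBDC procedure itself: it places two narrow weight matrices on the diagonal of a wider one and checks that the wide layer reproduces both narrow outputs without adding parameters. The helper names and example matrices are hypothetical, and the summary does not describe how the coupling is applied recursively or how training continues afterward.

```python
def matvec(weights, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]


def block_diag(a, b):
    """Return the block-diagonal matrix [[a, 0], [0, b]]."""
    a_cols, b_cols = len(a[0]), len(b[0])
    top = [row + [0.0] * b_cols for row in a]     # a padded with zeros on the right
    bottom = [[0.0] * a_cols + row for row in b]  # b padded with zeros on the left
    return top + bottom


# Two independently trained "narrow" layers (illustrative values only).
layer_a = [[1.0, 2.0], [3.0, 4.0]]  # maps 2 inputs to 2 outputs
layer_b = [[5.0, 6.0, 7.0]]         # maps 3 inputs to 1 output
x_a = [1.0, -1.0]
x_b = [2.0, 0.0, 1.0]

# Coupling adds no new parameters: the off-diagonal blocks are all zeros.
coupled = block_diag(layer_a, layer_b)
wide_out = matvec(coupled, x_a + x_b)
narrow_out = matvec(layer_a, x_a) + matvec(layer_b, x_b)

print(wide_out == narrow_out)
```

Because the off-diagonal blocks are zero, the coupled layer starts out computing exactly what the two narrow layers computed separately.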

IMPACT Reduces computational costs for training large vision models, potentially accelerating research and deployment.

RESEARCH · arXiv cs.LG English(EN) · 4d · [2 sources]

What Linear Probes Miss: Multi-View Probing for Weight-Space Learning

Researchers have developed MVProbe, a multi-view probing framework for analyzing large open-source AI models directly from their parameters. It sidesteps the computational cost of processing full model weights by extracting representations through learnable probe vectors. MVProbe extends single-view probing with higher-order correlation patterns and outperforms previous methods on the Model Jungle benchmark across model families such as ResNets and Stable Diffusion LoRA adapters.

IMPACT Provides a more efficient method for analyzing and understanding the vast number of open-source AI models available.
- MAE
- DINO
- Stable Diffusion LoRA
- MVProbe
- SupViT
- Model Jungle
- ResNet

RESEARCH · arXiv cs.NE (Neural & Evolutionary) English(EN) · 1w · [5 sources]

Stability and Discretization Error of State Space Model Neural Operators

Researchers are exploring advanced neural operator frameworks for scientific computing. One paper introduces the Infinite-order Kernel Neural Operator (IKNO), which uses infinite-order kernel integrals for greater expressivity and achieves state-of-the-art accuracy on various benchmarks. Another presents a unified abstract neural flow framework and proves universal approximation properties for both finite-dimensional function approximation and infinite-dimensional operator approximation, covering neural networks and neural operators alike.

IMPACT These advancements in neural operator frameworks could lead to more accurate and efficient solutions for complex scientific and engineering problems.