Brief

last 24h

[50/166] 223 sources

Multi-source AI news clustered, deduplicated, and scored 0–100 across authority, cluster strength, headline signal, and time decay.
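
As a rough illustration, the four signals named above could be combined into a 0-100 score along the following lines. The brief does not publish its formula, so the weights, the 48-hour half-life, and the choice to apply time decay as a multiplier are all assumptions made for this sketch.

```python
# Hypothetical weights; the brief does not state its real ones.
WEIGHTS = {"authority": 0.40, "cluster_strength": 0.35, "headline_signal": 0.25}
HALF_LIFE_HOURS = 48.0  # assumed half-life for the time-decay factor


def clamp01(value: float) -> float:
    """Clip a signal into the [0, 1] range."""
    return min(max(value, 0.0), 1.0)


def time_decay(age_hours: float, half_life: float = HALF_LIFE_HOURS) -> float:
    """Exponential decay: 1.0 for a fresh item, 0.5 after one half-life."""
    return 0.5 ** (max(age_hours, 0.0) / half_life)


def score(authority: float, cluster_strength: float,
          headline_signal: float, age_hours: float) -> float:
    """Weighted sum of three [0, 1] signals, scaled by time decay to 0-100."""
    base = (WEIGHTS["authority"] * clamp01(authority)
            + WEIGHTS["cluster_strength"] * clamp01(cluster_strength)
            + WEIGHTS["headline_signal"] * clamp01(headline_signal))
    return 100.0 * base * time_decay(age_hours)
```

Under these assumptions, an item that maxes out every signal scores 100 when fresh and half that after 48 hours.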

RESEARCH · arXiv cs.LG English(EN) · 3d · [2 sources]

SegmentAnyTreeV2: Scaling Transformer-Based Tree Instance Segmentation Across Sensors, Platforms, and Forests

Researchers have developed SegmentAnyTreeV2, a new framework for segmenting individual trees within forest point cloud data. This system utilizes a Point Transformer v3 backbone and a specialized mask decoder to achieve high accuracy in identifying and outlining trees, even in dense and complex environments. The accompanying FOR-instance v3 benchmark dataset includes over 26,000 annotated trees, enabling robust evaluation and demonstrating SegmentAnyTreeV2's superior performance and cross-domain generalization capabilities.

IMPACT Sets a new benchmark for tree instance segmentation in forestry, potentially improving ecological monitoring and resource management.
RESEARCH · arXiv cs.LG English(EN) · 3d · [2 sources]

Neural Field Tokenizations with Hierarchy and Spatial Locality Priors

Researchers have developed LH-NeF, a new framework for learning tokenized representations of continuous signals using neural fields. This approach incorporates hierarchy and spatial locality priors, enabling a feed-forward encoding method that significantly reduces memory usage and increases batch sizes compared to previous meta-learning techniques. LH-NeF demonstrates strong performance across various data types, including images, 3D shapes, and climate fields, matching or surpassing existing specialized and general baselines.

IMPACT Introduces a more memory-efficient and scalable method for learning representations from continuous signals using neural fields.
- LH-NeF
- arXiv
RESEARCH · arXiv cs.AI English(EN) · 3d · [2 sources]

GlobeAudio: A Multilingual Multicultural Benchmark for Naturalistic Evaluation of Large Audio-Language Models

Researchers have introduced GlobeAudio, a new benchmark designed to evaluate Large Audio-Language Models (LALMs) in more realistic, naturalistic settings. The benchmark features 5,637 multiple-choice questions in six diverse languages, created by native speakers using naturally occurring audio. Initial evaluations using GlobeAudio revealed significant performance disparities, especially for open-source models and less common languages, highlighting current limitations in LALM capabilities.

IMPACT Highlights critical limitations in current LALMs and emphasizes the need for more realistic audio evaluation methods.
RESEARCH · r/MachineLearning English(EN) · 1d · [2 sources]

Open image generation models are closer to closed-source quality than this sub thinks [D]

A discussion post on r/MachineLearning argues that open-source image generation models are now nearly on par with closed-source alternatives in quality and capabilities. The post cites recent evaluations in which open models close the gap in areas like compositional accuracy and prompt adherence. It also points to improved text rendering in images and faster generation speeds on consumer hardware, challenging previous assumptions about the limitations of open models.

IMPACT Open-source models are becoming competitive with closed-source alternatives, potentially democratizing advanced image generation capabilities.
RESEARCH · arXiv cs.CL English(EN) · 3d · [5 sources]

Diffusion Language Model Parallel Decoding via Product-of-Experts Bridge

Researchers are developing new methods to improve the decoding process for diffusion language models (DLMs), which enable parallel text generation but currently lag behind auto-regressive models in quality. Several papers propose novel techniques to bridge this gap by better capturing token relationships and improving the interface between the diffusion decoder and the language model. These advancements aim to enhance both the speed and accuracy of DLM generation, making them more competitive for complex tasks like mathematical reasoning and code generation.

IMPACT These advancements could significantly improve the efficiency and effectiveness of parallel text generation, making diffusion models more viable for complex AI applications.
RESEARCH · Hugging Face Daily Papers English(EN) · 2d · [2 sources]

OmniCap-IF: Benchmarking and Improving Instruction Following Abilities for Omni-Video Captioning

Researchers have introduced OmniCap-IF, a new benchmark designed to evaluate how well omni-modal large language models (OLLMs) can follow complex user instructions for video captioning. The benchmark revealed significant performance gaps and a trade-off where increased formatting complexity degrades reasoning abilities. To address these issues, a 54K instruction-tuning dataset, OmniCap-IF-54K, was created, along with a model called OmniCaptioner-IF that shows improved instruction adherence.

IMPACT Establishes a new standard for evaluating multimodal instruction following, potentially driving improvements in controllable video generation.
RESEARCH · arXiv cs.CL English(EN) · 3d · [2 sources]

Robust-U1: Can MLLMs Self-Recover Corrupted Visual Content for Robust Understanding?

Researchers have developed Robust-U1, a new framework designed to enhance the robustness of Multimodal Large Language Models (MLLMs) when dealing with corrupted visual content. This approach enables MLLMs to self-recover damaged images, improving their ability to understand and reason about visual information. The framework utilizes a three-stage process involving supervised fine-tuning, reinforcement learning with dual rewards, and multimodal reasoning to achieve state-of-the-art performance on corruption benchmarks.

IMPACT Enhances MLLM robustness against visual corruption, potentially improving real-world application reliability.
- Robust-U1
- MLLMs
RESEARCH · arXiv cs.CL English(EN) · 3d · [2 sources]

What's the Point? Spatial Grammar & Index Resolution for Sign Language Processing

Researchers have developed a new framework to improve sign language models by focusing on spatial indexing, a crucial but often overlooked aspect of sign language. This approach decomposes the resolution of spatial references into index detection and discourse entity linking, aiming to better capture pointing gestures used for co-reference. The proposed method establishes a baseline for index-aware sign language modeling and can augment existing models to improve their understanding of non-lexical constructions.

IMPACT Enhances AI's ability to understand and process sign language, potentially improving accessibility and communication tools for the deaf and hard-of-hearing community.
- arXiv
- Sign Language Recognition
RESEARCH · Mastodon — fosstodon.org 日本語(JA) · 22h

Siri x Gemini's Ultimate Combo Begins! New OS Drastically Changes iPhone Usability | Lifehacker Japan https://www.yayafa.com/?p=2818338 #AgenticAi #AI #ArtificialGeneralIntelligence #ArtificialInte

Apple is reportedly integrating Google's Gemini AI into its upcoming iOS operating system, potentially enhancing Siri's capabilities. This collaboration aims to significantly alter the user experience on iPhones by leveraging Gemini's advanced AI features. The move suggests a strategic partnership to boost the intelligence and functionality of Apple's native AI assistant.

IMPACT This integration could significantly enhance mobile AI capabilities and set new standards for virtual assistants on smartphones.
- Apple
- Siri
- Gemini
- iOS
- Google
RESEARCH · Mastodon — mastodon.social 日本語(JA) · 18h

The era has arrived where a 20 billion parameter AI runs on an iPhone. https://ascii.jp/elem/000/004/409/4409094/?rss #ascii #AI

Apple's latest iPhones are now capable of running AI models with up to 20 billion parameters directly on the device. This advancement enables more sophisticated AI applications to function locally, enhancing privacy and reducing reliance on cloud processing. The integration signifies a major step towards on-device AI, making powerful AI features accessible without an internet connection.

IMPACT Accelerates the trend of powerful AI running locally on consumer devices, enhancing privacy and offline functionality.
- AI
- iPhone
RESEARCH · r/StableDiffusion Português(PT) · 1d · [2 sources]

Ideogram 4 - 80s Anime Lora

A user has released version 2 of their "80s Anime Lora" for Stable Diffusion, which is trained on the Ideogram 4 model. This updated version uses an expanded dataset of 65 images and was trained for an additional 6000 steps, resulting in increased detail and contrast while maintaining the desired retro aesthetic. The creator is pleased with the results and is moving on to new concepts, encouraging others to experiment with LoRA training.

IMPACT Enables users to generate images with a specific retro anime aesthetic using Stable Diffusion.
RESEARCH · Hugging Face Daily Papers English(EN) · 3d · [2 sources]

Phase Marginalization for Patch-Grid Instability in Vision Transformers

Researchers have developed a new technique called Phase Marginalization to address instability issues in Vision Transformers (ViTs) when performing dense prediction tasks. This method tackles the problem where fixed patch grids in ViTs can lead to inconsistent results, particularly near image boundaries. By evaluating different patch-grid configurations and aggregating the outputs, Phase Marginalization offers a training-free approach that improves accuracy in tasks like segmentation and depth estimation.

IMPACT Introduces a method to improve the robustness of Vision Transformers for dense prediction tasks like segmentation and depth estimation.
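
The idea of evaluating several patch-grid configurations and aggregating the outputs can be sketched as follows. This is a minimal illustration, not the paper's implementation: it assumes circular shifts via `np.roll`, simple averaging, and a dense model whose output has the same spatial layout as its input. The names `phase_marginalize` and `patch_mean_model` are hypothetical.

```python
import numpy as np


def phase_marginalize(model, image, patch=16, stride=8):
    """Average a dense model's output over shifted patch-grid phases."""
    acc, count = None, 0
    for dy in range(0, patch, stride):
        for dx in range(0, patch, stride):
            # Shift the image so the fixed patch grid lands at a new phase.
            shifted = np.roll(image, shift=(dy, dx), axis=(0, 1))
            pred = np.asarray(model(shifted), dtype=float)
            # Undo the shift so predictions align with the original pixels.
            pred = np.roll(pred, shift=(-dy, -dx), axis=(0, 1))
            acc = pred if acc is None else acc + pred
            count += 1
    return acc / count


def patch_mean_model(image, patch=4):
    """Toy stand-in for a ViT: replaces each patch with its mean value."""
    h, w = image.shape
    means = image.reshape(h // patch, patch, w // patch, patch).mean(axis=(1, 3))
    return np.repeat(np.repeat(means, patch, axis=0), patch, axis=1)


image = np.arange(64, dtype=float).reshape(8, 8)
smoothed = phase_marginalize(lambda x: patch_mean_model(x, 4), image,
                             patch=4, stride=2)
```

Because nothing is retrained, the cost is one extra forward pass per phase; the toy model only exists to make the blocky grid dependence visible.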
RESEARCH · Hugging Face Daily Papers English(EN) · 3d · [3 sources]

Chiaroscuro Attention: Spending Compute in the Dark

Researchers have developed CHIAR-Former, a novel 4-layer transformer model that optimizes compute usage by dynamically routing tokens. Instead of applying self-attention uniformly, CHIAR-Former analyzes token spectral entropy to direct each token to one of three operators: DCT spectral mixing, RBF kernel mixing, or full self-attention. This approach significantly improves performance on large-scale naturalistic text, achieving a 45% perplexity improvement on WikiText-103 with 62.5% fewer attention FLOPs compared to a standard transformer.

IMPACT Introduces a method to significantly reduce computational cost for transformers on large text datasets.
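
Entropy-based routing of the kind described above might look like the sketch below. Everything specific here is an assumption: the entropy is taken over an FFT power spectrum rather than a DCT, the two thresholds are arbitrary, and the mapping from entropy band to operator is chosen only for illustration; the summary does not say how CHIAR-Former computes or thresholds its entropy.

```python
import numpy as np

# Operator order is an assumed mapping from low to high entropy.
OPERATORS = ("dct_mixing", "rbf_mixing", "self_attention")


def spectral_entropy(x):
    """Normalized Shannon entropy of a vector's power spectrum, in [0, 1]."""
    power = np.abs(np.fft.rfft(np.asarray(x, dtype=float))) ** 2
    total = power.sum()
    if total == 0.0 or len(power) < 2:
        return 0.0  # no spectrum to measure
    p = power / total
    p = p[p > 0]  # skip empty bins so log() stays finite
    return float(-(p * np.log(p)).sum() / np.log(len(power)))


def route(tokens, low=0.5, high=0.8):
    """Return one operator index per token using two entropy thresholds."""
    entropy = np.array([spectral_entropy(t) for t in tokens])
    # 0 if entropy < low, 1 if low <= entropy < high, 2 otherwise.
    return np.digitize(entropy, [low, high])


tokens = np.array([np.ones(8), np.eye(8)[0]])  # flat vector vs. single spike
choices = [OPERATORS[i] for i in route(tokens)]
```

A constant vector concentrates all power in one frequency bin and gets the cheapest operator here, while a single spike has a flat spectrum and is sent to full self-attention.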
RESEARCH · arXiv cs.LG English(EN) · 4d · [3 sources]

RETROSPECT: RETROsynthesis via Sequential Prediction, and Chemically Transformed-ranking

Researchers have developed RETROSPECT, a novel system for chemical retrosynthesis that improves prediction accuracy and candidate selection. The system utilizes a Transformer-based proposal model, ChemAlign, combined with a LambdaMART reranker. This approach achieved 55.00% top-1 exact-match accuracy on the USPTO-50K dataset, demonstrating a significant advancement in predicting chemical reactions.

IMPACT Enhances AI's capability in scientific discovery, potentially accelerating drug development and chemical research.
RESEARCH · arXiv stat.ML English(EN) · 4d · [2 sources]

Vessel Traffic Flow Prediction on Sparse Data via Spatio-Temporal Graph Neural Networks with a Learnable Tweedie Head

Researchers have developed a new plug-and-play output module, the learnable Tweedie head, designed to enhance spatio-temporal graph neural networks (ST-GNNs) for predicting vessel traffic flow. This module specifically addresses the challenge of sparse and intermittent maritime data, which often causes conventional ST-GNNs to produce overly conservative predictions. By optimizing the Tweedie unit deviance and learning node-level variance, the new head improves forecasting accuracy, particularly for non-zero events, as demonstrated in experiments using real-world AIS data from the Ports of Los Angeles and Long Beach.

IMPACT Enhances forecasting accuracy for sparse maritime data, potentially improving smart port operations and navigational safety.
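
For readers unfamiliar with the loss mentioned above, the Tweedie unit deviance for a power parameter 1 < p < 2 (the compound Poisson-gamma range, which allows exact zeros) can be computed as in this sketch. The fixed `p=1.5` default and the function name are assumptions; the paper's head learns its parameters rather than fixing them.

```python
import numpy as np


def tweedie_unit_deviance(y, mu, p=1.5):
    """Tweedie unit deviance d(y, mu) for 1 < p < 2, elementwise.

    y  : observed non-negative values (zeros allowed)
    mu : predicted means, strictly positive
    """
    y = np.asarray(y, dtype=float)
    mu = np.asarray(mu, dtype=float)
    if not 1.0 < p < 2.0:
        raise ValueError("this sketch only covers 1 < p < 2")
    if np.any(y < 0) or np.any(mu <= 0):
        raise ValueError("need y >= 0 and mu > 0")
    return 2.0 * (
        y ** (2.0 - p) / ((1.0 - p) * (2.0 - p))
        - y * mu ** (1.0 - p) / (1.0 - p)
        + mu ** (2.0 - p) / (2.0 - p)
    )


# Sparse flow counts: mostly zeros with an occasional non-zero event.
observed = np.array([0.0, 0.0, 3.0, 0.0])
predicted = np.array([0.2, 0.2, 2.5, 0.2])
loss = tweedie_unit_deviance(observed, predicted).mean()
```

The deviance is zero when the prediction equals the observation and stays finite at y = 0, which is why it suits intermittent series where most observations are zero.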
RESEARCH · arXiv stat.ML English(EN) · 4d · [2 sources]

Transfer learning for causal forest

Researchers have developed a novel transfer learning approach for causal forests, specifically the HTERF model, which estimates Conditional Average Treatment Effects (CATE). This method adapts knowledge from a source domain with ample data to a target domain with limited data, employing an offset technique to bridge distribution differences. The study provides a theoretical bound on CATE error and demonstrates strong performance through simulations and a real-world dataset.

IMPACT Introduces a refined method for estimating treatment effects in low-data scenarios, potentially improving decision-making in fields like medicine and policy.
RESEARCH · arXiv cs.AI English(EN) · 4d · [4 sources]

CT-VAM: A Cerebello-Thalamic-Inspired Vision-Action Model for Efficient Visuomotor Control

Researchers have developed new models for robot visuomotor control, focusing on efficient and predictive coordination. CT-VAM, a cerebello-thalamic-inspired model, uses a compact architecture for fast, task-conditioned action prediction, enabling cloud-edge paradigms. Chameleon addresses observation-action delay by incorporating control-indexed prospective memory, significantly improving performance on challenging benchmarks. Separately, a diffusion-based framework learns predictive visuomotor coordination by integrating multimodal signals for forecasting human motion.

IMPACT Advances in visuomotor control could accelerate robot autonomy and human-robot interaction.
- EgoExo4D
- Chameleon
- Camo-Dataset
- LIBERO-10
- Xinying Guo
- MIKASA-Robo
- MemoryBench
- LIBERO
- TARS
- CT-VAM
RESEARCH · Hugging Face Daily Papers English(EN) · 3d · [2 sources]

Light-WAM: Efficient World Action Models with State-Fusion Action Decoding

Researchers have developed Light-WAM, a new lightweight model designed for efficient robot manipulation. This model incorporates future video prediction into its training objectives, enabling it to encode temporal structures for better representation learning. Light-WAM utilizes a compact video backbone and a downsampled latent space to reduce training costs and inference latency, making it suitable for real-time applications.

IMPACT Introduces a more efficient approach to robot manipulation by integrating future prediction, potentially lowering the barrier for real-time robotic applications.
RESEARCH · arXiv cs.IR (Information Retrieval) English(EN) · 4d · [2 sources]

Constrained Dominant Sets for Multimodal Document Question Answering

Researchers have developed a new retrieval method called Constrained Dominant Sets (CDS) for multimodal document question answering. This technique addresses limitations in current systems that struggle with long documents by selecting complementary evidence rather than near-duplicates. CDS encodes the query as a structural constraint, automatically balances relevance and redundancy, and avoids greedy heuristics by achieving global equilibrium. When used with a Qwen3-VL-32B reader, CDS sets a new state-of-the-art on VisDoMBench and significantly improves performance on MMLongBench-Doc.

IMPACT Establishes new SOTA on multimodal QA benchmarks, improving retrieval for long documents.
RESEARCH · Mastodon — fosstodon.org 日本語(JA) · 3d · [17 sources]

Tokenization in Transformers v5: Simpler, More Understandable, More Modular https://huggingface.co/blog/tokenizers ※AI-generated automatic post (headline + link) #AI #GenerativeAI #LLM #AIGenerated

Hugging Face has published a series of blog posts detailing advancements in AI development. These posts cover topics such as building custom CUDA kernels with Codex and Claude, the release of OpenClaw, and methods for constructing deep research capabilities. Additionally, they highlight the ease of building and sharing ROCm kernels on Hugging Face, the use of OpenAI Codex vouchers in hackathons, and the evaluation of tool-using agents in real-world environments with OpenEnv. Further topics include Mixture-of-Experts (MoE) transformers, multimodal embedding models for re-ranking, and Waypoint-1.5 for enhanced interactive worlds on consumer GPUs. Finally, DeepSeek-V4 is introduced, offering a 1 million token context window for agents.

IMPACT Showcases diverse AI research, from custom kernel development and agent evaluation to new model architectures and large context windows, pushing the boundaries of AI capabilities.
RESEARCH · arXiv cs.IR (Information Retrieval) English(EN) · 5d · [2 sources]

A Vision-language Framework for Comparative Reasoning in Radiology

Researchers have developed new frameworks to improve AI's ability to interpret medical images, particularly in radiology. One approach, MedReCo, focuses on comparative reasoning across different patient scans and historical data to aid in diagnosis and follow-up. Another framework, CheXanatomy, integrates explicit anatomical knowledge into vision-language models for more precise tasks like segmentation, by training models to generate anatomical masks. Both methods aim to make AI more aligned with clinical practice by learning from large-scale medical data.

IMPACT These advancements could lead to more accurate and clinically relevant AI tools for radiology, improving diagnostic capabilities and patient care.
RESEARCH · Hugging Face Trending Models English(EN) · 4d · [3 sources]

CohereLabs/North-Mini-Code-1.0

Cohere has released North-Mini-Code-1.0, a 30 billion parameter coding model. While its overall Artificial Analysis score is lower than some competitors, it performs competitively in coding benchmarks. The model is available for download on Hugging Face.

IMPACT Provides a new option for developers needing coding assistance, potentially improving code generation efficiency.
RESEARCH · arXiv cs.NE (Neural & Evolutionary) English(EN) · 6d · [2 sources]

Seq103: A Unified Neuroevolution Framework for Compact Sequence Architecture Discovery

Researchers have developed Seq103, a novel neuroevolution framework designed to discover compact sequence architectures. This unified system utilizes a shared evolutionary backbone with an optional recurrent extension to handle both step-wise recurrent and sample-wise feedforward sequence classification tasks. Seq103 demonstrates significant parameter reduction, retaining a high percentage of baseline accuracy across various text classification and time-series datasets.

IMPACT This framework could enable more efficient development of sequence models by reducing parameter count while maintaining performance.
- Seq103
- UCRArchive2018
RESEARCH · Mastodon — sigmoid.social 日本語(JA) · 4d · [7 sources]

📝 The Democratization of Training Begins - Why Huawei's Ascend 910C Accelerates the Break from NVIDIA Dependency. Huawei's cutting-edge chip 'Ascend 910C' successfully post-trained DeepSeek-V4-Pro. This is not just a technological achievement, but signifies the geopolitical decentralization of AI training resources. 🔗 htt

A research group, including Huawei and institutions from Shenzhen, claims to have successfully completed full-parameter post-training on DeepSeek's 1.6 trillion parameter V4-Pro model. This was achieved using a cluster of at least 1,000 Huawei Ascend 910C AI chips. This development is seen as a significant step towards China's AI self-reliance, particularly in overcoming challenges with training complex models on domestic hardware, though specific performance benchmarks are currently absent.

IMPACT Demonstrates progress in China's domestic AI training capabilities, potentially reducing reliance on foreign hardware for complex model refinement.
RESEARCH · Hugging Face Daily Papers English(EN) · 5d · [2 sources]

A Geometric Account of Activation Steering through Angle-Norm Decomposition

Researchers have developed a geometric framework to better understand activation steering in language models. Their study reveals that while concepts are primarily encoded in the angular direction of hidden states, the norm (magnitude) of these states is crucial for steering stability and effectiveness. The findings suggest that activation steering interventions should be parameterized by both angular and radial components, rather than a single additive coefficient, to disentangle their distinct roles.

IMPACT Provides a more interpretable framework for controlling LLM behavior, potentially leading to more stable and effective interventions.
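
One way to picture a steering intervention with separate angular and radial controls is the sketch below. It is an illustrative parameterization, not the paper's: it rotates a hidden state toward a steering direction by an angle `theta` within the plane they span, then rescales the norm by `scale`. The function name and both parameters are hypothetical.

```python
import numpy as np


def steer_angle_norm(h, v, theta, scale=1.0):
    """Rotate h toward v by theta radians, then rescale its norm by scale."""
    h = np.asarray(h, dtype=float)
    v = np.asarray(v, dtype=float)
    h_norm = np.linalg.norm(h)
    if h_norm == 0.0:
        return h.copy()  # a zero vector has no direction to rotate
    h_hat = h / h_norm
    # Part of v orthogonal to h; with h it spans the rotation plane.
    u = v - np.dot(v, h_hat) * h_hat
    u_norm = np.linalg.norm(u)
    if u_norm < 1e-12:
        new_dir = h_hat  # v is parallel to h, so only the norm can change
    else:
        new_dir = np.cos(theta) * h_hat + np.sin(theta) * (u / u_norm)
    return scale * h_norm * new_dir


hidden = np.array([3.0, 4.0, 0.0])
direction = np.array([0.0, 0.0, 1.0])
steered = steer_angle_norm(hidden, direction, theta=0.3, scale=1.0)
```

Unlike adding `alpha * v` to the hidden state, which changes direction and magnitude together, this form keeps the norm fixed when `scale=1.0` and lets the two effects be varied independently.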
RESEARCH · Hugging Face Daily Papers English(EN) · 6d · [3 sources]

Why Muon Outperforms Adam: A Curvature Perspective

A new research paper and accompanying analysis explore the performance advantages of the Muon optimizer over Adam, particularly in the training of large language models and vision classifiers. Studies indicate that Muon learns more robust and transferable features, showing better performance on corrupted data and improved transferability to downstream tasks. This superiority is attributed to Muon's ability to reduce curvature penalties by maintaining lower normalized directional sharpness, especially in later stages of training, an effect amplified by data imbalance.

IMPACT Muon's demonstrated ability to learn more robust and transferable features could lead to more efficient and effective training of future large language models and AI systems.
RESEARCH · arXiv cs.LG English(EN) · 1w · [24 sources]

Sparse Subspace-to-Expert Sharing for Task-Agnostic Continual Learning

Researchers are exploring new methods to improve continual learning in AI systems, focusing on how models can learn from sequential experiences without forgetting past knowledge. New benchmarks like CL-Bench are being developed to rigorously evaluate these systems across diverse domains. Papers also introduce novel techniques such as TailLoR for parameter-efficient fine-tuning and reframe catastrophic forgetting not as knowledge erasure but as an accessibility problem.

IMPACT Advances in continual learning could lead to more adaptable and efficient AI systems that learn continuously in real-world, dynamic environments.
- Srijith Nair
- PMF-CL
- Pareto-minimal-forgetting continual learner
- IPBT
- T5-Large
- LLaMA
- Qwen
- SABER
- TailLoR
- CL-Bench
- arXiv
- CIFAR-100
- ResNet-18
- CLaaS
RESEARCH · arXiv cs.LG English(EN) · 1w · [11 sources]

TabSwift: An Efficient Tabular Foundation Model with Row-Wise Attention

Researchers are developing new tabular foundation models (TFMs) to improve efficiency and performance. TabSwift enhances the TabPFN architecture with row-wise attention and learnable tokens for competitive accuracy and faster inference. LimiX-2M, a smaller model, also outperforms larger baselines by addressing attention bottlenecks and using a novel tokenization framework. Additionally, efforts are underway to speed up TFM pretraining through community-driven 'speedruns' and to compress datasets for faster inference and reduced memory usage.

IMPACT These advancements aim to make tabular foundation models more efficient and accessible, potentially accelerating their adoption in real-world applications.
- TabArena
- TACO
- Guri Zabërgja
- Salih Bora Ozturk
- nanoTabPFN
- RaBEL
- SNF
- TabPFN-v2
- TabICL
- LimiX-2M
- Tabular Foundation Models
- TabSwift
- TabPFN
- GOTabPFN
RESEARCH · arXiv cs.LG English(EN) · 1w · [24 sources]

Prediction Under Imperfect Compression: A Theory of Approximate MDL

Researchers are exploring novel methods for compressing large models and datasets to improve efficiency. Papers discuss unifying dataset pruning and distillation, bootstrapped tokenization for image generation, and activation-informed low-rank compression for LLMs and VLMs. Other work focuses on generic triple-latent sequence models, theoretical aspects of prediction under imperfect compression, and jointly optimizing architectural and quantization choices for LLM compression.

IMPACT Advances in compression techniques could significantly reduce deployment costs and increase the accessibility of large AI models.
- Snigdha Chandan Khilar
- Pythia
- SVD
- LLM
- Qwen2.5
- LLMs
- ProjQ
- LLaMA-2
- Qwen3
- SVD LLM
- SubFit
- OpenAI
- Entropy Gate
- low-rank compression
- dataset distillation
- dataset pruning
- VLM
- PGSVD
- SelfBootTok
- tokenization
RESEARCH · r/StableDiffusion English(EN) · 2d · [2 sources]

Some posters I generated with Ideogram 4.

Users are experimenting with Ideogram 4, an AI image generation model, to create high-resolution images. One user shared examples of 17MP images, including a Warhammer 40k-esque ship and a Millennium Falcon, noting the challenges of previewing composition at such large scales and the significant processing time required. Another user showcased posters generated with Ideogram 4, utilizing SeedVR2 for upscaling.

IMPACT Demonstrates advanced capabilities in AI image generation for high-resolution outputs.
RESEARCH · Hugging Face Daily Papers English(EN) · 1w · [21 sources]

LongLive-RAG: A General Retrieval-Augmented Framework for Long Video Generation

Researchers have introduced several new models and frameworks for advancing video generation and editing capabilities. LoomVideo, a 5B-parameter model, unifies video generation and editing with an efficient architecture that accelerates inference speed. Echo-Infinity tackles real-time infinite video generation using an evolving memory system and a unified relative RoPE approach. Additionally, LongLive-RAG and COVRAG propose retrieval-augmented generation techniques to improve temporal coherence and geometric consistency in long-horizon video synthesis.

IMPACT Advances in video generation models promise more efficient and coherent content creation, impacting creative industries and AI-driven media.
RESEARCH · arXiv cs.LG English(EN) · 2w · [66 sources]

Hybrid Energy-Aware Reward Shaping: A Unified Lightweight Physics-Guided Methodology for Policy Optimization

Researchers are developing new methods to improve policy optimization in reinforcement learning, particularly for large language models and robotics. Techniques like Physics-Guided Policy Optimization (PGPO) and Hint-Guided Diversified Policy Optimization (HDPO) aim to enhance stability and reasoning capabilities by incorporating physics-based modulation and diverse solution exploration. Other advancements include MeanFlow models for efficient policy representation, zero-shot off-policy learning for adaptability, and strategies to mitigate advantage collapse in algorithms like GRPO. These efforts collectively seek to make reinforcement learning more robust, efficient, and effective across various AI applications.

IMPACT Advances in policy optimization could lead to more capable and stable AI systems in complex reasoning and control tasks.
RESEARCH · r/LocalLLaMA English(EN) · 3d · [10 sources]

[3090] Gemma4 QAT + MTP quick TPS numbers [TLDR 1.2-1.8x better]

Users on r/LocalLLaMA are discussing their experiences with the Quantization-Aware Training (QAT) variants of Google's Gemma 4 models. Some users report improved performance, particularly with longer contexts and more varied responses in roleplaying scenarios, while others note accuracy inconsistencies and degradation compared to non-QAT versions. There is ongoing discussion about the best methodologies to compare QAT models against their original counterparts and to evaluate the impact of quantization on different model sizes.

IMPACT User experiences highlight potential trade-offs between quantization methods and model performance, influencing local LLM deployment choices.
- Gemma 4 31B
- UD
- Heretic
- Gemma 4
- Google
- Gemma 4 12B
- r/LocalLLaMA
- Qwen 3.6 27B
- Gemma 4 26B
RESEARCH · r/StableDiffusion Italiano(IT) · 3d · [5 sources]

Ideogram 4.0 Realism Engine Lora (Beta)

Users on Reddit are exploring the capabilities of Ideogram 4.0 for training LoRAs, lightweight low-rank adapters that customize an image generation model without retraining all of its weights. Discussions revolve around achieving accurate multi-character LoRAs and applying specific artistic styles, such as an "Arcane" theme. Some users are sharing experimental results and tips for training, while others are encountering technical issues like out-of-memory errors.

IMPACT Users are experimenting with custom model training for Ideogram 4.0, sharing techniques and results for LoRA creation.
RESEARCH · arXiv cs.LG English(EN) · 2w · [38 sources]

Pre-Registering the Detectable Effect: A Paired-MDE Budget for 4-bit Quantization Benchmarks, with a Pilot Audit

Researchers have developed several new methods to improve the efficiency and accuracy of quantizing large language models (LLMs). These techniques aim to reduce the memory footprint and computational cost of LLMs, making them more accessible for deployment on resource-constrained devices. Innovations include calibration-free bit allocation for Mixture-of-Experts (MoE) models, outlier injection to exploit quantization vulnerabilities, and hardware-friendly mixed-precision quantization frameworks.

IMPACT These advancements in LLM quantization could significantly lower deployment costs and increase accessibility for a wider range of applications and hardware.
- arXiv
- GEMQ
- MoE-LLMs
- Mixture-of-Experts Large Language Models
- INT8
- INT4
- LLaMA
- MoBiQuant
- InfoQuant
- WINDQuant
- ReSpinQuant
- NeUQI
- Qwen
- FP8
- LLaMA-2-7B
- Mixture-of-Experts (MoE)
- LLM
- LLaMA-3.1-8B
- EmaQ
- EmaQ-LT
- AlphaQ
- OASIS
- WaterSIC
- GPTQ
- GGUF
- Qwen1.5-MoE
RESEARCH · arXiv cs.IR (Information Retrieval) English(EN) · 3w · [104 sources]

FAiT: Frequency-Aware Inverted Transformer for Multivariate Time Series Forecasting

Researchers are developing new methods for time series forecasting, focusing on improving accuracy and robustness. Several papers introduce novel attention mechanisms and model architectures designed to better capture complex dependencies, including positive and negative relationships, and to handle non-stationarity and limited data. New benchmarks and evaluation frameworks are also being proposed to rigorously assess these advancements and identify specific failure modes in financial and general time series forecasting.

IMPACT Advances in time series forecasting models and benchmarks will improve predictive accuracy and robustness across various domains, including finance and operations.
RESEARCH · Hugging Face Daily Papers English(EN) · 2w · [4 sources]

WorldCraft: From Camera Navigation to Object Manipulation in Interactive Video World Models

Researchers have developed new methods for generating controllable video world models. DisCo focuses on using discrete action primitives to improve control over camera motion, addressing issues with continuous trajectories. Prisma-World tackles the challenge of multi-agent video generation by ensuring cross-view consistency through a joint geometry-aware denoising process and introduces a new dataset for training and evaluation.

IMPACT These advancements in controllable video generation could enable more realistic and interactive virtual environments for training and simulation.
- PrismaDataset
- Prisma-World
RESEARCH · Hugging Face Daily Papers English(EN) · 3w · [97 sources]

Full Attention Strikes Back: Transferring Full Attention into Sparse within Hundred Training Steps

Researchers are exploring novel approaches to enhance the efficiency and effectiveness of attention mechanisms in transformers. Several papers introduce methods to mitigate issues like over-smoothing and computational bottlenecks, particularly in graph transformers and large language models. Techniques include capacity-controlled attention gating, analyzing attention sinks to differentiate between adaptive no-op and broadcast mechanisms, and developing sparse attention strategies for ultra-long contexts. These advancements aim to improve model performance on various benchmarks while reducing computational costs.

IMPACT These research papers introduce techniques to improve transformer efficiency and performance, potentially leading to more capable and cost-effective AI models for various applications.
RESEARCH · arXiv cs.IR (Information Retrieval) English(EN) · 3w · [53 sources]

Fairness-Aware Retrieval Optimization for Retrieval-Augmented Generation

Researchers are developing new methods to improve Retrieval-Augmented Generation (RAG) systems, which ground large language models in external evidence. Several papers introduce techniques to address hallucinations, retrieval of irrelevant information, and inefficient processing. These include graph-based expert mixtures, structured critic frameworks for error correction, and mindscape-aware approaches for better long-context understanding. New benchmarks are also being created to evaluate RAG performance in specialized domains such as Canadian law, and methods for quantifying uncertainty in multimodal RAG are being explored. AI

IMPACT Advances in RAG aim to reduce hallucinations and improve reasoning, leading to more reliable AI systems across various applications.
RESEARCH · Hugging Face Daily Papers English(EN) · 3w · [55 sources]

Matérn Noise for Triangulation-Agnostic Flow Matching on Meshes

Researchers are advancing flow matching techniques for generative modeling across various domains. New methods like Kinetic Path Energy (KPE) and Kinetic Trajectory Shaping (KTS) aim to improve generation quality by analyzing trajectory energy. PrismFlow introduces dynamical experts for better time-series generation, while Random Process Flow Matching (RP Flow) focuses on sparse data and uncertainty estimation. STFlow enhances trajectory simulation by incorporating data-dependent couplings, and Recursive Flow Matching (RecFM) offers speed-fidelity improvements for spatiotemporal dynamics. Additionally, Guided Flow Matching (FM4PDE) addresses PDE problems with sparse observations, and AdvantageFlow and Flow-OPD explore reinforcement learning applications within flow models for improved policy optimization and multi-task alignment. AI

IMPACT These advancements in flow matching techniques promise improved generative model performance, efficiency, and applicability across scientific and RL domains.
RESEARCH · Hugging Face Daily Papers English(EN) · 3w · [60 sources]

Watch, Remember, Reason: Human-View Video Understanding with MLLMs

Researchers are developing new methods for real-time video understanding, moving beyond traditional offline analysis. Several papers propose architectures that decouple visual perception from language generation to improve efficiency and responsiveness. These approaches aim to enable models to process video frames continuously, revise answers as new information emerges, and maintain synchrony with video playback. AI

IMPACT These advancements could lead to more interactive and responsive AI systems for analyzing video content in real-time.
RESEARCH · arXiv cs.CL English(EN) · 3w · [87 sources]

Dynamic Chunking for Diffusion Language Models

Researchers are exploring new methods to improve diffusion language models (DLMs), which offer faster inference than autoregressive models. Several recent papers introduce techniques to enhance DLM performance, including NAVIRA for decoupled remasking, SARDI for retrieval-augmented generation using discarded tokens, and AXON for supportive token revealing. Another study identifies limitations in DLMs, such as a locality bias and distraction from mask tokens, proposing a mask-agnostic loss function to improve context comprehension. Additionally, a survey provides a comprehensive overview of the DLM landscape, covering foundational principles, state-of-the-art models, and future research directions. AI

IMPACT New techniques aim to improve the speed and accuracy of diffusion language models, potentially making them more competitive with autoregressive models.
RESEARCH · llama.cpp — Releases (SO) · 2w · [166 sources]

b9301

The llama.cpp project has released several updates, including b9580, which adds Vulkan support for matrix-matrix multiplication and Flash Attention, along with optimizations for FP16 dot2 extensions. Other recent releases, such as b9578 and b9577, include refactoring for video subprocess handling and server prompt logging, respectively. The releases provide precompiled binaries for platforms including macOS, Linux, Android, and Windows, with support for hardware accelerators such as CUDA, ROCm, and Vulkan. AI

IMPACT These updates enhance performance and stability for local LLM inference, potentially improving user experience and enabling broader adoption on diverse hardware.
- llama.cpp
- CMake
- OpenMP
- Vulkan
- iOS
- OpenVINO
- ROCm
- CUDA
- Windows
- Android
- macOS
- Linux
- EXAONE 4.5
- Qwen2.5-VL
- Flash Attention
- FP16
RESEARCH · Mastodon — fosstodon.org English(EN) · 2w · [8 sources]

#AI #Coding #Harness Origin | Interest | Match

DeepSeek has released an open-source AI model that demonstrates strong performance in coding tasks. The model, named DeepSeek-Coder, is available in various parameter sizes and has shown competitive results on benchmarks like HumanEval and MBPP. This release aims to provide a powerful, accessible tool for developers and researchers in the AI community. AI

IMPACT Provides developers with a powerful, open-source coding assistant, potentially accelerating software development.
- DeepSeek
- DeepSeek-Coder
RESEARCH · Hugging Face Daily Papers English(EN) · 1mo · [2 sources]

Liberating LLM Capabilities in Full-Duplex Speech Models

Researchers have introduced a new paradigm called Listen-Write-Speak (LWS) for large language models interacting through speech. This approach allows a single LLM to simultaneously listen to audio, generate visible free-form text as its primary output, and produce a spoken response in real-time. The LWS system, implemented via a token schema without architectural changes, aims to unlock text-native capabilities like code generation and structured reasoning within speech interactions. AI

IMPACT Enables LLMs to perform text-native tasks like coding and structured reasoning during real-time voice conversations.
RESEARCH · X — SemiAnalysis English(EN) · 1mo · [3 sources]

@manicely6005 The public documentation can be found here too (3/3)

NVIDIA has open-sourced parts of its cuDNN library, which had been closed-source for 12 years. The release includes more than 20 Mixture-of-Experts (MoE) kernels and NSA sparse attention kernels. The kernels are written largely in the Python CuTe-DSL, and public documentation is now available. AI

IMPACT Open-sourcing of cuDNN kernels could accelerate research and development in AI infrastructure and model optimization.
- Mixture-of-Experts
- NSA
- NVIDIA
- CuTe-DSL
- cuDNN
- Python
RESEARCH · X — Qwen (Alibaba) English(EN) · 1mo · [3 sources]

Forward and backward benchmark results across common configurations. https://t.co/IHMCZRw9AW

Alibaba's Qwen team has released FlashQLA, a new set of high-performance linear attention kernels developed using TileLang. The kernels are designed to make attention mechanisms in large language models more efficient. The team also shared forward and backward benchmark results across common configurations. AI

IMPACT Introduces optimized kernels that could improve LLM inference speed and efficiency.
- FlashQLA
- Alibaba
- Qwen
- TileLang
RESEARCH · X — Google DeepMind English(EN) · 1mo · [6 sources]

This is Decoupled DiLoCo: our new resilient and flexible way to train advanced AI models across multiple data centres. 🧵 https://t.co/YRmPrqIbYE

Google DeepMind has introduced Decoupled DiLoCo, an approach to training advanced AI models that improves resilience and flexibility across multiple data centers. It can train models like Google's 12B Gemma model across geographically dispersed regions over low-bandwidth networks, and it can mix hardware generations, such as TPU6e and TPUv5p. Decoupled DiLoCo is designed to be self-healing: when hardware failures are artificially induced, it isolates the affected units, continues training, and reintegrates them when they come back online, avoiding the synchronization stalls that typically halt AI training. AI

IMPACT Enables more robust and flexible large-scale AI model training, potentially reducing costs and increasing accessibility.
- Google DeepMind
- Decoupled DiLoCo
- TPU6e
- TPUv5p
- Pathways
- DiLoCo
- Google Gemma
RESEARCH · X — Runway (video gen) English(EN) · 1mo · [9 sources]

Have a big idea but no advertising budget? Make it yourself with Runway. All you need is a concept to start creating high impact ads for TV, social and more. Tr

Runway has released several updates to its video generation platform. Seedance 2.0 is now available in 1080p, via the iOS app, and through the Runway API. Additionally, users can now animate Runway Characters using scripts, bringing them to life with text prompts. AI
RESEARCH · X — Google AI English(EN) · 1mo · [3 sources]

Last week, we launched Gemini 3.1 TTS, our latest and best text-to-speech model. This new model introduces [awe] audio tags, an intuitive way to guide vocal sty

Google AI has released Gemini 3.1 TTS and Gemini 3.1 Flash TTS, its newest text-to-speech models. The models offer greater expressiveness and control by introducing audio tags, an intuitive way for users to guide vocal style, pace, and delivery through natural-language commands. AI