PulseAugur / Brief
EN
LIVE 20:42:27

Brief

last 24h
[50/55] 221 sources

Multi-source AI news clustered, deduplicated, and scored 0–100 across authority, cluster strength, headline signal, and time decay.

  1. Llama 4: Meta's Latest — Scout, Maverick, and the MoE Revolution

    Meta has released Llama 4 in April 2025, featuring a new Mixture of Experts (MoE) architecture. Two variants, Scout and Maverick, are available, with Scout serving as a balanced default and Maverick offering broader knowledge for specialized tasks. Both models leverage MoE to activate approximately 17 billion parameters per token, enabling high performance comparable to much larger models while remaining runnable on consumer hardware. AI

    IMPACT Sets a new standard for locally runnable large models, potentially accelerating adoption of advanced AI capabilities on consumer hardware.

  2. Alibaba’s Qwen catches up with ‘Sharif speed’ to help forge Pakistan deal

    Alibaba Chairman Joe Tsai utilized the company's Qwen AI tool to rapidly draft a strategic technology partnership agreement with Pakistan's Prime Minister Shehbaz Sharif. Sharif, known for his swift approach to development, requested the comprehensive pact during a visit to Alibaba's headquarters. The agreement, facilitated by Qwen's generative AI capabilities, covers areas such as AI infrastructure, cloud computing, healthcare, e-commerce, and digital payments, aiming to accelerate Pakistan's digital economy. AI

    Alibaba’s Qwen catches up with ‘Sharif speed’ to help forge Pakistan deal

    IMPACT Demonstrates AI's potential to accelerate international business and policy agreements, streamlining complex negotiations.

  3. RT @TeksEdge: 🚀 New MTP support for Strix Halo released! more on Arint.info # AI # AMD # MTP # Qwen # ROCm # StrixHalo # arint_info https://x.com/

    Arint.info has announced new support for Strix Halo, a significant development for AI hardware acceleration. This update integrates MTP (Multi-Threaded Processing) capabilities, enhancing performance for AI workloads. The announcement highlights compatibility with Qwen and ROCm, indicating a focus on optimizing deep learning tasks on AMD hardware. AI

    IMPACT Enhances AI hardware performance by enabling MTP support for Strix Halo, potentially improving deep learning task efficiency.

  4. How We Built Dynamic NPC Dialogue with LLMs — Lessons from Early Access

    Vantage Digital Labs has developed an LLM-powered engine for dynamic NPC dialogue in video games, moving beyond static, pre-written lines. Their architecture involves a context builder, LLM API, response parser, and memory system, with a focus on prompt engineering over model size for cost-effectiveness. Key lessons learned include prioritizing response parsing and low latency, with smaller models like DeepSeek and Qwen proving viable for indie games. AI

    IMPACT Enables more interactive and responsive non-player characters in games, potentially enhancing player immersion.

  5. Chinese LLMs Top Every Agentic Benchmark. Production Teams Pick Sonnet Anyway.

    A new benchmark evaluating LLMs on agentic tasks reveals that Chinese models like Qwen and Kimi outperform others. However, production teams often still prefer Anthropic's Claude Sonnet for real-world applications. This suggests a gap between theoretical performance on specific benchmarks and practical utility in development environments. AI

    Chinese LLMs Top Every Agentic Benchmark. Production Teams Pick Sonnet Anyway.

    IMPACT Highlights a discrepancy between benchmark performance and real-world utility, influencing model selection for production.

  6. Alibaba Qwen Team Introduces Qwen3.5-LiveTranslate-Flash: Real-Time Multimodal Interpretation Across 60 Languages at 2.8-Second Latency

    Alibaba's Qwen team has released Qwen3.5-LiveTranslate-Flash, a real-time multimodal translation model that significantly reduces latency to 2.8 seconds. This new model expands language support to 60 input languages and 29 output languages, while also incorporating visual cues like lip movements to improve accuracy in noisy environments. A standout feature is its ability to clone the original speaker's voice in real-time for translated output, creating a more natural listening experience. AI

    Alibaba Qwen Team Introduces Qwen3.5-LiveTranslate-Flash: Real-Time Multimodal Interpretation Across 60 Languages at 2.8-Second Latency

    IMPACT Enhances real-time multilingual communication by reducing latency and improving accuracy through multimodal input and voice cloning.

  7. Qwen's latest 3.7 Max preview version lands! Two generations of ultra-large cups iterate in parallel, Lin Junyang has left but is still accelerating

    Alibaba's Qwen team has released preview versions of its Qwen 3.7 Max and Qwen 3.7 Plus models, showcasing rapid iteration cycles. The Qwen 3.7 Max model has achieved top rankings among Chinese models in text-based benchmarks on Arena, placing 13th overall and within the top ten for specific categories like math and coding. The Qwen 3.7 Plus model also performed strongly in visual benchmarks, securing the top spot for Chinese models in that domain. AI

    IMPACT Accelerates the pace of frontier model development and competition among leading AI labs globally.

  8. Qwen 3.6 & 2.5: The Most Versatile Local Models

    Alibaba Cloud's Qwen models are highlighted as versatile open-source options in mid-2026, offering a range of sizes from 0.5B to 72B parameters. Qwen 3.6 and 2.5 boast impressive features like a 262K context window, strong tool-calling capabilities, and an Apache 2.0 license for commercial use. The models are easily accessible via Ollama, with specific recommendations based on available VRAM, and are presented as competitive local alternatives to models like GPT-4o and DeepSeek-R1, particularly for tasks requiring long context or function calling. AI

    IMPACT Provides powerful, locally runnable open-source models with long context capabilities, reducing reliance on cloud APIs for certain tasks.

  9. Alibaba signals next phase of AI growth from investment to commercialisation

    Alibaba is transitioning its AI efforts from initial investment to full-scale commercialization, aiming to become China's leading full-stack AI provider. The company projects 30 billion yuan in AI revenue by 2026, with AI agents expected to account for over half of its cloud sales. Alibaba's comprehensive AI ecosystem includes its own T-Head chips, cloud infrastructure, model-as-a-service platforms, and the Qwen foundation models, alongside consumer products like the Qwen app and the Wukong enterprise agent platform. AI

    Alibaba signals next phase of AI growth from investment to commercialisation

    IMPACT Alibaba's strategic shift to AI commercialization and projected revenue targets signal a major push in the Chinese AI market.

  10. Why does off-model SFT degrade capabilities?

    Researchers have found that Supervised Fine-Tuning (SFT) using outputs from a different AI model can significantly degrade the capabilities of the trained model. This degradation appears to be linked to the model adopting an unfamiliar reasoning style that it struggles to utilize effectively. The issue is not necessarily due to imitating a less capable teacher model, as degradation occurs even when the teacher is superior. Fortunately, this performance drop seems to be a shallow property, as a small amount of training to restore the original reasoning style can recover most of the lost performance. AI

    Why does off-model SFT degrade capabilities?

    IMPACT Understanding how off-model SFT impacts AI capabilities is crucial for developing safer and more aligned AI systems.

  11. 🧠 Qwen presented 3.7-Max, a model designed for the era of autonomous AI agents, with a focus on prolonged task execution.

    Qwen has launched Qwen 3.7-Max, a new AI model specifically engineered for autonomous agents. This model is designed to handle complex, long-duration tasks, marking a step forward in AI agent capabilities. The release emphasizes the model's potential for extended operational sequences. AI

    🧠 Qwen presented 3.7-Max, a model designed for the era of autonomous AI agents, with a focus on prolonged task execution.

    IMPACT Enables more sophisticated and prolonged autonomous agent operations.

  12. Returning from a trip almost always means finding yourself with an unmanageable amount of photos. In the case of Lisbon, the problem wasn't so much archiving the boxes

    A developer created an AI tool to automatically select the best photos from a trip, addressing the challenge of curating a large number of images into a shareable album. The application uses PhotoPrism to access image thumbnails and Ollama to run AI models. Initially, the AI focused on aesthetic scoring, but this led to monotonous selections. The tool was improved to cluster images based on semantic similarity, ensuring variety in the final album by selecting top photos from different clusters. AI

    IMPACT Automates photo curation, potentially improving user experience for managing large image libraries.

  13. Local LLMs in Production: Squeezing Qwen to Match Claude

    A developer details their experience optimizing local LLMs for production use, aiming to replicate the performance of cloud-based models like Claude 3.5 Sonnet. They found that certain Qwen models, while powerful, exhibited an unhelpful "thinking out loud" behavior that hindered their specific use case of generating clean JSON. After experimenting with different Qwen versions and prompt engineering techniques, they settled on Qwen2.5-32B-Instruct-fp8, which offered significantly faster response times compared to Claude 3.5 Sonnet for routine tasks. AI

    Local LLMs in Production: Squeezing Qwen to Match Claude

    IMPACT Demonstrates techniques for improving local LLM performance and reducing reliance on costly cloud APIs for routine tasks.

  14. EyeingAI (@EyeingAI) points out that while AI-generated tools are improving, the workflow for managing assets remains complex, and notes that Renoise Canvas aims to provide an integrated canvas for managing characters, scenes, references, versions, images, and videos on a single screen.

    Qwen 3.7 has been released, marking an update to the Qwen model series, though specific performance details are not yet available. Separately, QuiverAI's Arrow 1.1 can convert fashion sketches into editable SVGs, focusing on practical vector design generation. Additionally, Renoise Canvas aims to streamline asset management for AI-generated content by offering a unified interface for characters, scenes, and various media types. AI

    EyeingAI (@EyeingAI) points out that while AI-generated tools are improving, the workflow for managing assets remains complex, and notes that Renoise Canvas aims to provide an integrated canvas for managing characters, scenes, references, versions, images, and videos on a single screen.

    IMPACT These updates offer incremental improvements in model capabilities, design tool functionality, and asset management workflows for AI-generated content.

  15. AI emotions and aligned behavior

    A researcher explored AI safety by investigating the potential for emotional nudges to influence model behavior, drawing parallels to human psychology. The study suggests that models, like humans, exhibit internal states that drive actions and can be influenced by emotional cues. This approach aims to incentivize ethical actions and disincentivize unethical ones by manipulating the emotional stakes of decision-making, rather than relying solely on alignment or control mechanisms. AI

    AI emotions and aligned behavior

    IMPACT Suggests a novel approach to AI safety by leveraging emotional nudges, potentially influencing future model development and alignment strategies.

  16. Airbnb CEO Brian Chesky Called Chinese AI Fast And Cheap. Now, Congress Wants Answers

    Airbnb CEO Brian Chesky is facing scrutiny from U.S. lawmakers regarding the company's use of Chinese AI models, specifically Alibaba's Qwen. Chesky defended the practice, stating that Airbnb primarily uses open-source models and does not share data with Chinese companies, arguing that concerns about data access are a misunderstanding of the technology. This situation highlights the growing tension between U.S. national security interests and the availability of cost-effective AI solutions from China, as evidenced by a recent bipartisan bill aimed at promoting American technology procurement among allies. AI

    Airbnb CEO Brian Chesky Called Chinese AI Fast And Cheap. Now, Congress Wants Answers

    IMPACT Highlights geopolitical tensions in AI development and the trade-offs between cost-effectiveness and national security for AI adoption.

  17. 267 tok/s local inference on RTX 5090 – llama.cpp MTP + Qwen3-35B-A3B MoE

    Recent developments in local LLM inference focus on optimizing performance and VRAM usage for models like Qwen 3.6 and 3.5. One approach involves detailed backend comparisons for Qwen 3.6 27B on consumer GPUs, identifying optimal quantization and processing settings for high token counts. Another key technique is quantizing the Multi-token Prediction (MTP) KV cache, which significantly reduces VRAM demands for Qwen models without sacrificing quality. Additionally, a new local-first UI called MemoTree has been developed to improve context management for Ollama users, offering a branching chat interface. AI

    267 tok/s local inference on RTX 5090 – llama.cpp MTP + Qwen3-35B-A3B MoE

    IMPACT Optimizations for local LLM inference, particularly for Qwen models, enable more powerful AI capabilities on consumer hardware.

  18. UPDATE corrections and visual update of my web UI using comfy backend.

    A user has released an updated web interface for the Comfy backend, designed to streamline workflows for Stable Diffusion and other image generation models. The interface now supports predefined templates for various models including SDXL, Illustrous, FLUX, and QWEN, and integrates with LTX 2.3 Director. Users can import or edit nodes directly, and the interface includes additional features like upscaling and background removal. AI

    UPDATE corrections and visual update of my web UI using comfy backend.

    IMPACT Enhances user experience for AI image generation tools, offering more streamlined workflows and broader model compatibility.

  19. Qwen3.7 Max vs Open-Weight LLMs: Practical Migration Notes

    The author discusses practical considerations for migrating inference workloads from closed LLM APIs to open-weight models, driven by cost, data sensitivity, and latency concerns. They highlight Qwen as a strong contender with a rapid release cycle, alongside other notable models like Llama, DeepSeek, and Mistral. The article provides code examples demonstrating how to adapt existing OpenAI SDK calls to interface with self-hosted models via compatible API endpoints, such as those offered by vLLM. AI

    IMPACT Provides practical guidance for developers and organizations considering the shift to self-hosted open-weight LLMs.

  20. Google AI Edge Gallery Just Added MCP. Here's What On-Device Agents Can Actually Do Now

    Google has updated its AI Edge Gallery app to support the Model Context Protocol (MCP) on Android devices, enabling on-device AI agents. This update allows LLMs like Gemma 4 to run entirely locally, enhancing privacy and reducing latency by keeping all processing and data on the user's phone. The app now supports agent skills, calendar integration, and persistent chat history, moving it from a simple model playground to a functional on-device agent runtime. AI

    IMPACT Enables more private and capable AI agents to run directly on mobile devices.

  21. SPATIOROUTE: Dynamic Prompt Routing for Zero-Shot Spatial Reasoning

    Researchers have developed SpatioRoute, a novel method for enhancing zero-shot spatial reasoning in Vision-Language Models (VLMs). This approach dynamically routes incoming questions to tailored prompt templates without requiring additional training or 3D sensor data. SpatioRoute demonstrated consistent accuracy gains of up to 5% on the SQA3D benchmark, setting a new state-of-the-art for video-only spatial VQA. AI

    SPATIOROUTE: Dynamic Prompt Routing for Zero-Shot Spatial Reasoning

    IMPACT Enhances VLM capabilities in spatial reasoning, potentially improving applications requiring understanding of object relationships and scene context.

  22. A Tiny First-Call Checklist Before Trusting Any LLM Gateway

    A developer shared a concise checklist for evaluating new LLM gateways, emphasizing auditable first calls over pricing alone. The process involves verifying API keys, checking logs for model usage and costs, and testing error handling before proceeding to more complex features. This approach is particularly useful for gateways that route across multiple providers or integrate with less common models like Qwen or DeepSeek. AI

    IMPACT Provides a practical guide for developers integrating with LLM services, focusing on reliability and cost transparency.

  23. Qwen 3.7 🤖, Cursor Composer 2.5 👨‍💻, Anthropic acquires Stainless 🛠️

    Qwen has released version 3.7 of its language model, which features a specific circuit for political censorship that can be modified without losing factual knowledge. NVIDIA's Cosmos Predict 2.5 model can now be fine-tuned for robot video generation using efficient LoRA/DoRA methods. Additionally, the new HRM-Text model offers a more accessible and cost-effective approach to pre-training foundation models. AI

    Qwen 3.7 🤖, Cursor Composer 2.5 👨‍💻, Anthropic acquires Stainless 🛠️

    IMPACT New model releases and fine-tuning techniques offer improved control and accessibility for AI development.

  24. IdioLink: Retrieving Meaning Beyond Words Across Idiomatic and Literal Expressions

    Researchers have introduced IdioLink, a new benchmark designed to evaluate language models' ability to understand idiomatic expressions. The benchmark consists of over 10,000 documents and 2,000 queries, covering 107 idioms to test if models can link figurative language to its conceptual meaning. Current embedding models struggle with this task, often relying on topical cues rather than true semantic understanding, highlighting a significant gap in idiom-aware semantic retrieval. AI

    IMPACT IdioLink challenges current language models to go beyond literal meaning, pushing for deeper semantic understanding and potentially improving AI's grasp of nuanced human language.

  25. PLA Daily Translation: Reflections on Warfare Brought by AGI

    DeepSeek, a Chinese AI lab, is reportedly in discussions for a significant funding round of 70 billion yuan, with a Chinese state AI fund potentially contributing 10 billion yuan. This potential deal would transition the open-source lab from private backing to state-linked capital, serving as a test case for Beijing's involvement in the global AI race. The company continues to pursue its AGI ambitions despite acknowledging a substantial compute gap compared to US labs. AI

    PLA Daily Translation: Reflections on Warfare Brought by AGI

    IMPACT This funding could signal increased state support for China's AI ambitions and potentially accelerate its pursuit of AGI capabilities.

  26. ByteDance and HKUST researchers prove that traditional AI model training on OCR tasks hinders document work. Their MMProLong project shows that key

    Researchers at Nous Research have developed a new method called Contrastive Neuron Attribution (CNA) to identify and manipulate specific neurons within large language models that control refusal behavior. By targeting just 0.1% of these neurons, CNA can reduce harmful request refusal rates by over 50% in models like Llama and Qwen, while maintaining high output quality. This technique operates without requiring additional training or modification of model weights, and importantly, it reveals that the underlying neural structures for distinguishing harmful from benign prompts exist even in base models before alignment fine-tuning. AI

    IMPACT Enables precise control over LLM safety mechanisms, potentially leading to more robust alignment techniques and a deeper understanding of model behavior.

  27. Qwen Plays ̶p̶̶o̶̶k̶̶e̶̶m̶̶o̶̶n̶ ? / QWEN PLAYS DCSS! - qwen3.6-35b-a3b@q4_k_xl plays open source roguelike adventure DCSS (and does a decent job)

    The Qwen 3.5-35B model, in its non-MTP version, has demonstrated the ability to play the open-source roguelike game Dungeon Crawl Stone Soup (DCSS) effectively. While the MTP version of Qwen exhibited issues with tool calls, the standard version performed well, even on smaller quantized models. This capability is being explored as a benchmark for LLM performance beyond traditional benchmarks, with the model successfully navigating game levels, defeating enemies, and managing inventory. AI

    IMPACT Demonstrates LLM capability in complex, interactive environments, potentially leading to new benchmarking methods and applications beyond text generation.

  28. Hot To Run LLMs Locally

    This series of guides provides comprehensive instructions for setting up and running large language models (LLMs) locally on Linux systems. It details hardware and software prerequisites, recommends using llama.cpp for its balance of performance and ease of use, and covers model selection, quantization, and API integration. The guides also include steps for setting up systemd services for 24/7 operation, monitoring performance, and optimizing for various hardware constraints. AI

    IMPACT Enables developers to run and experiment with LLMs locally, reducing reliance on cloud services and facilitating custom application development.

  29. The Readout Shortcut: Positional Number Copying Dominates Arithmetic CoT Readout in Small Language Models

    A new research paper reveals a significant shortcut in how small language models perform arithmetic tasks using chain-of-thought (CoT) prompting. Instead of relying on logical sequencing, these models tend to copy the number positioned just before the answer delimiter, regardless of the intermediate reasoning steps. This positional copying accounts for a large portion of their accuracy, even when the preceding steps are incorrect or shuffled, highlighting a potential failure mode in evaluating CoT faithfulness. AI

    IMPACT Reveals a critical flaw in evaluating arithmetic reasoning in small LLMs, suggesting current faithfulness evaluations may be misleading.

  30. Choosing an abliterated version of Gemma 4 31B and 26B-A4B

    New developments in local LLM inference are enhancing performance on consumer hardware. The BeeLlama v0.2.0 release, utilizing a DFlash update, significantly boosts token generation speeds for models like Qwen and Gemma on GPUs such as the RTX 3090, offering up to a 5x speedup. Additionally, ByteShape quantizations are improving Qwen model performance on laptops with limited VRAM, providing a notable speed increase. These advancements aim to make larger, more capable open-weight models practical for everyday local use. AI

    IMPACT Enhances local LLM inference performance, making larger models more accessible on consumer hardware.

  31. Qwen multi angle workflow

    A user on Reddit is seeking advice on how to achieve a "multi-angle workflow" using the Qwen model without the generated images appearing "plastic." The user is specifically asking for a workflow that avoids this common artifact in AI-generated imagery. AI

  32. Modal's Series C: Raising $355M at a $4.65B valuation

    Modal has secured $355 million in Series C funding, valuing the company at $4.65 billion post-money. The company has experienced significant growth, with annualized revenue surpassing $300 million and a fivefold increase in size since September. This funding will support Modal's mission to provide a cloud infrastructure specifically designed for AI workloads, offering elastic compute, safe isolation, and programmatic control for diverse applications. AI

    Modal's Series C: Raising $355M at a $4.65B valuation

    IMPACT Accelerates development of specialized cloud infrastructure for AI, potentially lowering costs and improving performance for AI workloads.

  33. Agentic Chunking and Bayesian De-chunking of AI Generated Fuzzy Cognitive Maps: A Model of the Thucydides Trap

    A new paper from Anthropic and research from arXiv explore the complex relationship between US and Chinese AI development, challenging the notion of a simple race. While the US currently leads in frontier AI, the research highlights deep interconnections in talent, research, and shared inspiration between the two nations' AI ecosystems. Despite geopolitical tensions and calls for export controls, collaboration remains significant, with both countries adopting algorithms and inspiration from each other. Public perception also differs, with China showing greater optimism towards AI compared to the US, a sentiment potentially rooted in historical economic transformations. AI

    Agentic Chunking and Bayesian De-chunking of AI Generated Fuzzy Cognitive Maps: A Model of the Thucydides Trap

    IMPACT Highlights the complex, collaborative nature of AI development between the US and China, challenging simplistic notions of a competitive race.

  34. Token Ledger Digest – 2026-05-20

    Several LLM providers have adjusted their pricing and model availability. Qwen saw mixed changes, with some variants increasing in price while others decreased, and new models like Qwen3.7 Max were introduced. Google's Gemini Flash Latest experienced a significant price hike, while Z.ai's GLM 5.1 became free. Additionally, Alibaba's Tongyi DeepResearch 30B A3B model was removed from catalogs, prompting users to seek alternatives. AI

    Token Ledger Digest – 2026-05-20

    IMPACT Operators should monitor LLM pricing changes and model availability for cost optimization and workflow continuity.

  35. Overcoming Situational Depression Via Generative AI Including Tapping Into ChatGPT

    Generative AI, including models like ChatGPT, Gemini, and Claude, is increasingly being explored for mental health support, particularly for situational depression. While these tools offer accessible, 24/7 assistance, they are not a replacement for human therapists and carry risks of dispensing inappropriate advice. Concurrently, the technical underpinnings of AI agents are being scrutinized, focusing on how they process information, potential biases, and the mechanisms behind brand mentions in their outputs. Developers are advised to understand core AI concepts like LLMs, tokens, and RAG before building agent frameworks, while new infrastructure is emerging to enable AI agents to interact with regulated financial markets. AI

    Overcoming Situational Depression Via Generative AI Including Tapping Into ChatGPT

    IMPACT Explores diverse applications of AI agents and LLMs, from mental health support to financial trading, highlighting technical considerations and potential risks.

  36. 🚀🚀Qwen3.7 Preview lands on Arena!

    Alibaba's Qwen team has released previews of their Qwen3.7-Max and Qwen3.7-Plus models. These new models are now available on the Arena platform for evaluation. The release positions Alibaba as a top-tier lab in both text and vision AI capabilities. AI

    IMPACT Positions Alibaba among top AI labs, potentially increasing competition in the frontier model space.

  37. Anthropic’s Mythos breach was humiliating

    Anthropic's highly capable cybersecurity AI model, Claude Mythos, was reportedly accessed by unauthorized users shortly after its limited preview began. The breach occurred through a combination of insider knowledge from a contractor and information from a separate data leak, rather than a sophisticated hack. This incident raises concerns about supply chain security and Anthropic's ability to manage access to its most powerful, potentially dangerous AI systems, despite its strong emphasis on AI safety. AI

    Anthropic’s Mythos breach was humiliating

    IMPACT Highlights critical supply chain vulnerabilities in AI safety protocols, potentially impacting enterprise trust and the rollout of powerful AI models.

  38. Qwen3.6-27B: Flagship-Level Coding in a 27B Dense Model

    Qwen has released Qwen3.6-27B, a dense 27-billion-parameter multimodal model designed for advanced coding tasks. This model aims to provide flagship-level agentic coding performance, surpassing previous open-source models in this category. Various community members have already made different quantized versions of Qwen3.6-27B available on Hugging Face, facilitating its use across different platforms and libraries. AI

    Qwen3.6-27B: Flagship-Level Coding in a 27B Dense Model

    IMPACT Sets a new benchmark for dense coding models, potentially influencing future development in agentic AI and code generation.

  39. SciEval: A Benchmark for Automatic Evaluation of K-12 Science Instructional Materials

    A new paper proposes that LLM hallucinations stem not from a lack of knowledge, but from a failure in commitment, where models disperse probability mass across alternatives instead of concentrating on the correct answer. This phenomenon is observed to increase with model scale and is exacerbated by instruction tuning. Another paper introduces GAMMA, a framework for mixed-precision quantization that optimizes bit allocation for LLMs, significantly improving accuracy under memory constraints and outperforming existing methods on Llama and Qwen models. Additionally, a benchmark called SciEval has been developed to automatically evaluate K-12 science instructional materials, revealing that current mainstream LLMs perform poorly on this task without domain-specific fine-tuning. AI

    SciEval: A Benchmark for Automatic Evaluation of K-12 Science Instructional Materials

    IMPACT New research sheds light on LLM hallucination mechanisms and introduces novel methods for model optimization and evaluation, potentially improving reliability and efficiency.

  40. GPU VRAM only for small models with llama.cpp: is it possible?

    A user on the r/LocalLLaMA subreddit is seeking assistance with optimizing their GPU VRAM usage for running smaller language models. Despite successfully running larger models like Gemma4 26B and Qwen 3.6 35B MoEs, they are encountering issues with smaller models like Gemma4-2B still utilizing system RAM. The user has experimented with various command-line options for llama.cpp but has not yet achieved full VRAM utilization without relying on host memory. AI

  41. Thinking about running AI models like Llama 3, Qwen, or Mistral on your own computer? Two of the best local AI tools in 2026 are Ollama and LM Studio. Both tool

    Creators are increasingly adopting local AI solutions in 2026, moving away from cloud-based services for benefits like unlimited usage, enhanced privacy, faster workflows, and lower long-term costs. Tools such as Ollama, LM Studio, and Open-WebUI are making it easier for beginners to run powerful open-source models like Llama 3, Qwen, and Mistral directly on their personal computers. This shift offers users full control over their data and content creation processes, with some even developing portable AI solutions that run entirely offline from a USB stick. AI

    Thinking about running AI models like Llama 3, Qwen, or Mistral on your own computer? Two of the best local AI tools in 2026 are Ollama and LM Studio. Both tool

    IMPACT Accelerates adoption of personal AI infrastructure, offering cost-effective and private alternatives to cloud-based LLM services.

  42. New Unsloth API Inference Endpoint

    Unsloth has released a new API inference endpoint that allows users to run local large language models with enhanced features. This endpoint supports both Anthropic-compatible and OpenAI-compatible dialects, enabling seamless integration with various AI agents and chat clients. The update also introduces new models like NVIDIA Nemotron 3 Nano Omni and Mistral 3.5 Medium, alongside several bug fixes and improvements to the Unsloth Studio. AI

    New Unsloth API Inference Endpoint

    IMPACT Enables easier local deployment and integration of various LLMs with enhanced features like self-healing tool calling and code execution.

  43. FlashQLA: CP-/Bwd-Friendly Fused Linear Attention Kernels for GDN

    Qwen has developed FlashQLA, a new set of fused linear attention kernels designed to be compatible with both forward and backward passes in deep learning. These kernels are optimized for Gated Delta Networks (GDN), which are now a core component in Qwen's model family, including Qwen3-Next and its subsequent iterations like Qwen3.5 and Qwen3.6. The development aims to improve efficiency and scalability for large models with extended context windows. AI

    FlashQLA: CP-/Bwd-Friendly Fused Linear Attention Kernels for GDN

    IMPACT Optimizes attention mechanisms for large language models, potentially improving training and inference efficiency for Qwen's model family.

  44. froggeric/Qwen-Fixed-Chat-Templates

    A Hugging Face model repository, froggeric/Qwen-Fixed-Chat-Templates, has been updated with significant improvements to its chat templates for Qwen 3.5 and 3.6 models. These updates address issues such as "empty think" poisoning, system prompt logic traps, and KV cache inconsistencies. The changes aim to enhance the model's ability to use tools, transition between thinking and conversational responses, and maintain a consistent memory during multi-step processes. AI

    IMPACT Fixes to chat templates improve Qwen model reliability and tool usage, potentially enhancing agentic capabilities.

  45. 🚀Qwen3.7-Max just landed at 56.6 on the Artificial Analysis Intelligence Index — a solid 4.8pt jump over Qwen3.6-Max-Preview. @ArtificialAnlys

    Alibaba's Qwen has released Qwen3.7-Max, a new flagship model designed for the Agent Era. This model demonstrates significant improvements in scientific reasoning, coding, and agentic capabilities, achieving a score of 56.6 on the Artificial Analysis Intelligence Index. Qwen3.7-Max also showcases enhanced performance in autonomous execution and generalization across various benchmarks, with features like implicit caching now live. AI

    IMPACT Sets a new benchmark for agentic capabilities and reasoning, potentially accelerating the development of autonomous AI systems.

  46. Patch release: v5.5.2

    Hugging Face's `transformers` library has seen a series of releases and patches, introducing new models and fixing various bugs. Notably, version 5.9.0 added Cohere's Command A+ (Cohere2Moe) and HRM-Text, while also improving audio support and generation capabilities. Earlier releases, such as v5.8.0, integrated models like DeepSeek-V4, Gemma 4 Assistant, GraniteSpeechPlus, Granite4Vision, EXAONE 4.5, and PP-FormulaNet. Several patch releases have addressed specific issues, including problems with DeepSeek V4 integration, flash attention, Qwen MoE models with FP8, and Gemma4 device map support. AI

    Patch release: v5.5.2

    IMPACT New model integrations and bug fixes in a widely used library accelerate research and development across the AI ecosystem.

  47. 📣We're calling for ambassadors!

    Alibaba's Qwen team is seeking ambassadors to join their community. They are looking for individuals with strong technical skills or local community leadership experience. Selected ambassadors will receive early access to resources and opportunities. AI

    📣We're calling for ambassadors!
  48. Qwen3.5-Omni: Scaling Up, Toward Native Omni-Modal AGI

    Alibaba's Qwen team has released Qwen3.5-Omni, a new generation of omnimodal large language models capable of processing text, images, audio, and audio-visual content. This series features models named Plus, Flash, and Light, all supporting a 256k context window and capable of handling over 10 hours of audio. The architecture utilizes a Hybrid-Attention Mixture-of-Experts (MoE) approach for both its reasoning and generation components. AI

    Qwen3.5-Omni: Scaling Up, Toward Native Omni-Modal AGI

    IMPACT Expands LLM capabilities into native audio and video processing, potentially enabling more sophisticated AI agents and applications.

  49. Together AI expands fine-tuning service with tool calling, reasoning, and vision support

    Together AI has enhanced its fine-tuning service to better support advanced AI workflows. The update includes native support for tool call, reasoning, and vision-language model fine-tuning, addressing common issues like unreliable tool execution and degraded reasoning in complex interactions. These improvements aim to increase iteration speed and accuracy for AI teams building agentic applications, with enhanced throughput and larger dataset handling for models up to 1T parameters. AI

    IMPACT Enables more reliable and efficient fine-tuning of AI agents, potentially accelerating the development of complex AI applications.