Brief

last 24h

[50/55] 221 sources

Multi-source AI news clustered, deduplicated, and scored 0–100 across authority, cluster strength, headline signal, and time decay.

SIGNIFICANT · dev.to — LLM tag English(EN) · 4h

Llama 4: Meta's Latest — Scout, Maverick, and the MoE Revolution

Meta has released Llama 4 in April 2025, featuring a new Mixture of Experts (MoE) architecture. Two variants, Scout and Maverick, are available, with Scout serving as a balanced default and Maverick offering broader knowledge for specialized tasks. Both models leverage MoE to activate approximately 17 billion parameters per token, enabling high performance comparable to much larger models while remaining runnable on consumer hardware. AI

IMPACT Sets a new standard for locally runnable large models, potentially accelerating adoption of advanced AI capabilities on consumer hardware.
- Meta
- Mixture of Experts
- Qwen
- Ollama
- RTX 4090
- Llama 4
- DeepSeek-R1
- Scout
- Maverick
SIGNIFICANT · SCMP — Tech English(EN) · 9h · [2 sources]

Alibaba’s Qwen catches up with ‘Sharif speed’ to help forge Pakistan deal

Alibaba Chairman Joe Tsai utilized the company's Qwen AI tool to rapidly draft a strategic technology partnership agreement with Pakistan's Prime Minister Shehbaz Sharif. Sharif, known for his swift approach to development, requested the comprehensive pact during a visit to Alibaba's headquarters. The agreement, facilitated by Qwen's generative AI capabilities, covers areas such as AI infrastructure, cloud computing, healthcare, e-commerce, and digital payments, aiming to accelerate Pakistan's digital economy. AI

IMPACT Demonstrates AI's potential to accelerate international business and policy agreements, streamlining complex negotiations.
- Qwen
- Alibaba
- Pakistan
- Shehbaz Sharif
- Joe Tsai
TOOL · Mastodon — sigmoid.social Deutsch(DE) · 10h

RT @TeksEdge: 🚀 New MTP support for Strix Halo released! more on Arint.info # AI # AMD # MTP # Qwen # ROCm # StrixHalo # arint_info https://x.com/

Arint.info has announced new support for Strix Halo, a significant development for AI hardware acceleration. This update integrates MTP (Multi-Threaded Processing) capabilities, enhancing performance for AI workloads. The announcement highlights compatibility with Qwen and ROCm, indicating a focus on optimizing deep learning tasks on AMD hardware. AI

IMPACT Enhances AI hardware performance by enabling MTP support for Strix Halo, potentially improving deep learning task efficiency.
- AMD
- Qwen
- Arint.info
- Strix Halo
- ROCm
TOOL · dev.to — LLM tag English(EN) · 16h

How We Built Dynamic NPC Dialogue with LLMs — Lessons from Early Access

Vantage Digital Labs has developed an LLM-powered engine for dynamic NPC dialogue in video games, moving beyond static, pre-written lines. Their architecture involves a context builder, LLM API, response parser, and memory system, with a focus on prompt engineering over model size for cost-effectiveness. Key lessons learned include prioritizing response parsing and low latency, with smaller models like DeepSeek and Qwen proving viable for indie games. AI

IMPACT Enables more interactive and responsive non-player characters in games, potentially enhancing player immersion.
TOOL · Medium — Claude tag English(EN) · 1d

Chinese LLMs Top Every Agentic Benchmark. Production Teams Pick Sonnet Anyway.

A new benchmark evaluating LLMs on agentic tasks reveals that Chinese models like Qwen and Kimi outperform others. However, production teams often still prefer Anthropic's Claude Sonnet for real-world applications. This suggests a gap between theoretical performance on specific benchmarks and practical utility in development environments. AI

IMPACT Highlights a discrepancy between benchmark performance and real-world utility, influencing model selection for production.
SIGNIFICANT · MarkTechPost English(EN) · 5d · [3 sources]

Alibaba Qwen Team Introduces Qwen3.5-LiveTranslate-Flash: Real-Time Multimodal Interpretation Across 60 Languages at 2.8-Second Latency

Alibaba's Qwen team has released Qwen3.5-LiveTranslate-Flash, a real-time multimodal translation model that significantly reduces latency to 2.8 seconds. This new model expands language support to 60 input languages and 29 output languages, while also incorporating visual cues like lip movements to improve accuracy in noisy environments. A standout feature is its ability to clone the original speaker's voice in real-time for translated output, creating a more natural listening experience. AI

IMPACT Enhances real-time multilingual communication by reducing latency and improving accuracy through multimodal input and voice cloning.
SIGNIFICANT · 量子位 (QbitAI) 中文(ZH) · 6d

Qwen's latest 3.7 Max preview version lands! Two generations of ultra-large cups iterate in parallel, Lin Junyang has left but is still accelerating

Alibaba's Qwen team has released preview versions of its Qwen 3.7 Max and Qwen 3.7 Plus models, showcasing rapid iteration cycles. The Qwen 3.7 Max model has achieved top rankings among Chinese models in text-based benchmarks on Arena, placing 13th overall and within the top ten for specific categories like math and coding. The Qwen 3.7 Plus model also performed strongly in visual benchmarks, securing the top spot for Chinese models in that domain. AI

IMPACT Accelerates the pace of frontier model development and competition among leading AI labs globally.
TOOL · dev.to — LLM tag Deutsch(DE) · 2d

Qwen 3.6 & 2.5: The Most Versatile Local Models

Alibaba Cloud's Qwen models are highlighted as versatile open-source options in mid-2026, offering a range of sizes from 0.5B to 72B parameters. Qwen 3.6 and 2.5 boast impressive features like a 262K context window, strong tool-calling capabilities, and an Apache 2.0 license for commercial use. The models are easily accessible via Ollama, with specific recommendations based on available VRAM, and are presented as competitive local alternatives to models like GPT-4o and DeepSeek-R1, particularly for tasks requiring long context or function calling. AI

IMPACT Provides powerful, locally runnable open-source models with long context capabilities, reducing reliance on cloud APIs for certain tasks.
- GPT-4o
- Qwen
- Ollama
- Alibaba Cloud
- Llama 4
- Qwen 2.5
- Qwen 3.6
- DeepSeek-R1
RESEARCH · SCMP — Tech English(EN) · 4d

Alibaba signals next phase of AI growth from investment to commercialisation

Alibaba is transitioning its AI efforts from initial investment to full-scale commercialization, aiming to become China's leading full-stack AI provider. The company projects 30 billion yuan in AI revenue by 2026, with AI agents expected to account for over half of its cloud sales. Alibaba's comprehensive AI ecosystem includes its own T-Head chips, cloud infrastructure, model-as-a-service platforms, and the Qwen foundation models, alongside consumer products like the Qwen app and the Wukong enterprise agent platform. AI

IMPACT Alibaba's strategic shift to AI commercialization and projected revenue targets signal a major push in the Chinese AI market.
- T-Head
- Qwen
- Alibaba
- Wukong
- Joe Tsai
- Liu Weiguang
- Eddie Wu Yongming
TOOL · LessWrong (AI tag) Español(ES) · 4d

Why does off-model SFT degrade capabilities?

Researchers have found that Supervised Fine-Tuning (SFT) using outputs from a different AI model can significantly degrade the capabilities of the trained model. This degradation appears to be linked to the model adopting an unfamiliar reasoning style that it struggles to utilize effectively. The issue is not necessarily due to imitating a less capable teacher model, as degradation occurs even when the teacher is superior. Fortunately, this performance drop seems to be a shallow property, as a small amount of training to restore the original reasoning style can recover most of the lost performance. AI

IMPACT Understanding how off-model SFT impacts AI capabilities is crucial for developing safer and more aligned AI systems.
- AI
- GPT-5.5
- Claude Opus 4.7
- Qwen
- SFT
SIGNIFICANT · Mastodon — sigmoid.social Italiano(IT) · 2d

🧠 Qwen presented 3.7-Max, a model designed for the era of autonomous AI agents, with a focus on prolonged task execution.

Qwen has launched Qwen 3.7-Max, a new AI model specifically engineered for autonomous agents. This model is designed to handle complex, long-duration tasks, marking a step forward in AI agent capabilities. The release emphasizes the model's potential for extended operational sequences. AI

IMPACT Enables more sophisticated and prolonged autonomous agent operations.
- Qwen
- Qwen 3.7-Max
TOOL · Mastodon — fosstodon.org Italiano(IT) · 1d

Returning from a trip almost always means finding yourself with an unmanageable amount of photos. In the case of Lisbon, the problem wasn't so much archiving the boxes

A developer created an AI tool to automatically select the best photos from a trip, addressing the challenge of curating a large number of images into a shareable album. The application uses PhotoPrism to access image thumbnails and Ollama to run AI models. Initially, the AI focused on aesthetic scoring, but this led to monotonous selections. The tool was improved to cluster images based on semantic similarity, ensuring variety in the final album by selecting top photos from different clusters. AI

IMPACT Automates photo curation, potentially improving user experience for managing large image libraries.
- Qwen
- Ollama
- PhotoPrism
TOOL · dev.to — LLM tag English(EN) · 6d

Local LLMs in Production: Squeezing Qwen to Match Claude

A developer details their experience optimizing local LLMs for production use, aiming to replicate the performance of cloud-based models like Claude 3.5 Sonnet. They found that certain Qwen models, while powerful, exhibited an unhelpful "thinking out loud" behavior that hindered their specific use case of generating clean JSON. After experimenting with different Qwen versions and prompt engineering techniques, they settled on Qwen2.5-32B-Instruct-fp8, which offered significantly faster response times compared to Claude 3.5 Sonnet for routine tasks. AI

IMPACT Demonstrates techniques for improving local LLM performance and reducing reliance on costly cloud APIs for routine tasks.
RESEARCH · Mastodon — fosstodon.org 한국어(KO) · 6d · [3 sources]

EyeingAI (@EyeingAI) points out that while AI-generated tools are improving, the workflow for managing assets remains complex, and notes that Renoise Canvas aims to provide an integrated canvas for managing characters, scenes, references, versions, images, and videos on a single screen.

Qwen 3.7 has been released, marking an update to the Qwen model series, though specific performance details are not yet available. Separately, QuiverAI's Arrow 1.1 can convert fashion sketches into editable SVGs, focusing on practical vector design generation. Additionally, Renoise Canvas aims to streamline asset management for AI-generated content by offering a unified interface for characters, scenes, and various media types. AI

IMPACT These updates offer incremental improvements in model capabilities, design tool functionality, and asset management workflows for AI-generated content.
- Qwen
- QuiverAI
- Arrow 1.1
- Renoise Canvas
- Qwen 3.7
TOOL · LessWrong (AI tag) English(EN) · 6d

AI emotions and aligned behavior

A researcher explored AI safety by investigating the potential for emotional nudges to influence model behavior, drawing parallels to human psychology. The study suggests that models, like humans, exhibit internal states that drive actions and can be influenced by emotional cues. This approach aims to incentivize ethical actions and disincentivize unethical ones by manipulating the emotional stakes of decision-making, rather than relying solely on alignment or control mechanisms. AI

IMPACT Suggests a novel approach to AI safety by leveraging emotional nudges, potentially influencing future model development and alignment strategies.
RESEARCH · Forbes — Innovation English(EN) · 4d

Airbnb CEO Brian Chesky Called Chinese AI Fast And Cheap. Now, Congress Wants Answers

Airbnb CEO Brian Chesky is facing scrutiny from U.S. lawmakers regarding the company's use of Chinese AI models, specifically Alibaba's Qwen. Chesky defended the practice, stating that Airbnb primarily uses open-source models and does not share data with Chinese companies, arguing that concerns about data access are a misunderstanding of the technology. This situation highlights the growing tension between U.S. national security interests and the availability of cost-effective AI solutions from China, as evidenced by a recent bipartisan bill aimed at promoting American technology procurement among allies. AI

IMPACT Highlights geopolitical tensions in AI development and the trade-offs between cost-effectiveness and national security for AI adoption.
RESEARCH · dev.to — LLM tag English(EN) · 1w · [2 sources]

267 tok/s local inference on RTX 5090 – llama.cpp MTP + Qwen3-35B-A3B MoE

Recent developments in local LLM inference focus on optimizing performance and VRAM usage for models like Qwen 3.6 and 3.5. One approach involves detailed backend comparisons for Qwen 3.6 27B on consumer GPUs, identifying optimal quantization and processing settings for high token counts. Another key technique is quantizing the Multi-token Prediction (MTP) KV cache, which significantly reduces VRAM demands for Qwen models without sacrificing quality. Additionally, a new local-first UI called MemoTree has been developed to improve context management for Ollama users, offering a branching chat interface. AI

IMPACT Optimizations for local LLM inference, particularly for Qwen models, enable more powerful AI capabilities on consumer hardware.
- RTX 5090
- Claude Haiku
- Qwen3-35B-A3B
- llama.cpp
- MemoTree
- Ollama
- RTX 3090
- Qwen
TOOL · r/StableDiffusion English(EN) · 19h

UPDATE corrections and visual update of my web UI using comfy backend.

A user has released an updated web interface for the Comfy backend, designed to streamline workflows for Stable Diffusion and other image generation models. The interface now supports predefined templates for various models including SDXL, Illustrous, FLUX, and QWEN, and integrates with LTX 2.3 Director. Users can import or edit nodes directly, and the interface includes additional features like upscaling and background removal. AI

IMPACT Enhances user experience for AI image generation tools, offering more streamlined workflows and broader model compatibility.
COMMENTARY · dev.to — LLM tag English(EN) · 3d

Qwen3.7 Max vs Open-Weight LLMs: Practical Migration Notes

The author discusses practical considerations for migrating inference workloads from closed LLM APIs to open-weight models, driven by cost, data sensitivity, and latency concerns. They highlight Qwen as a strong contender with a rapid release cycle, alongside other notable models like Llama, DeepSeek, and Mistral. The article provides code examples demonstrating how to adapt existing OpenAI SDK calls to interface with self-hosted models via compatible API endpoints, such as those offered by vLLM. AI

IMPACT Provides practical guidance for developers and organizations considering the shift to self-hosted open-weight LLMs.
- OpenAI
- GPT-4o
- Meta
- DeepSeek
- Qwen
- Llama
- vLLM
- Qwen2.5-32B-Instruct
- Qwen3.7 Max
SIGNIFICANT · dev.to — MCP tag English(EN) · 5d · [4 sources]

Google AI Edge Gallery Just Added MCP. Here's What On-Device Agents Can Actually Do Now

Google has updated its AI Edge Gallery app to support the Model Context Protocol (MCP) on Android devices, enabling on-device AI agents. This update allows LLMs like Gemma 4 to run entirely locally, enhancing privacy and reducing latency by keeping all processing and data on the user's phone. The app now supports agent skills, calendar integration, and persistent chat history, moving it from a simple model playground to a functional on-device agent runtime. AI

IMPACT Enables more private and capable AI agents to run directly on mobile devices.
TOOL · arXiv cs.CV English(EN) · 1w

SPATIOROUTE: Dynamic Prompt Routing for Zero-Shot Spatial Reasoning

Researchers have developed SpatioRoute, a novel method for enhancing zero-shot spatial reasoning in Vision-Language Models (VLMs). This approach dynamically routes incoming questions to tailored prompt templates without requiring additional training or 3D sensor data. SpatioRoute demonstrated consistent accuracy gains of up to 5% on the SQA3D benchmark, setting a new state-of-the-art for video-only spatial VQA. AI

IMPACT Enhances VLM capabilities in spatial reasoning, potentially improving applications requiring understanding of object relationships and scene context.
COMMENTARY · dev.to — LLM tag English(EN) · 4d

A Tiny First-Call Checklist Before Trusting Any LLM Gateway

A developer shared a concise checklist for evaluating new LLM gateways, emphasizing auditable first calls over pricing alone. The process involves verifying API keys, checking logs for model usage and costs, and testing error handling before proceeding to more complex features. This approach is particularly useful for gateways that route across multiple providers or integrate with less common models like Qwen or DeepSeek. AI

IMPACT Provides a practical guide for developers integrating with LLM services, focusing on reliability and cost transparency.
- DeepSeek
- Qwen
- AnLink API
- LLM
RESEARCH · TLDR AI English(EN) · 6d

Qwen 3.7 🤖, Cursor Composer 2.5 👨‍💻, Anthropic acquires Stainless 🛠️

Qwen has released version 3.7 of its language model, which features a specific circuit for political censorship that can be modified without losing factual knowledge. NVIDIA's Cosmos Predict 2.5 model can now be fine-tuned for robot video generation using efficient LoRA/DoRA methods. Additionally, the new HRM-Text model offers a more accessible and cost-effective approach to pre-training foundation models. AI

IMPACT New model releases and fine-tuning techniques offer improved control and accessibility for AI development.
- NVIDIA
- Anthropic
- xAI
- Langchain
- Grok
- Qwen
- LoRA
- DoRA
- Cosmos Predict 2.5
- Qwen 3.7
- HRM-Text
RESEARCH · arXiv cs.CL English(EN) · 4d · [2 sources]

IdioLink: Retrieving Meaning Beyond Words Across Idiomatic and Literal Expressions

Researchers have introduced IdioLink, a new benchmark designed to evaluate language models' ability to understand idiomatic expressions. The benchmark consists of over 10,000 documents and 2,000 queries, covering 107 idioms to test if models can link figurative language to its conceptual meaning. Current embedding models struggle with this task, often relying on topical cues rather than true semantic understanding, highlighting a significant gap in idiom-aware semantic retrieval. AI

IMPACT IdioLink challenges current language models to go beyond literal meaning, pushing for deeper semantic understanding and potentially improving AI's grasp of nuanced human language.
SIGNIFICANT · LessWrong (AI tag) English(EN) · 3d · [2 sources]

PLA Daily Translation: Reflections on Warfare Brought by AGI

DeepSeek, a Chinese AI lab, is reportedly in discussions for a significant funding round of 70 billion yuan, with a Chinese state AI fund potentially contributing 10 billion yuan. This potential deal would transition the open-source lab from private backing to state-linked capital, serving as a test case for Beijing's involvement in the global AI race. The company continues to pursue its AGI ambitions despite acknowledging a substantial compute gap compared to US labs. AI

IMPACT This funding could signal increased state support for China's AI ambitions and potentially accelerate its pursuit of AGI capabilities.
RESEARCH · Mastodon — sigmoid.social Polski(PL) · 2d · [4 sources]

ByteDance and HKUST researchers prove that traditional AI model training on OCR tasks hinders document work. Their MMProLong project shows that key

Researchers at Nous Research have developed a new method called Contrastive Neuron Attribution (CNA) to identify and manipulate specific neurons within large language models that control refusal behavior. By targeting just 0.1% of these neurons, CNA can reduce harmful request refusal rates by over 50% in models like Llama and Qwen, while maintaining high output quality. This technique operates without requiring additional training or modification of model weights, and importantly, it reveals that the underlying neural structures for distinguishing harmful from benign prompts exist even in base models before alignment fine-tuning. AI

IMPACT Enables precise control over LLM safety mechanisms, potentially leading to more robust alignment techniques and a deeper understanding of model behavior.
TOOL · r/LocalLLaMA English(EN) · 1d

Qwen Plays ̶p̶̶o̶̶k̶̶e̶̶m̶̶o̶̶n̶ ? / QWEN PLAYS DCSS! - qwen3.6-35b-a3b@q4_k_xl plays open source roguelike adventure DCSS (and does a decent job)

The Qwen 3.5-35B model, in its non-MTP version, has demonstrated the ability to play the open-source roguelike game Dungeon Crawl Stone Soup (DCSS) effectively. While the MTP version of Qwen exhibited issues with tool calls, the standard version performed well, even on smaller quantized models. This capability is being explored as a benchmark for LLM performance beyond traditional benchmarks, with the model successfully navigating game levels, defeating enemies, and managing inventory. AI

IMPACT Demonstrates LLM capability in complex, interactive environments, potentially leading to new benchmarking methods and applications beyond text generation.
TOOL · dev.to — LLM tag English(EN) · 4d · [35 sources]

Hot To Run LLMs Locally

This series of guides provides comprehensive instructions for setting up and running large language models (LLMs) locally on Linux systems. It details hardware and software prerequisites, recommends using llama.cpp for its balance of performance and ease of use, and covers model selection, quantization, and API integration. The guides also include steps for setting up systemd services for 24/7 operation, monitoring performance, and optimizing for various hardware constraints. AI

IMPACT Enables developers to run and experiment with LLMs locally, reducing reliance on cloud services and facilitating custom application development.
- Qwen2.5-coder
- Claude API
- Llama-3
- OpenAI API
- Ollama
- VS Code
- Large Language Models
- Cursor
- Continue.dev
- NVIDIA GPU
- RTX 4090
- DeepSeek-R1
- RTX 3090
- Qwen 2.5
- Apple Silicon
- NVIDIA RTX 3060
- Mac
- llama.cpp
- Mistral-7B
- Ubuntu
- CPU
- RAM
- VRAM
- Linux
- RTX 3060
- Q4_K_M
- Q5_K_M
- NVIDIA
- Llama 2
- Qwen
- CodeLlama
- Phi-3
- Q8_0
- AMD
RESEARCH · Hugging Face Daily Papers English(EN) · 5d · [2 sources]

The Readout Shortcut: Positional Number Copying Dominates Arithmetic CoT Readout in Small Language Models

A new research paper reveals a significant shortcut in how small language models perform arithmetic tasks using chain-of-thought (CoT) prompting. Instead of relying on logical sequencing, these models tend to copy the number positioned just before the answer delimiter, regardless of the intermediate reasoning steps. This positional copying accounts for a large portion of their accuracy, even when the preceding steps are incorrect or shuffled, highlighting a potential failure mode in evaluating CoT faithfulness. AI

IMPACT Reveals a critical flaw in evaluating arithmetic reasoning in small LLMs, suggesting current faithfulness evaluations may be misleading.
TOOL · r/LocalLLaMA English(EN) · 2d · [5 sources]

Choosing an abliterated version of Gemma 4 31B and 26B-A4B

New developments in local LLM inference are enhancing performance on consumer hardware. The BeeLlama v0.2.0 release, utilizing a DFlash update, significantly boosts token generation speeds for models like Qwen and Gemma on GPUs such as the RTX 3090, offering up to a 5x speedup. Additionally, ByteShape quantizations are improving Qwen model performance on laptops with limited VRAM, providing a notable speed increase. These advancements aim to make larger, more capable open-weight models practical for everyday local use. AI

IMPACT Enhances local LLM inference performance, making larger models more accessible on consumer hardware.
- r/LocalLLaMA
- Gemma 4 31B
- Gemma4-26B-A4B
- llmfan46
- Qwen
- Gemma
- Qwen3.6-35B-A3B
- llama.cpp
- LLaMA 3.1
- Ollama
- RTX 3090
- BeeLlama
- ByteShape
MEME · r/StableDiffusion English(EN) · 11h

Qwen multi angle workflow

A user on Reddit is seeking advice on how to achieve a "multi-angle workflow" using the Qwen model without the generated images appearing "plastic." The user is specifically asking for a workflow that avoids this common artifact in AI-generated imagery. AI
- Qwen
- StableDiffusion
RESEARCH · Modal blog English(EN) · 4d

Modal's Series C: Raising $355M at a $4.65B valuation

Modal has secured $355 million in Series C funding, valuing the company at $4.65 billion post-money. The company has experienced significant growth, with annualized revenue surpassing $300 million and a fivefold increase in size since September. This funding will support Modal's mission to provide a cloud infrastructure specifically designed for AI workloads, offering elastic compute, safe isolation, and programmatic control for diverse applications. AI

IMPACT Accelerates development of specialized cloud infrastructure for AI, potentially lowering costs and improving performance for AI workloads.
- DeepSeek
- Redpoint
- Modal
- Suno
- Qwen
- SGLang
- Ramp
- vLLM
- DoorDash
- General Catalyst
- Bain Capital Ventures
- Accel
- Chai Discovery
- Menlo
RESEARCH · arXiv cs.CL English(EN) · 1w · [19 sources]

Agentic Chunking and Bayesian De-chunking of AI Generated Fuzzy Cognitive Maps: A Model of the Thucydides Trap

A new paper from Anthropic and research from arXiv explore the complex relationship between US and Chinese AI development, challenging the notion of a simple race. While the US currently leads in frontier AI, the research highlights deep interconnections in talent, research, and shared inspiration between the two nations' AI ecosystems. Despite geopolitical tensions and calls for export controls, collaboration remains significant, with both countries adopting algorithms and inspiration from each other. Public perception also differs, with China showing greater optimism towards AI compared to the US, a sentiment potentially rooted in historical economic transformations. AI

IMPACT Highlights the complex, collaborative nature of AI development between the US and China, challenging simplistic notions of a competitive race.
- Elon Musk
- Anthropic
- Jensen Huang
- AI
- China
- ChatGPT
- DeepSeek
- U.S.
- Qwen
- United States
- KPMG Australia
- Xi Jinping
- arXiv
- University of Queensland
- Donald Trump
- Stanford University
- National Science Board
TOOL · dev.to — LLM tag Dansk(DA) · 5d · [4 sources]

Token Ledger Digest – 2026-05-20

Several LLM providers have adjusted their pricing and model availability. Qwen saw mixed changes, with some variants increasing in price while others decreased, and new models like Qwen3.7 Max were introduced. Google's Gemini Flash Latest experienced a significant price hike, while Z.ai's GLM 5.1 became free. Additionally, Alibaba's Tongyi DeepResearch 30B A3B model was removed from catalogs, prompting users to seek alternatives. AI

IMPACT Operators should monitor LLM pricing changes and model availability for cost optimization and workflow continuity.
COMMENTARY · Forbes — Innovation English(EN) · 4d · [15 sources]

Overcoming Situational Depression Via Generative AI Including Tapping Into ChatGPT

Generative AI, including models like ChatGPT, Gemini, and Claude, is increasingly being explored for mental health support, particularly for situational depression. While these tools offer accessible, 24/7 assistance, they are not a replacement for human therapists and carry risks of dispensing inappropriate advice. Concurrently, the technical underpinnings of AI agents are being scrutinized, focusing on how they process information, potential biases, and the mechanisms behind brand mentions in their outputs. Developers are advised to understand core AI concepts like LLMs, tokens, and RAG before building agent frameworks, while new infrastructure is emerging to enable AI agents to interact with regulated financial markets. AI

IMPACT Explores diverse applications of AI agents and LLMs, from mental health support to financial trading, highlighting technical considerations and potential risks.
- OpenAI
- Grok
- ChatGPT
- Claude
- Gemini
- Generative AI
- LangChain
- Google
- AI agents
- Perplexity
- LLMs
- Qwen
- LangGraph
- CrewAI
- Hashlock Markets
SIGNIFICANT · X — Qwen (Alibaba) Nederlands(NL) · 1w

🚀🚀Qwen3.7 Preview lands on Arena!

Alibaba's Qwen team has released previews of their Qwen3.7-Max and Qwen3.7-Plus models. These new models are now available on the Arena platform for evaluation. The release positions Alibaba as a top-tier lab in both text and vision AI capabilities. AI

IMPACT Positions Alibaba among top AI labs, potentially increasing competition in the frontier model space.
SIGNIFICANT · The Verge — AI English(EN) · 1mo · [22 sources]

Anthropic’s Mythos breach was humiliating

Anthropic's highly capable cybersecurity AI model, Claude Mythos, was reportedly accessed by unauthorized users shortly after its limited preview began. The breach occurred through a combination of insider knowledge from a contractor and information from a separate data leak, rather than a sophisticated hack. This incident raises concerns about supply chain security and Anthropic's ability to manage access to its most powerful, potentially dangerous AI systems, despite its strong emphasis on AI safety. AI

IMPACT Highlights critical supply chain vulnerabilities in AI safety protocols, potentially impacting enterprise trust and the rollout of powerful AI models.
- Anthropic
- OpenAI
- ExploitBench
- Claude Mythos
- GPT-5.5
- CLAUDE.md
- Claude Code
- CMU
- The Verge
- Bloomberg
- Mercor
- Qwen
- The AI Report
- Qwen3.5-27B
- Qwen-Scope
FRONTIER RELEASE · Qwen tech blog English(EN) · 1mo · [17 sources]

Qwen3.6-27B: Flagship-Level Coding in a 27B Dense Model

Qwen has released Qwen3.6-27B, a dense 27-billion-parameter multimodal model designed for advanced coding tasks. This model aims to provide flagship-level agentic coding performance, surpassing previous open-source models in this category. Various community members have already made different quantized versions of Qwen3.6-27B available on Hugging Face, facilitating its use across different platforms and libraries. AI

IMPACT Sets a new benchmark for dense coding models, potentially influencing future development in agentic AI and code generation.
RESEARCH · arXiv cs.AI English(EN) · 3w · [5 sources]

SciEval: A Benchmark for Automatic Evaluation of K-12 Science Instructional Materials

A new paper proposes that LLM hallucinations stem not from a lack of knowledge, but from a failure in commitment, where models disperse probability mass across alternatives instead of concentrating on the correct answer. This phenomenon is observed to increase with model scale and is exacerbated by instruction tuning. Another paper introduces GAMMA, a framework for mixed-precision quantization that optimizes bit allocation for LLMs, significantly improving accuracy under memory constraints and outperforming existing methods on Llama and Qwen models. Additionally, a benchmark called SciEval has been developed to automatically evaluate K-12 science instructional materials, revealing that current mainstream LLMs perform poorly on this task without domain-specific fine-tuning. AI

IMPACT New research sheds light on LLM hallucination mechanisms and introduces novel methods for model optimization and evaluation, potentially improving reliability and efficiency.
- SciEval
- Qwen
- Gemini
- Qwen3
- GPT
- Llama
- LLMs
- generative AI
- K-12
- EQuIP rubric
- GAMMA
- LLM
MEME · r/LocalLLaMA English(EN) · 1d

GPU VRAM only for small models with llama.cpp: is it possible?

A user on the r/LocalLLaMA subreddit is seeking assistance with optimizing their GPU VRAM usage for running smaller language models. Despite successfully running larger models like Gemma4 26B and Qwen 3.6 35B MoEs, they are encountering issues with smaller models like Gemma4-2B still utilizing system RAM. The user has experimented with various command-line options for llama.cpp but has not yet achieved full VRAM utilization without relying on host memory. AI
- Qwen
- llama.cpp
- Gemma4
SIGNIFICANT · X — Qwen (Alibaba) English(EN) · 1w

🚀Qwen3.6-Plus is on Nous Portal now and FREE for a limited time.

Alibaba's Qwen team has released their Qwen3.6-Plus model on the Nous Research portal. The model is currently available for free for a limited time, with a mention of Hermes Agent integration. AI

IMPACT Makes a new frontier model available for broader testing and integration.
TOOL · Mastodon — fosstodon.org English(EN) · 2w · [6 sources]

Thinking about running AI models like Llama 3, Qwen, or Mistral on your own computer? Two of the best local AI tools in 2026 are Ollama and LM Studio. Both tool

Creators are increasingly adopting local AI solutions in 2026, moving away from cloud-based services for benefits like unlimited usage, enhanced privacy, faster workflows, and lower long-term costs. Tools such as Ollama, LM Studio, and Open-WebUI are making it easier for beginners to run powerful open-source models like Llama 3, Qwen, and Mistral directly on their personal computers. This shift offers users full control over their data and content creation processes, with some even developing portable AI solutions that run entirely offline from a USB stick. AI

IMPACT Accelerates adoption of personal AI infrastructure, offering cost-effective and private alternatives to cloud-based LLM services.
- LM Studio
- Qwen
- Llama 3
- Ollama
- ChatGPT
- Open-WebUI
- Docker
TOOL · Unsloth — Releases English(EN) · 2w

New Unsloth API Inference Endpoint

Unsloth has released a new API inference endpoint that allows users to run local large language models with enhanced features. This endpoint supports both Anthropic-compatible and OpenAI-compatible dialects, enabling seamless integration with various AI agents and chat clients. The update also introduces new models like NVIDIA Nemotron 3 Nano Omni and Mistral 3.5 Medium, alongside several bug fixes and improvements to the Unsloth Studio. AI

IMPACT Enables easier local deployment and integration of various LLMs with enhanced features like self-healing tool calling and code execution.
TOOL · Qwen tech blog English(EN) · 3w

FlashQLA: CP-/Bwd-Friendly Fused Linear Attention Kernels for GDN

Qwen has developed FlashQLA, a new set of fused linear attention kernels designed to be compatible with both forward and backward passes in deep learning. These kernels are optimized for Gated Delta Networks (GDN), which are now a core component in Qwen's model family, including Qwen3-Next and its subsequent iterations like Qwen3.5 and Qwen3.6. The development aims to improve efficiency and scalability for large models with extended context windows. AI

IMPACT Optimizes attention mechanisms for large language models, potentially improving training and inference efficiency for Qwen's model family.
TOOL · Hugging Face Trending Models English(EN) · 1mo

froggeric/Qwen-Fixed-Chat-Templates

A Hugging Face model repository, froggeric/Qwen-Fixed-Chat-Templates, has been updated with significant improvements to its chat templates for Qwen 3.5 and 3.6 models. These updates address issues such as "empty think" poisoning, system prompt logic traps, and KV cache inconsistencies. The changes aim to enhance the model's ability to use tools, transition between thinking and conversational responses, and maintain a consistent memory during multi-step processes. AI

IMPACT Fixes to chat templates improve Qwen model reliability and tool usage, potentially enhancing agentic capabilities.
FRONTIER RELEASE · X — Qwen (Alibaba) English(EN) · 1mo · [11 sources]

🚀Qwen3.7-Max just landed at 56.6 on the Artificial Analysis Intelligence Index — a solid 4.8pt jump over Qwen3.6-Max-Preview. @ArtificialAnlys

Alibaba's Qwen has released Qwen3.7-Max, a new flagship model designed for the Agent Era. This model demonstrates significant improvements in scientific reasoning, coding, and agentic capabilities, achieving a score of 56.6 on the Artificial Analysis Intelligence Index. Qwen3.7-Max also showcases enhanced performance in autonomous execution and generalization across various benchmarks, with features like implicit caching now live. AI

IMPACT Sets a new benchmark for agentic capabilities and reasoning, potentially accelerating the development of autonomous AI systems.
RESEARCH · Transformers — Releases English(EN) · 1mo · [10 sources]

Patch release: v5.5.2

Hugging Face's `transformers` library has seen a series of releases and patches, introducing new models and fixing various bugs. Notably, version 5.9.0 added Cohere's Command A+ (Cohere2Moe) and HRM-Text, while also improving audio support and generation capabilities. Earlier releases, such as v5.8.0, integrated models like DeepSeek-V4, Gemma 4 Assistant, GraniteSpeechPlus, Granite4Vision, EXAONE 4.5, and PP-FormulaNet. Several patch releases have addressed specific issues, including problems with DeepSeek V4 integration, flash attention, Qwen MoE models with FP8, and Gemma4 device map support. AI

IMPACT New model integrations and bug fixes in a widely used library accelerate research and development across the AI ecosystem.
MEME · X — Qwen (Alibaba) English(EN) · 2w

📣We're calling for ambassadors!

Alibaba's Qwen team is seeking ambassadors to join their community. They are looking for individuals with strong technical skills or local community leadership experience. Selected ambassadors will receive early access to resources and opportunities. AI
- Qwen
- Alibaba
SIGNIFICANT · Qwen tech blog English(EN) · 1mo

Qwen3.5-Omni: Scaling Up, Toward Native Omni-Modal AGI

Alibaba's Qwen team has released Qwen3.5-Omni, a new generation of omnimodal large language models capable of processing text, images, audio, and audio-visual content. This series features models named Plus, Flash, and Light, all supporting a 256k context window and capable of handling over 10 hours of audio. The architecture utilizes a Hybrid-Attention Mixture-of-Experts (MoE) approach for both its reasoning and generation components. AI

IMPACT Expands LLM capabilities into native audio and video processing, potentially enabling more sophisticated AI agents and applications.
TOOL · Together AI blog English(EN) · 2mo

Together AI expands fine-tuning service with tool calling, reasoning, and vision support

Together AI has enhanced its fine-tuning service to better support advanced AI workflows. The update includes native support for tool call, reasoning, and vision-language model fine-tuning, addressing common issues like unreliable tool execution and degraded reasoning in complex interactions. These improvements aim to increase iteration speed and accuracy for AI teams building agentic applications, with enhanced throughput and larger dataset handling for models up to 1T parameters. AI

IMPACT Enables more reliable and efficient fine-tuning of AI agents, potentially accelerating the development of complex AI applications.
- OpenAI
- Together AI
- Qwen
- Z.AI
- Moonshot AI
- XY.AI Labs