Brief

last 24h

[38/838] 223 sources

Multi-source AI news clustered, deduplicated, and scored 0–100 across authority, cluster strength, headline signal, and time decay.

RESEARCH · HN — claude cli stories English(EN) · 2mo · [2 sources]

Show HN: Frontend-VisualQA — give coding agents eyes to verify their own UI work

Two new open-source CLIs, Frontend-VisualQA and ProofShot, aim to enhance AI coding agents' ability to verify their own work. These tools provide agents with visual capabilities, allowing them to AI
RESEARCH · HN — AI startup stories English(EN) · 3mo

Yann LeCun's AI startup raises $1B in Europe's largest ever seed round

Yann LeCun's AI startup has raised $1 billion in seed funding, reported as the largest seed round ever raised in Europe. The investment underscores the growing interest and capital flowing into the competitive AI landscape. AI

IMPACT This funding round signals strong investor confidence in European AI companies and intensifies competition in the frontier model space.
RESEARCH · Apple Machine Learning Research English(EN) · 3mo · [76 sources]

EpiCache: Episodic KV Cache Management for Long-Term Conversation on Resource-Constrained Environments

Multiple research papers released in May and June 2026 propose novel methods for compressing the Key-Value (KV) cache in large language models (LLMs). These techniques aim to reduce the significant memory overhead associated with long context lengths, enabling more efficient inference on resource-constrained environments. Approaches include episodic management, global regression for merging, drift-robust retrieval, and low-rank approximations, all seeking to maintain model accuracy while drastically cutting memory usage and latency. AI

IMPACT These methods aim to significantly reduce memory and latency for LLMs, potentially enabling wider deployment and more complex applications on less powerful hardware.
- transformer models
- KV cache
- attention
- X-LLMs
- OScaR
- LLMs
- Transformers
- TurboQuant
- PolarQuant
- Llama
- CacheClip
- InnerQ
- OCTOPUS
- Together AI
- KVServe
- LLM
- S3
- NIXL
- Ceph RGW
- DAOS
- VideoMLA
- EpiCache
- GRKV
- Moment-KV
- CriticalKV
- StiefAttention
- Apple Machine Learning Research
- LongConvQA
- LongBench
- RULER
- Qwen3
- Gemma 3
- Llama 3
RESEARCH · HN — AI startup stories Français(FR) · 3mo

OpenAI raises $110B on $730B pre-money valuation

OpenAI has secured $110 billion in private funding, with Amazon contributing $50 billion and Nvidia and SoftBank each adding $30 billion, valuing the company at $730 billion pre-money. This significant investment includes substantial infrastructure partnerships, with OpenAI expanding its AWS collaboration by $100 billion and committing to significant compute usage. The funding round is still open, and OpenAI anticipates further investor participation as it focuses on scaling infrastructure to meet the growing demand for AI services. AI

IMPACT This massive funding and infrastructure deal will likely accelerate OpenAI's ability to scale its AI services and develop new products, potentially setting new benchmarks for compute and AI deployment.
- Nvidia
- Andy Jassy
- OpenAI
- Amazon
- Jensen Huang
- Vera Rubin
- SoftBank
- AWS
- Bedrock
RESEARCH · HN — AI startup stories English(EN) · 3mo

Fei-Fei Li's World Labs raised $1B from A16Z, Nvidia to advance its world models

Fei-Fei Li's AI startup, World Labs, has secured $1 billion in a new funding round. The investment was backed by major players including Autodesk, Andreessen Horowitz, Nvidia, and Advanced Micro Devices. The funding is intended to advance the company's world models. AI

IMPACT This substantial investment could accelerate novel AI development approaches and potentially shift the landscape of AI research and application.
RESEARCH · METR (Model Evaluation & Threat Research) 中文(ZH) · 4mo · [101 sources]

Frontier AI Safety Regulations: A Reference Guide for AI Company Employees

Researchers are developing new methods to attack and defend AI agents used in software reverse engineering and cybersecurity. One approach uses genetic algorithms to inject malicious prompts into AI agents, causing them to misinterpret code and bypass detection systems. Other studies focus on detecting and obfuscating these prompt injection attacks, as well as defending against multi-step trojan attacks that embed persistent control within agent workflows. Additionally, a framework called CVE-Factory automates the creation of executable vulnerability tasks for training and evaluating code security agents, showing significant improvements in models like Qwen3-32B. AI

IMPACT New attack vectors and defense mechanisms for AI agents highlight critical security vulnerabilities in AI-powered tools.
RESEARCH · HN — AI startup stories English(EN) · 4mo

Apple buys Israeli startup Q.ai

Apple has acquired the Israeli AI startup Q.ai for nearly $2 billion, aiming to bolster its capabilities in audio processing and machine learning. The startup, founded in 2022, specializes in technologies that can interpret whispered speech and enhance audio in noisy environments. This acquisition is Apple's second-largest to date and follows previous AI-focused feature integrations in products like AirPods and the Vision Pro headset. AI

IMPACT Strengthens Apple's AI hardware and audio capabilities, potentially impacting future product development and competition in the AI race.
- PrimeSense
- Avi Barliya
- Yonatan Wexler
- Kleiner Perkins
- Aviad Maizels
- Beats Electronics
- The Financial Times
- Vision Pro
- AirPods
- Reuters
- Q.ai
- Apple
- GV
RESEARCH · HN — AI startup stories English(EN) · 5mo

OpenAI is paying employees more than any major tech startup in history

OpenAI is reportedly offering compensation packages larger than those of any other major tech startup in history. This strategy aims to retain top talent amid intense competition in the AI field. The company's aggressive approach to employee compensation highlights the high stakes and significant investment involved in developing advanced AI. AI

IMPACT Aggressive compensation by OpenAI may set new benchmarks for talent acquisition in the AI sector.
- OpenAI
RESEARCH · Hugging Face Daily Papers English(EN) · 7mo · [285 sources]

LambdaPO: A Lambda Style Policy Optimization for Reasoning Language Models

Several recent research papers explore methods to enhance the reasoning capabilities of large language models (LLMs). One study suggests that increasing a model's long-context capacity improves reasoning performance across various tasks. Another paper introduces OckBench, a benchmark focused on measuring the token efficiency of LLM reasoning, highlighting significant room for optimization. Additional research proposes frameworks for evaluating inductive reasoning, improving robustness through invariant gradient alignment, and enabling belief-aware reasoning in multimodal models. AI

IMPACT New benchmarks and training techniques aim to improve LLM reasoning accuracy, efficiency, and robustness, potentially leading to more reliable AI agents.
RESEARCH · HN — AI infrastructure stories English(EN) · 9mo

Nvidia results show spending on A.I. infrastructure remains robust

Nvidia's latest financial results indicate a continued strong demand for AI infrastructure, with significant revenue generated from its AI chip sales. The company's performance highlights the ongoing substantial investment in hardware necessary to support the rapidly expanding AI sector. This robust spending suggests that the development and deployment of advanced AI models remain a top priority for many organizations. AI

IMPACT Confirms that the demand for AI hardware remains strong, suggesting continued investment in AI development and deployment.
- Nvidia
- AI
RESEARCH · HN — AI infrastructure stories English(EN) · 9mo

The U.S. grid is so weak, the AI race may be over

The rapid expansion of AI is creating a significant bottleneck in the United States due to the limitations of its power grid, contrasting sharply with China's robust energy infrastructure. While U.S. AI growth is hampered by debates over data center power consumption and grid stability, China has proactively addressed this by overbuilding its power capacity over decades. This strategic oversupply allows China to integrate AI data centers as a means to absorb excess energy, a situation unimaginable in the U.S. where grids often operate with minimal reserve margins, leading to concerns about the sustainability of AI development. AI

IMPACT AI development in the US faces a critical bottleneck due to power grid limitations, potentially hindering growth compared to China's energy-secure infrastructure.
- David Fishman
- Goldman Sachs
- McKinsey
- Stifel Nicolaus
- S&P 500
- Deloitte
- Ohio
- Germany
- India
- California
- Texas
- Rui Ma
- Tech Buzz China
- Fortune
- X
RESEARCH · Google AI / Research English(EN) · 10mo · [633 sources]

Unlocking dependable responses with Gemini Enterprise Agent Platform’s Agentic RAG

Researchers are developing advanced agent frameworks to improve AI reliability and efficiency across various domains. Google introduced an agentic RAG system that enhances enterprise query handling by iteratively searching for complete context, boosting accuracy by up to 34%. Hugging Face demonstrated a multi-agent economy simulation using a small 3B model, highlighting the trade-offs between model size and real-time performance. Other research explores methods for reliable tool use, regulatory compliance through agent-to-agent protocols, dynamic benchmarking for agent behavior, and robust self-evolution mechanisms for AI agents. AI

IMPACT New agentic frameworks and evaluation methods promise more reliable, efficient, and compliant AI systems across enterprise, simulation, and regulatory domains.
RESEARCH · Qwen tech blog English(EN) · 11mo · [355 sources]

Qwen3.6-35B-A3B: Agentic Coding Power, Now Open to All

Multiple research papers released on arXiv explore advancements in AI agents, focusing on improving their reasoning, memory, and training efficiency. Qwen3.6-35B-A3B, an open-source sparse MoE model, demonstrates strong agentic coding capabilities. Other studies introduce methods for better skill presentation, long-context reasoning through RL, skill reuse as compression, and adaptive context management for agents tackling complex, long-horizon tasks. Additionally, research presents AutoSci, a system for automating the scientific research lifecycle, and PithTrain, a compact training framework for MoE models designed for agent-native development. AI

IMPACT Advances in agent capabilities, memory management, and training efficiency could accelerate the development of more sophisticated AI systems.
- BRIGHT
- SIRA
- Gemini-3-Flash
- Qwen3-Reranker
- MemReranker
- LatentRAG
- AgenticRAG
- BeliefMem
- LLM
- GPT-4o-mini
- ALFWorld
- InterLV-Search
- SuperIntelligent Retrieval Agent (SIRA)
- MemReread
- AI agents
- Grok-4-Fast
- Qwen3-235B
- Llama-4-Maverick
- Gemini 2.5 Flash
- MeMo
- H-Mem
- DimMem
- RecMem
- SocialMemBench
- EvoMemBench
- LongMINT
- Qwen2.5-3B-Instruct
- Qwen3.6-35B-A3B
- Qwen
- ReuseRL
- AdaCoM
- ASH
- ElasticMem
- PithTrain
- AutoSci
- DeepSeek V4-Flash
- GPT-5.5
- Qwen2.5-7B-Instruct
- SCALE
- LongTraceRL
RESEARCH · HN — AI startup stories Français(FR) · 11mo

Grammarly acquires Superhuman

Grammarly has acquired the email startup Superhuman, signaling a strategic move to enhance its AI platform. The acquisition aims to integrate Superhuman's advanced AI capabilities into Grammarly's existing offerings, potentially expanding its reach into new communication tools and workflows. AI

IMPACT This acquisition could lead to more integrated AI-powered communication tools, enhancing productivity for users.
RESEARCH · HN — machine learning stories English(EN) · 11mo · [3 sources]

Normalizing Flows Are Capable Generative Models

Researchers have developed a new generative modeling framework utilizing cumulative flow maps for long-range transport in probability space. This approach aims to connect local updates with finite-time transport, allowing generative models to reason about global state transitions. The framework supports few-step and even one-step generation with minimal changes to existing models and no increase in capacity, demonstrating effectiveness across various tasks like image and SDF generation with reduced inference costs. AI

IMPACT Introduces novel generative modeling techniques that could lead to more efficient and capable AI systems for various synthesis tasks.
RESEARCH · HN — AI startup stories English(EN) · 11mo

Apple executives have held internal talks about buying Perplexity

Apple executives have reportedly held preliminary discussions regarding the potential acquisition of AI startup Perplexity AI. These talks, involving key figures like Adrian Perica and Eddy Cue, are aimed at bolstering Apple's AI capabilities and talent pool. The discussions are in their nascent stages and may not result in a formal offer. AI

IMPACT Potential acquisition could significantly boost Apple's AI integration and competitive standing.
RESEARCH · Hugging Face Daily Papers English(EN) · 12mo · [361 sources]

Rule2DRC: Benchmarking LLM Agents for DRC Script Synthesis with Execution-Guided Test Generation

Researchers are developing new methods to improve the evaluation and training of large language models (LLMs). One approach, SCOPE, calibrates LLM judges to ensure reliable pairwise evaluations with controlled error rates. Another technique, D3, uses dynamic influence graphs to optimize data scheduling during LLM training by considering sample interactions. Additionally, OBCache offers a principled framework for pruning key-value caches to reduce memory overhead during long-context inference, improving accuracy. AI

IMPACT New research introduces methods for more reliable LLM evaluation, efficient training data scheduling, and optimized inference, potentially improving LLM performance and resource utilization.
- FlashAttention
- LLMs
- PagedAttention
- Nested WAIT
- A100 GPU
- Llama-2-7B
- LLM
- Asteria
- A100
- vLLM
- FasterTransformer
- Sarathi-Serve
- Orca
- KVDrive
- SCICONVBENCH
- LLMEval-Logic
- DeepSeek-R1-Distill-7B
- V* benchmark
- POPE benchmark
- LLaDA2.0-flash
- LLaDA2.0-mini
- TIDE
- Frontier
- FT-Dojo
- arXiv
- WebGPU
- llama.cpp
- LlamaWeb
- rePIRL
- PALS
- FT-Agent
- Charon
- Qwen
- AxBench
- Gemini 3 Pro
- Lean
- Item Response Theory
- LoRA
- GPT-5
- LLaMA
- Hermes
- SCOPE
- OBCache
- FEM-Bench
RESEARCH · arXiv cs.CL English(EN) · 13mo · [53 sources]

FlexDraft: Flexible Speculative Decoding via Attention Tuning and Bonus-Guided Calibration

Researchers have developed several new methods to accelerate large language model (LLM) inference through speculative decoding. AdaPLD improves retrieval and draft construction by using semantic similarity and branched hypotheses, achieving up to 3.10x speedup. SSSD combines n-gram matching with hardware-aware speculation for up to 2.9x latency reduction without training. D^2SD uses a dual diffusion model and confidence-guided prefix trees to enhance acceptance rates, while TAPS optimizes prefix tree selection for diffusion-drafted decoding, yielding up to 7.9x speedup. KnapSpec treats draft model selection as a knapsack problem to maximize throughput, achieving up to 1.47x speedup, and Vegas uses verification-guided sparse attention for improved decoding throughput. Additionally, LK Losses directly optimize the acceptance rate during training, leading to gains of 8-10% in average acceptance length. AI

IMPACT These advancements in speculative decoding promise significant speedups and efficiency gains for LLM inference, potentially lowering costs and increasing accessibility.
- FlexDraft
- Graft
- Qwen3-235B
- vLLM
- Speculative Decoding
- Llama-3-8B
- Llama-3-70B
- GPT-4
- Claude Sonnet
- Ollama
- Qwen3
- Speculative Pipeline Decoding
- EvoSpec
- LLM
- ToolSpec
- Bastion
- D^2SD
- AdaPLD
- arXiv
- Hugging Face
- LK Losses
- KnapSpec
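The speculative-decoding methods summarized above share a draft-then-verify loop: a cheap draft model proposes several tokens, and the target model keeps the longest prefix it agrees with. A minimal greedy sketch of that loop follows. The `draft` and `target` functions are hypothetical toy stand-ins for real models, and none of this is taken from the listed papers.

```python
from typing import Callable, List

NextToken = Callable[[List[int]], int]


def speculative_step(prefix: List[int], draft_next: NextToken,
                     target_next: NextToken, k: int) -> List[int]:
    """One greedy draft-then-verify step; returns 1 to k + 1 accepted tokens."""
    # 1. The draft model proposes k tokens autoregressively.
    drafted: List[int] = []
    ctx = list(prefix)
    for _ in range(k):
        tok = draft_next(ctx)
        drafted.append(tok)
        ctx.append(tok)

    # 2. The target model checks each drafted position in order.
    #    (A real system would score all positions in one batched pass.)
    accepted: List[int] = []
    ctx = list(prefix)
    for tok in drafted:
        expected = target_next(ctx)
        if tok != expected:
            accepted.append(expected)  # replace the first mismatch and stop
            return accepted
        accepted.append(tok)
        ctx.append(tok)

    # 3. Every draft token matched, so the target contributes one bonus token.
    accepted.append(target_next(ctx))
    return accepted


def generate(prefix: List[int], draft_next: NextToken,
             target_next: NextToken, k: int, max_new: int) -> List[int]:
    """Repeat speculative steps until max_new tokens have been produced."""
    out = list(prefix)
    while len(out) - len(prefix) < max_new:
        out.extend(speculative_step(out, draft_next, target_next, k))
    return out[: len(prefix) + max_new]


# Hypothetical toy models: the target counts upward modulo 10,
# and the draft agrees except right after the token 4.
def target(ctx: List[int]) -> int:
    return (ctx[-1] + 1) % 10


def draft(ctx: List[int]) -> int:
    return 0 if ctx[-1] == 4 else (ctx[-1] + 1) % 10
```

With greedy verification the output matches what the target model alone would generate, so the draft model affects only how many target steps are needed, not the result.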
RESEARCH · HN — machine learning stories English(EN) · 14mo · [2 sources]

Understanding Aggregate Trends for Apple Intelligence Using Differential Privacy

Apple is advancing research in privacy-preserving machine learning and AI, hosting a workshop to discuss techniques like federated learning and differential privacy. The company is applying these methods to its upcoming Apple Intelligence features, such as Genmoji, Image Playground, and writing tools, to understand usage trends without compromising user data. Apple is also exploring the creation of synthetic data that mimics real user content to improve these features while maintaining strict privacy standards. AI

IMPACT Apple's focus on privacy-preserving AI techniques for Apple Intelligence features may set new standards for user data protection in generative AI.
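To illustrate how aggregate trends can be estimated without trusting any individual report, here is a sketch of randomized response, a classic local differential privacy mechanism. It is a textbook example chosen for illustration and is not a description of Apple's actual mechanism. The function names and the `p_truth` parameter are assumptions.

```python
import random
from typing import List


def randomize(bit: bool, p_truth: float, rng: random.Random) -> bool:
    """Report the true bit with probability p_truth, otherwise a fair coin flip.

    For this mechanism the privacy parameter is
    epsilon = ln((1 + p_truth) / (1 - p_truth)).
    """
    if rng.random() < p_truth:
        return bit
    return rng.random() < 0.5


def estimate_rate(reports: List[bool], p_truth: float) -> float:
    """Debias the observed rate to estimate the true population rate."""
    observed = sum(reports) / len(reports)
    # E[observed] = p_truth * true_rate + (1 - p_truth) * 0.5
    return (observed - (1 - p_truth) * 0.5) / p_truth


def simulate(true_rate: float, n_users: int, p_truth: float, seed: int) -> float:
    """Simulate n_users noisy reports and return the debiased estimate."""
    rng = random.Random(seed)
    reports = [randomize(rng.random() < true_rate, p_truth, rng)
               for _ in range(n_users)]
    return estimate_rate(reports, p_truth)
```

Each individual report is noisy enough to be deniable, while the debiased average converges to the true rate as the number of users grows.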
RESEARCH · HN — AI infrastructure stories English(EN) · 14mo

FOSS infrastructure is under attack by AI companies

AI companies are aggressively crawling open-source infrastructure, causing significant outages and disruptions for projects like SourceHut, KDE GitLab, and GNOME. These AI scrapers often disregard robots.txt and mimic legitimate user agents, making it difficult to implement effective defenses. As a result, some projects have resorted to implementing challenging proof-of-work systems to block these bots, which can also impact legitimate users. AI

IMPACT AI data scraping practices are straining open-source infrastructure, potentially hindering collaboration and development.
- KDE GitLab
- Drew DeVault
- SourceHut
- Alibaba
- Anthropic
- OpenAI
- Anubis
- GNOME GitLab
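The proof-of-work defenses mentioned above make each request cost the client some computation while staying cheap for the server to check. A minimal hashcash-style sketch, assuming a SHA-256 puzzle with a leading-zeros target; the challenge format and function names are hypothetical and do not reproduce the protocol of any specific tool:

```python
import hashlib


def _digest(challenge: str, nonce: int) -> str:
    """SHA-256 hex digest of the challenge concatenated with the nonce."""
    return hashlib.sha256(f"{challenge}{nonce}".encode()).hexdigest()


def solve(challenge: str, difficulty: int) -> int:
    """Client side: brute-force the smallest nonce whose digest starts
    with `difficulty` zero hex digits (about 16**difficulty tries)."""
    prefix = "0" * difficulty
    nonce = 0
    while not _digest(challenge, nonce).startswith(prefix):
        nonce += 1
    return nonce


def verify(challenge: str, nonce: int, difficulty: int) -> bool:
    """Server side: a single hash confirms the client did the work."""
    return _digest(challenge, nonce).startswith("0" * difficulty)
```

The asymmetry is the point: solving takes thousands of hashes on average at difficulty 3, while verifying takes one. This is also why such challenges slow down legitimate visitors on weak devices.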
RESEARCH · HN — AI startup stories English(EN) · 16mo

Intel ruined an Israeli startup it bought for $2B–and lost the AI race

Intel has effectively dismantled Habana Labs, an Israeli AI chip startup it acquired for $2 billion, marking a significant failure in its attempt to compete with Nvidia. Despite initial optimism and a deal with Amazon for its Gaudi chips, Intel's internal issues and integration problems led to key personnel departing and the cancellation of next-generation products like Falcon Shores. This outcome represents a rare misstep for Habana's founder, Avigdor Willenz, who has a history of successful ventures in the semiconductor industry. AI

IMPACT Highlights the intense competition and challenges in the AI hardware market, potentially impacting the supply chain for AI model training.
- Avigdor Willenz
- Marvell
- Annapurna Labs
- Astera Labs
- Nervana
- AMD
- Intel
- Habana Labs
- Nvidia
- Amazon
- Gaudi
- LLMs
- Falcon Shores
- Raja Koduri
- Mobileye
RESEARCH · 36氪 (36Kr) 中文(ZH) · 16mo · [2 sources]

Samsung announces it will stop selling all home appliance products in the Chinese market

Samsung Electronics has announced it will cease sales of all home appliance products, including televisions and monitors, in the Chinese market. This decision comes in response to a rapidly changing market environment. The company has assured customers that it will continue to provide after-sales service and uphold consumer rights according to relevant laws and regulations. AI
- China
- Samsung Electronics
RESEARCH · HN — AI infrastructure stories English(EN) · 17mo

Executive order on advancing United States leadership in AI infrastructure

The White House has issued an executive order aimed at bolstering U.S. leadership in AI infrastructure. The order focuses on expanding access to computing resources, developing AI talent, and promoting responsible AI innovation. It also emphasizes the importance of international collaboration and the development of safety standards for AI technologies. AI

IMPACT This executive order aims to solidify U.S. leadership in AI by focusing on infrastructure and talent, potentially accelerating domestic AI development and deployment.
- United States
- White House
RESEARCH · HN — AI startup stories English(EN) · 17mo

Anthropic raising funding valuing it at $60B

Anthropic is reportedly in talks to raise a significant funding round that would value the AI company at approximately $60 billion. This potential investment comes as the company continues to develop its large language models and compete in the rapidly evolving AI landscape. The substantial valuation underscores the high investor interest in cutting-edge AI development. AI

IMPACT Confirms continued high investor confidence and capital flow into frontier AI development.
- AI
- Anthropic
RESEARCH · HN — AI startup stories Suomi(FI) · 17mo

Vultr Raises $333M at $3.5B Valuation

Vultr, a cloud computing provider focused on AI workloads, has secured $333 million in funding at a $3.5 billion valuation. The investment round was led by existing investor Thoma Bravo. The company plans to use the funds to expand its global infrastructure and enhance its AI-specific offerings. AI

IMPACT Expansion of Vultr's infrastructure could lower costs and increase accessibility for AI development and deployment.
- Vultr
- Thoma Bravo
RESEARCH · arXiv cs.LG English(EN) · 23mo · [2 sources]

Sequential Learning and Catastrophic Forgetting in Differentiable Resistor Networks

Researchers have developed a novel analog network of resistors capable of performing machine learning tasks without a traditional processor. This system, based on transistors, can learn and adapt to new tasks, demonstrating potential for highly energy-efficient computation. While currently a prototype, the technology shows promise for applications in edge devices and could eventually outperform conventional digital processors for specific machine learning workloads. AI

IMPACT This research could lead to more energy-efficient AI hardware, particularly for edge computing applications.
RESEARCH · HN — AI infrastructure stories English(EN) · 24mo

OpenAI Selects Oracle Cloud Infrastructure to Extend Microsoft Azure AI Platform

OpenAI has entered into a new agreement to utilize Oracle Cloud Infrastructure (OCI) for its artificial intelligence workloads. This partnership aims to expand OpenAI's existing AI platform, which is primarily hosted on Microsoft Azure. The collaboration will leverage OCI's high-performance computing capabilities to support OpenAI's growing demand for AI training and inference. AI

IMPACT Expands AI training and inference capacity by diversifying cloud infrastructure providers.
RESEARCH · HN — machine learning stories English(EN) · 24mo · [2 sources]

Apple's On-Device and Server Foundation Models

Apple has detailed its new foundation language models powering Apple Intelligence, including a ~3 billion parameter on-device model and a larger server-based model. These models are designed for multilingual and multimodal tasks, supporting image understanding and tool execution. The company emphasizes its Responsible AI approach, focusing on user privacy through innovations like Private Cloud Compute and on-device processing, ensuring user data is not used for training. AI

IMPACT Apple's detailed technical report on its foundation models may influence the development of efficient on-device and specialized server-based AI systems.
- JAX
- AXLearn
- Private Cloud Compute
- macOS Sequoia
- iPadOS 18
- iOS 18
- Apple Intelligence
- Apple
- XLA
RESEARCH · HN — machine learning stories English(EN) · 26mo

USAF Test Pilot School, DARPA announce aerospace machine learning breakthrough

The USAF Test Pilot School and DARPA have announced a significant advancement in aerospace machine learning. This breakthrough involves the development and successful testing of a new AI system designed to enhance the capabilities of military aircraft. The system aims to improve decision-making and operational efficiency in complex aerial environments. AI

IMPACT Potential to enhance military aviation capabilities through advanced AI decision-making.
- DARPA
- USAF Test Pilot School
RESEARCH · HN — machine learning stories English(EN) · 26mo · [21 sources]

A Visual Introduction to Machine Learning (2015)

This collection of resources offers a broad overview of machine learning, from foundational concepts and visual introductions to theoretical underpinnings and practical applications. It includes a visual guide to classification tasks, a discussion on the science and ethics of machine learning benchmarks, and pointers to comprehensive textbooks and course materials. Additionally, it highlights tools for interpretable machine learning and the engineering practices required for deploying models in production. AI

IMPACT Provides foundational knowledge and practical tools for understanding, developing, and deploying machine learning models.
RESEARCH · HN — machine learning stories English(EN) · 26mo

The AI industry spent 17x more on Nvidia chips than it brought in in revenue

The AI sector reportedly spent 17 times more on Nvidia chips than it brought in as revenue. This points to a heavy investment phase in AI infrastructure, with a focus on future growth and capability development over immediate profitability. The data suggests a considerable capital outlay is being made to acquire the hardware needed to train and deploy advanced AI models. AI

IMPACT Indicates a heavy investment phase in AI infrastructure, potentially signaling future capability advancements.
- AI industry
- Nvidia
RESEARCH · HN — AI infrastructure stories Română(RO) · 26mo · [2 sources]

1-Bit AI Infrastructure

Researchers have developed a software stack to enable fast and lossless inference of 1-bit Large Language Models (LLMs) like BitNet b1.58 on CPUs. This infrastructure achieves speedups of 2.37x to 6.17x on x86 CPUs and 1.37x to 5.07x on ARM CPUs, depending on model size. The goal is to make LLMs more efficient and deployable on a wider range of devices. AI

IMPACT Enables more efficient and widespread deployment of LLMs on consumer hardware.
- Shaoguang Mao
- x86 CPUs
- LLMs
- ARM CPUs
- BitNet b1.58
- BitNet
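For context on why such models suit CPUs: BitNet b1.58 stores weights as ternary values in {-1, 0, 1}, so a dot product reduces to additions and subtractions plus one final scale. The sketch below illustrates absmean ternary quantization and a multiplication-free dot product in plain Python. It is an illustration only, not the kernels of the software stack described above, and the function names are assumptions.

```python
from typing import List, Tuple


def absmean_ternary(weights: List[float], eps: float = 1e-8) -> Tuple[List[int], float]:
    """Quantize weights to {-1, 0, 1} using the mean absolute value as scale."""
    scale = sum(abs(w) for w in weights) / len(weights) + eps
    # Round to the nearest integer, then clip into the ternary range.
    q = [max(-1, min(1, round(w / scale))) for w in weights]
    return q, scale


def ternary_dot(q: List[int], scale: float, x: List[float]) -> float:
    """Dot product with ternary weights: only adds, subtracts, and one multiply."""
    acc = 0.0
    for qi, xi in zip(q, x):
        if qi == 1:
            acc += xi
        elif qi == -1:
            acc -= xi
        # qi == 0 contributes nothing and is skipped
    return acc * scale
```

For example, `absmean_ternary([0.9, -1.1, 0.05, 2.0])` yields the ternary weights `[1, -1, 0, 1]` with a scale close to 1.0125.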
RESEARCH · Medium — MLOps tag English(EN) · 34mo · [63 sources]

Building Secure AI Gateways with MLflow AI Gateway

Google Research has introduced ReasoningBank, a novel framework designed to enhance AI agents' ability to learn from their experiences, both successes and failures, after deployment. This system distills generalizable reasoning strategies from past interactions, allowing agents to continuously improve and avoid repeating mistakes. Separately, new research explores optimizing multi-agent communication through latent representations and introduces Agent Evolving Learning (AEL) for agents operating in open-ended environments, focusing on how to effectively use remembered information. Additionally, DeepSeek has released preview models of its V4 series, offering large context windows and advanced capabilities at a significantly lower cost than comparable frontier models. AI

IMPACT New frameworks for agent learning and memory, alongside cost-effective frontier models, could accelerate AI adoption in complex tasks and personalized applications.
- MLflow
- OpenRouter
- OpenAI
- Anthropic
- Gemini
- GPT-5.5
- Claude Opus 4.7
- MLflow AI Gateway
- LiteLLM
- Portkey
- Nemobot
- Google
- ReasoningBank
- DeepSeek
- DeepSeek-V4-Pro
- DeepSeek-V4-Flash
- AI agents
- LLM
- Hugging Face
- DiffMAS
- Agent Evolving Learning (AEL)
- AgenticQwen
- Memora
RESEARCH · Google AI / Research English(EN) · 38mo · [475 sources]

Making LLMs more accurate by using all of their layers

Google Research has developed a new framework to evaluate the behavioral alignment of large language models with human social inclinations. This approach adapts established psychological questionnaires into large-scale situational judgment tests, allowing for the quantification of model tendencies in realistic scenarios. The research identifies gaps where model behaviors deviate from human consensus or fail to capture the range of human opinions, aiming to improve LLM navigation of social dynamics. Separately, Google Research also introduced SLED, a novel decoding strategy that enhances LLM factuality by utilizing all model layers instead of just the final one, without requiring external data or fine-tuning. AI

IMPACT New methods for evaluating LLM alignment and improving factuality could lead to more trustworthy and socially adept AI systems.
- NeurIPS 2024
- Google Research
- LLMs
- ERQ
- IRI
- Situational Judgment Tests
- SLED
- CodeGemma
- GitHub
RESEARCH · Hugging Face Blog English(EN) · 48mo · [405 sources]

The Annotated Diffusion Model

Apple's research paper explores the mechanisms behind compositional generalization in conditional diffusion models, particularly focusing on how these models handle generating images with more objects than trained on. The study identifies 'local conditional scores' as a key factor enabling this ability, demonstrating that models succeeding at length generalization exhibit these scores, while those that fail do not. The research also proposes a method to enforce these local scores, which successfully enabled length generalization in a previously underperforming model. AI

IMPACT Research into diffusion model generalization could lead to more robust and controllable image generation systems.
RESEARCH · 量子位 (QbitAI) 中文(ZH) · 71mo · [190 sources]

Secured 70 billion yuan in funding! DeepSeek Code is really coming, ACM gold medalist Cui Tianyi is in charge

New research explores the challenges and advancements in AI-native code generation, focusing on improving efficiency, reliability, and safety. Papers introduce novel architectures like MicroSkill for better context management and modular knowledge encapsulation, reducing token consumption and increasing compilation success rates. Other studies benchmark coding agents' performance on complex tasks, including their ability to handle underspecified user intent and detect potential sabotage, highlighting the need for human-centric safety mechanisms and robust evaluation frameworks. AI

IMPACT New benchmarks and architectures are pushing the boundaries of AI coding agents, addressing efficiency, safety, and complex task handling.
- Udemy
- Replit
- Codex
- Cursor
- GitHub Copilot
- Claude Code
- DeepSeek
- DeepSeek Code
- Python
- Cui Tianyi
- TSY Capital
- Replit Agent
- Agent Harness
- Anthropic
- OpenAI
- TensorBench
- Asuka-Bench
- MiniMax-M2.7
- Gemini-3.1-Pro
- GPT-5.4
- Claude-Opus-4.6
- MicroSkill Architecture
- AI-native code generation
- SABER
- OpenAI Codex
RESEARCH · OpenAI News English(EN) · 91mo · [1013 sources]

Better language models and their implications

Google DeepMind has introduced the FACTS Benchmark Suite, a new set of evaluations designed to systematically measure the factuality of large language models across various use cases. This suite includes benchmarks for parametric knowledge, search-based information retrieval, and multimodal understanding, alongside an updated grounding benchmark. The initiative aims to provide a more comprehensive understanding of LLM factuality and drive industry-wide improvements in accuracy and trustworthiness. AI

IMPACT Provides new evaluation tools to drive progress in LLM factuality and reduce hallucinations.
RESEARCH · OpenAI News English(EN) · 122mo · [741 sources]

RL²: Fast reinforcement learning via slow reinforcement learning

OpenAI has published a series of research papers detailing advancements in reinforcement learning. These include achieving superhuman performance in Dota 2 with OpenAI Five, developing benchmarks for safe exploration in RL, and quantifying generalization capabilities with the CoinRun environment. The company also explored novel methods like prediction-based rewards for curiosity-driven exploration, learning policy representations in multiagent systems, and an experimental metalearning approach called Evolved Policy Gradients for faster training on new tasks. Further research addresses variance reduction in policy gradients and the equivalence between policy gradients and soft Q-learning, alongside challenging robotics environments for multi-goal RL. AI

IMPACT Demonstrates significant progress in RL capabilities, including superhuman performance, safety, generalization, and exploration, pushing the boundaries of AI.