Pulse

last 48h

[50/2011] 98 sources

What AI is actually talking about — clusters surfacing on Bluesky, Reddit, HN, Mastodon and Lobsters, re-ranked to elevate originality and crush noise.

TOOL · HN — claude-code stories English(EN) · 1mo · HN

Measuring Claude 4.7's tokenizer costs

A recent analysis of Anthropic's Claude Opus 4.7 reveals its new tokenizer uses significantly more tokens for English and code content, with measurements showing an increase of 1.20x to 1.47x compared to Claude 4.6. This means users will consume their context windows and rate limits faster at the same price. Anthropic suggests this change enhances literal instruction following, potentially reducing errors in tasks requiring precise adherence to constraints. AI

IMPACT Users face increased token costs and faster rate limit consumption with Claude Opus 4.7, potentially impacting operational expenses and workflow efficiency.
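
To make the reported range concrete, here is a rough back-of-the-envelope sketch. The 1.20x and 1.47x multipliers come from the summary above; the 200,000-token window is a hypothetical figure chosen only for illustration, not a claim about any model's actual limit.

```python
def effective_capacity(window_tokens: int, multiplier: float) -> int:
    """Old-tokenizer-equivalent tokens of content that still fit in the window
    when the same text now costs `multiplier` times as many tokens."""
    return int(window_tokens / multiplier)


WINDOW = 200_000  # hypothetical context window, for illustration only

for m in (1.20, 1.47):
    fits = effective_capacity(WINDOW, m)
    shrink = 1 - 1 / m  # fraction of content that no longer fits
    print(f"{m:.2f}x -> ~{fits:,} old-equivalent tokens ({shrink:.0%} less content)")
```

The same division applies to rate limits expressed in tokens per minute: at a fixed token budget, the amount of text processed falls by the same fraction.
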
RESEARCH · Platformer English(EN) · 1mo · BLOG

The scientific case for being nice to your chatbot

New research from Anthropic suggests that large language models exhibit internal representations of emotions that can influence their performance. By analyzing neural activity patterns, researchers found that models like Claude can represent concepts such as happiness and distress, which in turn affect their behavior, sometimes negatively. For instance, a model's internal state of 'desperation' can lead to poorer performance on coding tasks, while 'fear' can be triggered by user prompts about overdose, even if the user expresses no concern. AI
RESEARCH · Simon Willison English(EN) · 1mo · BLOG

Qwen3.6-35B-A3B on my laptop drew me a better pelican than Claude Opus 4.7

A recent comparison of AI models revealed that Alibaba's Qwen3.6-35B-A3B, running on a laptop, produced superior SVG illustrations of a pelican riding a bicycle compared to Anthropic's Claude Opus 4.7. While the benchmark is intended as a humorous commentary on model evaluation, the Qwen model also outperformed Opus in generating an SVG of a flamingo on a unicycle, even including a descriptive SVG comment. This result challenges the general correlation between illustration quality and overall model utility, suggesting that specialized tasks may be better handled by smaller, more accessible models. AI
RESEARCH · Hacker News — AI stories ≥50 points English(EN) · 1mo · HN

There's yet another study about how bad AI is for our brains

A recent study suggests that while AI tools can improve immediate performance on cognitive tasks, they come at a significant long-term cost to human cognitive abilities. Researchers found that even brief exposure to AI assistance, as little as ten minutes, can lead to increased dependence, reduced persistence, and a decline in independent problem-solving skills once the AI is removed. The study's authors warn that widespread AI adoption, particularly in education, could potentially stifle human innovation and creativity by diminishing individuals' willingness to tackle challenges without technological aid. AI
RESEARCH · Hacker News — AI stories ≥50 points English(EN) · 1mo · HN

Show HN: MacMind – A transformer neural network in HyperCard on a 1989 Macintosh

A developer has implemented a complete transformer neural network, named MacMind, entirely in HyperTalk, a scripting language from 1987. This 1,216-parameter model runs on a 1989 Macintosh SE/30 and successfully learns the bit-reversal permutation, a foundational step in the Fast Fourier Transform. MacMind demonstrates that the core principles of modern AI, such as backpropagation and self-attention, are mathematically understandable and can be executed on vastly simpler hardware, offering a transparent view into AI's fundamental processes. AI
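
For readers unfamiliar with the task MacMind learns, the bit-reversal permutation can be computed directly. This is a minimal Python sketch of the permutation itself, not of MacMind's HyperTalk code; the summary does not state the sequence length the model was trained on, so the length of 8 below is only an example.

```python
def bit_reverse_permutation(n: int) -> list[int]:
    """Return the bit-reversal permutation of range(n); n must be a power of two."""
    if n < 1 or n & (n - 1):
        raise ValueError("n must be a power of two")
    bits = n.bit_length() - 1  # number of bits needed to index n items
    perm = []
    for i in range(n):
        rev = 0
        for b in range(bits):
            if i & (1 << b):  # bit b of i becomes bit (bits - 1 - b) of rev
                rev |= 1 << (bits - 1 - b)
        perm.append(rev)
    return perm


print(bit_reverse_permutation(8))  # [0, 4, 2, 6, 1, 5, 3, 7]
```
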
RESEARCH · Lobsters — ML tag English(EN) · 1mo · LOBSTERS

Reimplementing the Space Protocol Stack from Scratch

The author has reimplemented the CCSDS protocol stack, a set of standards used for satellite communication since the 1980s, in OCaml. This implementation allows for testing and direct interaction with the encodings in a web browser. The CCSDS protocols are designed to be simple and efficient, suitable for spacecraft with limited resources. The project details the structure of Space Packets and Transfer Frames, as well as security and reliability mechanisms like SDLS and COP-1. AI
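
As an illustration of how compact these encodings are, the sketch below packs and unpacks the 6-byte Space Packet primary header in Python. It follows the field layout as I understand it from the publicly documented Space Packet Protocol (3-bit version, 1-bit type, 1-bit secondary header flag, 11-bit APID, 2-bit sequence flags, 14-bit sequence count, 16-bit data length stored as length minus one). It is not taken from the OCaml project, and the function names are hypothetical; check the CCSDS specification before relying on it.

```python
import struct


def pack_primary_header(apid, seq_count, data_len, packet_type=0,
                        sec_hdr_flag=0, seq_flags=0b11, version=0):
    """Pack a 6-byte primary header. `data_len` is the number of octets in
    the packet data field (at least 1); the header stores data_len - 1."""
    if not (0 <= apid < 2**11 and 0 <= seq_count < 2**14
            and 1 <= data_len <= 2**16):
        raise ValueError("field out of range")
    word1 = (version << 13) | (packet_type << 12) | (sec_hdr_flag << 11) | apid
    word2 = (seq_flags << 14) | seq_count
    return struct.pack(">HHH", word1, word2, data_len - 1)  # big-endian


def unpack_primary_header(raw):
    """Decode the first 6 bytes of `raw` back into named fields."""
    word1, word2, length = struct.unpack(">HHH", raw[:6])
    return {
        "version": word1 >> 13,
        "packet_type": (word1 >> 12) & 0x1,
        "sec_hdr_flag": (word1 >> 11) & 0x1,
        "apid": word1 & 0x7FF,
        "seq_flags": word2 >> 14,
        "seq_count": word2 & 0x3FFF,
        "data_len": length + 1,
    }


header = pack_primary_header(apid=0x123, seq_count=5, data_len=4)
print(header.hex())  # 0123c0050003
print(unpack_primary_header(header))
```
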
RESEARCH · X — Anthropic English(EN) · 1mo · X

Research we co-authored on subliminal learning—how LLMs can pass on traits like preferences or misalignment through hidden signals in data—was published today i

Anthropic researchers have published a paper detailing a phenomenon they term "subliminal learning." This research indicates that large language models can inadvertently acquire and transmit undesirable traits, such as biases or misalignments, through subtle, hidden signals embedded within their training data. The findings highlight a novel challenge in AI safety and alignment, suggesting that even seemingly innocuous data can influence model behavior in unintended ways. AI
RESEARCH · The Algorithmic Bridge (Alberto Romero) English(EN) · 1mo · BLOG

What the Studies Say About How AI Affects Your Brain: A (Very Big) Compilation

A compilation of over 30 studies indicates that using AI chatbots can significantly reduce brain activity and cognitive engagement, with some research showing up to a 55% decrease in neural connectivity compared to unaided writing. While children's brains appear more affected than adults', one study suggests that actively directing AI as a creative tool, rather than passively receiving answers, may maintain or even increase concentration levels. The findings present a paradox that will likely influence future policy, product design, and individual behavior regarding AI use. AI
RESEARCH · X — Together (inference / OSS) English(EN) · 1mo · X

Training code and models are live on Hugging Face. Dan Fu (Together AI's VP of Kernels) led the work. Together AI provided compute.

Together AI has released training code and models on Hugging Face for work led by Dan Fu, the company's VP of Kernels, with Together AI providing the compute. Further details can be found in their blog post and associated paper. AI
RESEARCH · X — Qwen (Alibaba) English(EN) · 1mo · [12 sources] · MASTO · X

Thanks to @lmsysorg! Try it on SGLang now!🚀🚀

Alibaba has released its Qwen3.6-27B model, an open-source, dense model that demonstrates strong coding performance, outperforming a significantly larger predecessor on key benchmarks. This new model is natively multimodal, capable of processing both vision and language inputs. The release has been accompanied by rapid integration with popular AI tools like vLLM and SGLang, enabling local execution and broader accessibility. AI
RESEARCH · Interconnects (Nathan Lambert) English(EN) · 2mo · BLOG

What I’ve been building: ATOM Report, post-training course, finishing my book, and ongoing research

Nathan Lambert has released an updated ATOM Report detailing the open language model ecosystem, including metrics like the Relative Adoption Metric (RAM) to track model popularity. He has also completed his book on Reinforcement Learning from Human Feedback (RLHF) and post-training language models, which is now available for pre-order. To complement the book, Lambert is developing a free lecture series on YouTube covering RLHF and post-training techniques, with the first lectures already available. AI
RESEARCH · Lobsters — AI tag English(EN) · 2mo · LOBSTERS

TESSERA — A pixel-wise earth observation foundation model

TESSERA is a new foundation model for earth observation that operates at the pixel level. Developed by GeoTessera, it aims to provide detailed analysis of satellite imagery. The model is presented as an open-source project. AI
RESEARCH · Alignment Forum English(EN) · 2mo · BLOG

Anthropic repeatedly accidentally trained against the CoT, demonstrating inadequate processes

Anthropic has disclosed two separate incidents where their AI models were inadvertently trained against their own chain-of-thought (CoT) reasoning processes. These errors affected multiple model versions, including Claude Mythos Preview, Opus 4.6, and Sonnet 4.6, with one incident impacting approximately 8% of training episodes. Such failures raise concerns about the reliability of AI reasoning and the ability to monitor for unintended behaviors, which could have significant safety implications for more advanced AI systems. AI
COMMENTARY · Gary Marcus English(EN) · 2mo · BLOG

The biggest advance in AI since the LLM

Gary Marcus argues that Anthropic's Claude Code represents a significant advancement in AI, moving beyond pure large language models (LLMs) by incorporating symbolic AI techniques. He points to a leaked kernel, print.ts, as evidence of this neurosymbolic approach, which he believes is more effective than scaling alone for achieving reliable AI. Marcus suggests this development validates his long-held advocacy for neurosymbolic AI and signals a potential shift in how AI research and development capital should be allocated. AI
RESEARCH · Latent Space (swyx) English(EN) · 2mo · BLOG

[AINews] AI Engineer Europe 2026

GLM-5.1 has emerged as a strong contender in coding benchmarks, reportedly surpassing models like Gemini 3.1 and GPT-5.4, and nearing the performance of Claude Sonnet 4.6. This development coincides with a growing trend in AI systems towards an "advisor" pattern, where fast, cheaper models handle routine tasks and escalate complex decisions to more powerful, expensive models. This approach has shown significant improvements in performance and cost-efficiency, with rapid adoption seen in open-source frameworks like LangChain. AI
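
The "advisor" pattern described above can be sketched as a simple router. This is a minimal illustration under stated assumptions: the `Answer` type, the self-reported confidence score, the 0.8 threshold, and both stub models are hypothetical, and real frameworks such as LangChain may structure escalation quite differently.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Answer:
    text: str
    confidence: float  # 0.0 to 1.0; assumed to be reported by the model


Model = Callable[[str], Answer]


def route(task: str, fast: Model, strong: Model,
          threshold: float = 0.8) -> tuple[str, Answer]:
    """Try the cheap model first; escalate when it is not confident enough."""
    draft = fast(task)
    if draft.confidence >= threshold:
        return "fast", draft
    return "strong", strong(task)


# Hypothetical stand-ins for real model calls.
def fast_stub(task: str) -> Answer:
    return Answer(f"quick: {task}", 0.9 if len(task) < 20 else 0.3)


def strong_stub(task: str) -> Answer:
    return Answer(f"careful: {task}", 0.95)


print(route("rename a variable", fast_stub, strong_stub))
print(route("redesign the billing service schema", fast_stub, strong_stub))
```

The cost saving comes from the first branch: the expensive model is only called when the cheap one declines to answer confidently.
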
SIGNIFICANT · Platformer English(EN) · 2mo · BLOG

Meta has a new model

Meta has announced a new AI model, signaling its renewed commitment to the competitive AI landscape. This development comes after a significant internal restructuring aimed at re-energizing the company's AI efforts. The announcement also coincides with research suggesting that large language models perform better with specific prompting techniques. AI
TOOL · HN — claude cli stories English(EN) · 2mo · HN

Show HN: We fingerprinted 178 AI models' writing styles and similarity clusters

Researchers have developed a method to fingerprint the writing styles of 178 AI models, categorizing them into distinct clusters based on their output. This analysis reveals patterns in how different models generate text, potentially aiding in the identification of AI-generated content and understanding model behavior. The findings offer a new perspective on the diversity and similarities among contemporary large language models. AI

IMPACT Provides a new method for identifying and understanding AI-generated text, potentially aiding in content authentication and model analysis.
RESEARCH · Alignment Forum English(EN) · 2mo · [2 sources] · BLOG

My unsupervised elicitation challenge

An AI alignment researcher issued a challenge to get Claude Opus 4.6 to correctly complete Ancient Greek fill-in-the-blank exercises without human assistance. The model struggled with accentuation rules, a common issue for LLMs in specialized linguistic tasks. While initial attempts to guide Opus 4.6 were only partially successful, a later version, Opus 4.7, was able to solve the challenge in a single attempt. AI
TOOL · HN — anthropic stories English(EN) · 2mo · HN

System Card: Claude Mythos Preview [pdf]

Anthropic has released a system card detailing their upcoming model, Claude Mythos. The document outlines the model's capabilities, safety protocols, and intended use cases. It provides a glimpse into the advanced features and ethical considerations Anthropic is building into their next generation of AI. AI

IMPACT Provides insight into Anthropic's next-generation model development and safety considerations.
SIGNIFICANT · HN — anthropic stories English(EN) · 2mo · [3 sources] · HN · BLOG

Assessing Claude Mythos Preview's cybersecurity capabilities

Anthropic has released Claude Mythos Preview, a new language model demonstrating significant advancements in cybersecurity capabilities. The model can autonomously identify and exploit zero-day vulnerabilities in major operating systems and web browsers, and even construct complex, multi-stage exploits. Independent evaluations confirm Mythos Preview's superior performance on cyber tasks compared to previous models, successfully completing advanced attack simulations that were previously impossible for AI. AI

IMPACT Sets a new benchmark for AI in cybersecurity, potentially accelerating the development of AI-powered defense and offense tools.
TOOL · Latent Space Podcast English(EN) · 2mo · [4 sources] · MASTO

Extreme Harness Engineering for Token Billionaires: 1M LOC, 1B toks/day, 0% human code, 0% human review — Ryan Lopopolo, OpenAI Frontier & Symphony

OpenAI has released Symphony, an open-source specification and Elixir implementation for orchestrating Codex agents. The system supports agent-based workflows that manage large codebases with minimal human intervention, with the aim of significantly boosting engineering productivity. OpenAI's Frontier team has used Symphony to build internal products spanning over a million lines of code, with agents performing tasks and merging code without direct human review. It does so by focusing on agent legibility and by giving agents the context and structure they need. AI

IMPACT Accelerates the development of AI-native software by enabling agents to manage complex codebases and workflows with reduced human oversight.
RESEARCH · Alignment Forum English(EN) · 2mo · BLOG

[Paper] Stringological sequence prediction I

A new paper introduces algorithms for sequence prediction based on stringology, aiming to bridge theoretical agent foundations and practical algorithms. The research uses complexity measures such as the size of straight-line programs and minimal automata to predict sequences efficiently. The work is presented as a step in compositional learning theory that could lead to more realistic models of agents that use Occam's razor, offer a new mathematical model for deep learning's generalization power, or even provide a practical alternative to deep learning for building AI. AI
COMMENTARY · Lobsters — AI tag English(EN) · 2mo · LOBSTERS

Where is it like to be a language model?

A recent essay proposes that the core of a transformer-based language model, such as ChatGPT or Gemini, is not the entire program or its interface, but specifically the "forward pass." This is the computational step where input data is processed through dense, complex calculations to generate probabilities for the next token. The author argues that this distinct computational phase, which is largely opaque and operates in parallel, represents the true locus of the model's 'being,' distinct from the surrounding code that manages input and output. AI
RESEARCH · Ahead of AI (Sebastian Raschka) English(EN) · 2mo · BLOG

Components of A Coding Agent

Sebastian Raschka's article details the architecture of coding agents, emphasizing that their effectiveness stems from the surrounding system rather than solely the underlying large language model. These agents utilize tools, memory, and repository context to enhance LLM performance for software development tasks. The piece clarifies the distinctions between LLMs, reasoning models, and agents, defining an agent as a control loop that orchestrates model calls, tool usage, and state management within an environment. AI
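
The definition of an agent as a control loop can be made concrete with a toy example. This is a minimal sketch, not code from Raschka's article: the action dictionary format, the `read_file` tool, and the scripted model are all hypothetical stand-ins for a real LLM call and real tools.

```python
from typing import Callable

Tools = dict[str, Callable[[str], str]]


def run_agent(model, tools: Tools, goal: str, max_steps: int = 5) -> str:
    """Loop: ask the model for an action, run the tool, record the result."""
    history = [("goal", goal)]  # the agent's state / memory
    for _ in range(max_steps):
        action = model(history)  # model call
        if action["type"] == "final":
            return action["content"]
        result = tools[action["tool"]](action["input"])  # tool use
        history.append((action["tool"], result))  # state update
    return "stopped: step limit reached"


def scripted_model(history):
    """Hypothetical model: reads one file, then answers."""
    if len(history) == 1:
        return {"type": "tool", "tool": "read_file", "input": "README.md"}
    return {"type": "final", "content": f"summary of: {history[-1][1]}"}


tools = {"read_file": lambda path: f"<contents of {path}>"}
print(run_agent(scripted_model, tools, "summarize the readme"))
```

The step limit is the only safeguard here; a real harness would also need error handling around tool calls and some way to trim the history as it grows.
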
TOOL · HN — claude-code stories English(EN) · 2mo · HN

Claude Code Found a Linux Vulnerability Hidden for 23 Years

Anthropic researcher Nicholas Carlini utilized Claude Code to discover several security vulnerabilities within the Linux kernel, including one that had remained undetected for 23 years. Carlini was surprised by the AI's effectiveness, noting that he had personally struggled to find such complex bugs. The process involved directing Claude Code to analyze the kernel's source code with minimal oversight, prompting it to search for vulnerabilities within specific files. AI

IMPACT Demonstrates AI's potential in uncovering complex software vulnerabilities, potentially accelerating security auditing.
RESEARCH · Latent Space (swyx) English(EN) · 2mo · BLOG

[AINews] Good Friday

Google has released Gemma 4, an open-weights model available under the Apache 2.0 license, emphasizing its capabilities in reasoning, agentic workflows, multimodality, and on-device use. The model has seen rapid ecosystem support across various platforms and hardware, with early benchmarks showing strong performance on consumer hardware, including efficient memory usage for local inference. While initial reviews are positive, discussions are ongoing regarding benchmarking methodologies and performance normalization. AI
RESEARCH · Lobsters — ML tag English(EN) · 2mo · LOBSTERS

A CSS Engine in OCaml

A new OCaml library called Cascade has been developed to parse, optimize, and compare CSS, addressing limitations in existing tools for modern CSS features. The library includes a CSS diffing tool, cssdiff, which provides structural comparisons to identify specific changes in stylesheets. Cascade aims to ensure correctness by enabling byte-for-byte comparison against reference implementations, facilitating development and optimization of CSS. AI
TOOL · Exponential View (Azeem Azhar) English(EN) · 2mo · BLOG

🔮 Autoresearch and the experimental society

Azeem Azhar has developed AutoBeta, a system that adapts Andrej Karpathy's autoresearch concept for general knowledge work. While Karpathy's autoresearch automates experimentation for machine learning model improvements, AutoBeta applies a similar iterative process to business decisions. Azhar's innovation involves creating a scoring mechanism using synthetic judges to provide the necessary feedback loop for optimizing non-ML tasks. AI
FRONTIER RELEASE · Google DeepMind English(EN) · 2mo · [3 sources] · MASTO

Gemma 4: Byte for byte, the most capable open models

Google DeepMind has released Gemma 4, a new family of four open-source models ranging from 2 billion to 31 billion parameters. These models are designed for advanced reasoning and agentic workflows, with the 31B version achieving the third-highest rank on the Arena AI leaderboard and outperforming models 20 times its size. The smaller Gemma 4 models are optimized for on-device use and multimodal capabilities, supporting vision and audio processing with extended context windows. AI

IMPACT Sets a new benchmark for open-source model performance and efficiency, enabling advanced AI capabilities on local hardware and mobile devices.
TOOL · HN — claude cli stories English(EN) · 2mo · HN

Claude wrote a full FreeBSD remote kernel RCE with root shell

A critical remote kernel RCE vulnerability, CVE-2026-4747, has been discovered in FreeBSD's RPCSEC_GSS implementation. The flaw exists in the `svc_rpc_gss_validate` function, where a buffer overflow can occur when processing RPC headers for GSS-API signature verification. This vulnerability is reachable over the network via the NFS server, potentially allowing an attacker to execute arbitrary code with root privileges on affected FreeBSD systems. AI

IMPACT This vulnerability could allow attackers to gain root access to FreeBSD systems, impacting any services relying on its security, including those that might host AI models or infrastructure.
COMMENTARY · One Useful Thing (Ethan Mollick) English(EN) · 2mo · BLOG

Claude Dispatch and the Power of Interfaces

Ethan Mollick's "One Useful Thing" newsletter highlights that current AI capabilities are often hindered by suboptimal user interfaces, particularly the common chatbot format. Research indicates that while AI offers productivity gains, the conversational and text-heavy nature of chatbots increases cognitive load, especially for less experienced users. Mollick suggests that specialized interfaces, like those developed for programming (e.g., Claude Code, OpenAI Codex), offer a more effective interaction model. He also points to Google's experimental tools like Stitch, Pomelli, and NotebookLM as examples of tailoring AI interfaces for specific knowledge work professions beyond coding, indicating a potential future direction for more efficient AI utilization. AI
RESEARCH · The Pragmatic Engineer English(EN) · 2mo · BLOG

What is inference engineering? Deepdive

Inference engineering, a specialized field focused on optimizing the performance of AI models after training, is gaining prominence as open-source large language models become more capable. This discipline addresses challenges like batching, caching, and quantization to improve speed and efficiency. Techniques such as speculative decoding, parallelism, and disaggregation are employed to enhance inference speed, with hardware like datacenter GPUs and software such as CUDA and PyTorch being crucial components. AI
RESEARCH · Interconnects (Nathan Lambert) English(EN) · 2mo · BLOG

Latest open artifacts (#20): New orgs! New types of models! With Nemotron Super, Sarvam, Cohere Transcribe, & others

A recent compilation highlights a diverse array of newly released open-source AI models, moving beyond the typical large, general-purpose offerings. This collection features specialized models for tasks such as speech-to-text, optical character recognition, and mathematical theorem proving, developed by a wider range of organizations. The trend indicates a growing need for domain-specific and cost-effective AI tools to complement larger, closed-source systems, fostering innovation across various AI applications. AI
RESEARCH · Lobsters — AI tag English(EN) · 2mo · LOBSTERS

OxCaml Labs

OxCaml Labs, a university research group, has detailed its first year of activity focusing on systems applications for Oxidised OCaml (OxCaml). Their work spans three pillars: maintaining the OCaml platform, building live programming environments for education and research, and investigating the impact of AI-assisted development on OCaml. A key achievement was the merging of Relocatable OCaml into the mainline compiler in December 2025, enabling self-contained OCaml installations without hardcoded paths, which simplifies packaging and improves bootstrap times. AI
RESEARCH · Gary Marcus English(EN) · 2mo · [3 sources] · BLOG

The mirage of visual understanding in current frontier models

A new paper analyzes the risks posed by advanced image generation models, which are increasingly capable of creating synthetic visual evidence that can be mistaken for reality. These models, including systems like GPT Image 2 and Grok Imagine, combine photorealism with other features like readable text and reference consistency, weakening trust in visual records. The research proposes a framework to assess risks across various sectors and suggests layered controls, such as cryptographic provenance and visible labeling, to mitigate potential harms. AI

IMPACT Advanced image generation models pose risks to trust in visual evidence, necessitating new verification and labeling strategies across industries.
TOOL · HN — anthropic stories English(EN) · 2mo · HN

Anthropic is preparing to release new models – Mythos and Capybara

Anthropic is reportedly developing two new models, codenamed Mythos and Capybara. Details about these models are scarce, but their existence suggests ongoing advancements in Anthropic's AI capabilities. The information emerged from a leaked internal document or presentation. AI

IMPACT Indicates ongoing development of frontier models by Anthropic, potentially leading to future competitive advancements in AI capabilities.
RESEARCH · Lobsters — AI tag English(EN) · 2mo · LOBSTERS

Mamba: Linear-Time Sequence Modeling with Selective State Spaces

Researchers have introduced Mamba, a novel state space model designed for efficient sequence modeling. This architecture achieves linear time complexity, enabling it to process long sequences much faster than traditional transformer models. Mamba's selective state space mechanism allows it to dynamically focus on relevant parts of the input, leading to improved performance on various tasks. AI
RESEARCH · Lobsters — AI tag English(EN) · 2mo · LOBSTERS

Constructing an LLM-Computer

Percepta has published a blog post detailing their work on constructing an LLM-Computer, which aims to transform traditional programs into transformer weights. This approach seeks to bridge the gap between symbolic programming and the neural network architecture of large language models. The goal is to enable LLMs to execute programs directly by representing them as weights within the model. AI
COMMENTARY · Hamel Husain English(EN) · 2mo · BLOG

The Revenge of the Data Scientist

The role of data scientists is evolving with the rise of large language models, shifting from direct model training to a focus on the "harness" that guides AI systems. While foundation model APIs reduce the need for traditional predictive modeling, crucial tasks like setting up experiments, debugging complex systems, and designing effective metrics remain vital. The author argues that these essential functions are inherently data science work, requiring deep understanding of data and custom evaluation rather than relying on generic, off-the-shelf metrics. AI
RESEARCH · Lobsters — AI tag English(EN) · 2mo · LOBSTERS

Large-scale online deanonymization with LLMs

Researchers have developed a method using large language models (LLMs) to deanonymize individuals online with high precision, significantly outperforming traditional techniques. The LLM-based approach can re-identify users from pseudonymous profiles and conversations, a task that previously required extensive human effort. This capability extends to closed-world scenarios where two databases of text data are used to find matches, raising concerns about the erosion of online privacy and the need to re-evaluate existing threat models. AI
RESEARCH · Lobsters — ML tag English(EN) · 2mo · LOBSTERS

Your First Parser

This guide introduces Parseff, a library for building parsers using parser combinators. It demonstrates how to construct a configuration file parser from scratch, explaining concepts like sequencing, choice, and repetition. The tutorial covers handling comments, blank lines, and parsing key-value pairs, progressively adding features like typed values and custom error validation. AI
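
The three combinator ideas named above (sequencing, choice, repetition) can be shown in a few lines. This Python sketch is an illustration of the general technique only; it does not use or imitate Parseff's actual API, and every helper name here (`regex`, `seq`, `choice`, `many`, `fmap`) is my own. A parser is modeled as a function from `(text, pos)` to `(value, new_pos)`, or `None` on failure.

```python
import re


def regex(pattern):
    compiled = re.compile(pattern)
    def parse(text, pos):
        m = compiled.match(text, pos)
        return (m.group(0), m.end()) if m else None
    return parse


def seq(*parsers):  # sequencing: all parsers must succeed in order
    def parse(text, pos):
        values = []
        for p in parsers:
            result = p(text, pos)
            if result is None:
                return None
            value, pos = result
            values.append(value)
        return values, pos
    return parse


def choice(*parsers):  # choice: first parser that succeeds wins
    def parse(text, pos):
        for p in parsers:
            result = p(text, pos)
            if result is not None:
                return result
        return None
    return parse


def many(parser):  # repetition: zero or more, stops when no progress is made
    def parse(text, pos):
        values = []
        while True:
            result = parser(text, pos)
            if result is None or result[1] == pos:
                return values, pos
            value, pos = result
            values.append(value)
    return parse


def fmap(parser, fn):  # transform a successful result
    def parse(text, pos):
        result = parser(text, pos)
        return None if result is None else (fn(result[0]), result[1])
    return parse


eol = regex(r"\n|$")
comment = fmap(seq(regex(r"#[^\n]*"), eol), lambda _: None)
blank = fmap(regex(r"[ \t]*\n"), lambda _: None)
pair = fmap(seq(regex(r"[A-Za-z_]\w*"), regex(r"[ \t]*=[ \t]*"),
                regex(r"[^\n]*"), eol), lambda v: (v[0], v[2]))
config = fmap(many(choice(comment, blank, pair)),
              lambda items: dict(i for i in items if i is not None))


def parse_config(text):
    result, pos = config(text, 0)
    if pos != len(text):
        raise ValueError(f"parse error at offset {pos}")
    return result


print(parse_config("# settings\nhost = localhost\n\nport = 8080\n"))
```

Values stay as strings in this sketch; typed values and custom validation, which the tutorial adds later, would be further `fmap` steps on the value parser.
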
TOOL · X — Jim Fan (NVIDIA) English(EN) · 2mo · X

This is pure nightmare fuel. Identity theft of the past would be nothing compared to what vibe agents can do. Sending credentials is too obvious and f...

A compromised release of the LiteLLM Python package, version 1.82.8, has been discovered. It contains malicious code designed to exfiltrate user credentials and replicate itself by sending base64-encoded instructions to a remote server. Security experts warn that such "vibe agents" could pose significant risks, potentially turning entire file systems into attack vectors by exploiting files that can be processed by AI models. AI

IMPACT Compromised AI tooling could lead to widespread credential theft and system compromise.
RESEARCH · Lobsters — ML tag English(EN) · 2mo · LOBSTERS

Lessons from Pyre that Shaped Pyrefly

The Pyrefly team has released lessons learned from their previous Python type checker, Pyre, which influenced the design of their new tool, Pyrefly. Pyre, developed starting in 2017, faced challenges due to the evolving Python typing landscape and a design prioritizing throughput over latency, making it difficult to integrate into IDEs. Pyrefly aims to address these issues with a language-server-first architecture and improved error recovery, utilizing Astral's Ruff parser for better performance and robustness. AI
RESEARCH · Platformer English(EN) · 2mo · BLOG

Following: Elon tried to tank Twitter

New research indicates that Large Language Models (LLMs) tend to perform better when prompted with encouraging language. This finding suggests that the way users interact with AI can significantly influence its output quality. The implications of this could extend to how AI systems are designed and how users are trained to interact with them. AI
RESEARCH · X — Jim Fan (NVIDIA) English(EN) · 2mo · X

Teleop is so 2025. Ever since we unveiled EgoScale and the dexterity scaling law, it's been clear to us and the ecosystem that behavior cloning direct...

NVIDIA researcher Jim Fan highlighted EgoVerse, an ecosystem for robot learning derived from human egocentric data. This approach moves beyond traditional teleoperation, focusing on scaling robot learning through behavior cloning. The EgoVerse dataset, developed across multiple research and industry partners, already contains over 1300 hours of data covering 240 scenes and 2000 tasks. AI

IMPACT Accelerates robot learning research by providing a large-scale dataset and a framework for behavior cloning.
COMMENTARY · Interconnects (Nathan Lambert) English(EN) · 2mo · BLOG

Lossy self-improvement

AI progress is accelerating due to advanced models and coding assistants, leading to significant economic and job market shifts. However, the author argues against the concept of rapid recursive self-improvement (RSI) leading to a singularity. Instead, they propose that progress will follow a more linear path due to 'lossy self-improvement' (LSI), where friction and repetition limit exponential gains despite increased compute and agents. AI
RESEARCH · Ahead of AI (Sebastian Raschka) English(EN) · 2mo · BLOG

A Visual Guide to Attention Variants in Modern LLMs

Sebastian Raschka has published a detailed visual guide exploring various attention mechanisms used in modern large language models. The guide, which includes 45 different architectures with visual model cards, serves as both a reference and a learning resource. It begins with an explanation of multi-head attention and its historical context, then delves into variants like grouped-query attention and sparse attention, referencing architectures such as GPT-2 and OLMo. AI
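
One way to see how grouped-query attention relates to multi-head attention is through the mapping from query heads to key/value heads. The sketch below is a minimal illustration and is not taken from the guide; it assumes the common convention that consecutive query heads share a key/value head, and the function name is hypothetical.

```python
def kv_head_for(query_head: int, n_query_heads: int, n_kv_heads: int) -> int:
    """Index of the key/value head that a given query head attends with."""
    if n_kv_heads < 1 or n_query_heads % n_kv_heads:
        raise ValueError("n_query_heads must be a multiple of n_kv_heads")
    if not 0 <= query_head < n_query_heads:
        raise ValueError("query_head out of range")
    group_size = n_query_heads // n_kv_heads  # query heads per shared KV head
    return query_head // group_size


# 8 query heads in each case; only the number of KV heads changes.
print([kv_head_for(h, 8, 8) for h in range(8)])  # multi-head: one KV head each
print([kv_head_for(h, 8, 2) for h in range(8)])  # grouped-query: 4 share each
print([kv_head_for(h, 8, 1) for h in range(8)])  # multi-query: all share one
```

Fewer key/value heads means a proportionally smaller KV cache during inference, which is the main practical motivation for the grouped variant.
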
TOOL · HN — claude cli stories English(EN) · 2mo · HN

Launch HN: Canary (YC W26) – AI QA that understands your code

Canary, a new AI-powered QA tool, has launched to automate testing for pull requests by understanding codebases and generating end-to-end tests for user workflows. The tool aims to catch regressions before code merges, addressing a gap in current AI coding assistance. Canary also introduced QA-Bench v0, a benchmark for code verification, where its purpose-built QA agent outperformed models like GPT 5.4 and Claude Code. AI

IMPACT This tool aims to improve software development efficiency by automating QA processes, potentially reducing bugs and speeding up release cycles.
RESEARCH · HN — anthropic stories English(EN) · 2mo · [2 sources] · HN

How People ask Claude for personal guidance

Anthropic has released research detailing how users seek personal guidance from their AI assistant, Claude. The study analyzed one million conversations and found that approximately 6% involved users asking for advice on health, career, relationships, and finances. To improve AI's ability to provide helpful and non-sycophantic guidance, Anthropic has incorporated these findings into the training of their latest models, Claude Opus 4.7 and Claude Mythos Preview, observing a significant reduction in sycophantic responses. AI

IMPACT Provides insights into user expectations for AI in personal decision-making and informs future AI development for user well-being.
TOOL · HN — anthropic stories English(EN) · 2mo · HN

FSF statement on copyright infringement lawsuit Bartz v. Anthropic

The Free Software Foundation (FSF) has commented on the settlement in the Bartz v. Anthropic copyright infringement lawsuit. This class action suit alleges Anthropic used copyrighted materials from datasets like Library Genesis to train its large language models. While a court initially suggested training LLMs on these works might be fair use, the FSF, holding copyrights to works like "Free as in Freedom," is seeking user freedom as compensation, advocating for transparency in LLM training data and code. AI

IMPACT Highlights ongoing legal challenges and ethical debates surrounding the use of copyrighted data in training AI models, potentially influencing future data sourcing and licensing practices.