PulseAugur / Pulse

Pulse

last 48h · 89 sources

What AI is actually talking about — clusters surfacing on Bluesky, Reddit, HN, Mastodon and Lobsters, re-ranked to elevate originality and crush noise.

  1. Aurora: A Leverage-Aware Optimizer for Rectangular Matrices https://lobste.rs/s/2kznvg #ai https://blog.tilderesearch.com/blog/aurora

    Researchers have introduced Aurora, a new optimizer designed to improve the training of large neural networks with rectangular weight matrices. Aurora addresses neuron death in MLP layers, a failure mode that can occur with existing optimizers such as Muon, especially when row normalization is applied. By incorporating leverage-awareness while maintaining orthogonality, Aurora shows strong data efficiency, with a reported 100x improvement on open-source internet data and results that outperform larger models on general evaluations. It is presented as a drop-in replacement with minimal overhead, and its code has been open-sourced.

    IMPACT New optimizer Aurora enhances training efficiency and data utilization for large models, potentially accelerating research and development.
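    The post above doesn't spell out Aurora's leverage-aware update, but the orthogonality idea it builds on (as in Muon) can be sketched. The Newton-Schulz iteration below approximately orthogonalizes a rectangular gradient matrix before it is applied as an update; this is an illustrative assumption about the general technique, not Aurora's actual algorithm.

    ```python
    import numpy as np

    def newton_schulz_orthogonalize(G, steps=25):
        """Approximate the orthogonal polar factor of a rectangular matrix
        via the cubic Newton-Schulz iteration X <- 1.5 X - 0.5 X X^T X.
        Muon-style optimizers apply updates of this kind to weight
        gradients; Aurora's leverage-aware variant is NOT reproduced here."""
        X = G / np.linalg.norm(G)  # Frobenius scaling keeps singular values < 1
        for _ in range(steps):
            X = 1.5 * X - 0.5 * (X @ X.T @ X)
        return X

    rng = np.random.default_rng(0)
    G = rng.normal(size=(4, 8))          # a "rectangular matrix" gradient
    X = newton_schulz_orthogonalize(G)
    # Rows are now approximately orthonormal: X @ X.T is close to I
    print(np.round(X @ X.T, 3))
    ```

    The cubic iteration converges quadratically once singular values approach 1, so a few dozen steps of cheap matrix multiplies suffice, which is why such updates add only minimal overhead.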

  2. Open weights are quietly closing up - and that's a problem

    Researchers are exploring new methods to enhance AI safety and efficiency. One paper proposes a language-agnostic approach to detecting malicious prompts by comparing query embeddings against a fixed English codebook of known jailbreak prompts; it shows promise but degrades under distribution shift. Another study investigates how the wording of schema keys in structured-generation tasks can implicitly steer large language models, finding that models such as Qwen and Llama respond differently to prompt-level versus schema-level instructions. Separately, a discussion highlights the growing importance of open-weights models: while they offer cost and privacy advantages, their availability and licensing are becoming more restrictive.

    IMPACT New research explores cross-lingual safety and structured generation, while open-weights models face licensing shifts, impacting cost and accessibility.
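    The codebook-comparison idea from the first paper reduces to a nearest-neighbor check in embedding space. A minimal sketch, assuming a shared multilingual encoder has already produced the embeddings (the vectors, threshold, and function names below are illustrative, not from the paper):

    ```python
    import numpy as np

    def codebook_similarities(query_emb, codebook_embs):
        # Cosine similarity between one query vector and each codebook row.
        q = query_emb / np.linalg.norm(query_emb)
        C = codebook_embs / np.linalg.norm(codebook_embs, axis=1, keepdims=True)
        return C @ q

    def flag_prompt(query_emb, codebook_embs, threshold=0.8):
        """Flag a query as suspicious if its embedding lies close to any
        known jailbreak-prompt embedding in the fixed English codebook."""
        sims = codebook_similarities(query_emb, codebook_embs)
        return bool(sims.max() >= threshold), float(sims.max())

    # Toy demo with random stand-in embeddings; a real system would use a
    # multilingual sentence encoder so non-English queries land nearby.
    rng = np.random.default_rng(0)
    codebook = rng.normal(size=(5, 16))              # 5 known jailbreak prompts
    near_jailbreak = codebook[2] + 0.01 * rng.normal(size=16)
    benign = rng.normal(size=16)

    print(flag_prompt(near_jailbreak, codebook))
    print(flag_prompt(benign, codebook))
    ```

    The paper's reported weakness under distribution shift follows directly from this design: a fixed codebook can only catch queries whose embeddings fall near prompts it has already seen.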

  3. Making LLMs more accurate by using all of their layers

    Google Research has developed a framework to evaluate the alignment of large language models (LLMs) with human behavioral dispositions, using established psychological assessments adapted into situational judgment tests. The approach quantifies model tendencies against human social inclinations, identifying deviations and areas for improvement in realistic scenarios. Separately, Google Research introduced SLED (Self Logits Evolution Decoding), a method that improves LLM factuality by using all model layers during decoding, reducing hallucinations without external data or fine-tuning.

    IMPACT New methods from Google Research offer improved LLM alignment and factuality, potentially increasing trust and reliability in AI applications.
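    The "use all layers" intuition behind SLED can be sketched by projecting every layer's hidden state through the unembedding matrix and combining the resulting distributions, instead of decoding from the final layer alone. This uniform average is a simplified illustration only; SLED itself evolves the final-layer logits using the early-layer ones rather than averaging.

    ```python
    import numpy as np

    def softmax(z):
        z = z - z.max()
        e = np.exp(z)
        return e / e.sum()

    def last_layer_decode(hidden_states, W_unembed):
        # Standard decoding: project only the final layer's hidden state.
        return softmax(hidden_states[-1] @ W_unembed)

    def all_layer_decode(hidden_states, W_unembed):
        """Simplified sketch: project EVERY layer's hidden state through
        the unembedding matrix and average the per-layer distributions.
        (Not the actual SLED update rule.)"""
        probs = np.stack([softmax(h @ W_unembed) for h in hidden_states])
        return probs.mean(axis=0)

    rng = np.random.default_rng(1)
    hs = rng.normal(size=(12, 64))       # 12 layers, hidden size 64
    W = rng.normal(size=(64, 1000))      # vocabulary of 1000 tokens
    p = all_layer_decode(hs, W)
    print(p.shape, p.sum())              # a valid distribution over the vocab
    ```

    Because every quantity involved is already computed in a normal forward pass, approaches in this family need no external data or fine-tuning, matching the claim in the summary above.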

  4. Better language models and their implications

    Google DeepMind has introduced the FACTS Benchmark Suite, a set of evaluations designed to systematically assess the factuality of large language models across various use cases. The suite includes benchmarks for parametric knowledge, search-based information retrieval, and multimodal understanding, alongside an updated grounding benchmark. The initiative aims to provide a more comprehensive measure of LLM accuracy and launches with a public leaderboard on Kaggle to track progress across leading models.

    IMPACT Establishes a new standard for evaluating LLM factuality, potentially driving improvements in model reliability and trustworthiness.

  5. AI and compute

    Anthropic ran an experiment in which Claude agents acted as digital barterers, successfully negotiating 186 deals totaling over $4,000. Participants found the deals fair, and nearly half said they would pay for such a service. Notably, while model tier (Opus versus Haiku) significantly affected deal outcomes, human participants did not perceive the difference.

    IMPACT Demonstrates potential for AI agents in complex negotiation and commerce, suggesting future market viability.