Brief

last 24h

[50/5165] 224 sources

Multi-source AI news clustered, deduplicated, and scored 0–100 across authority, cluster strength, headline signal, and time decay.
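
The feed does not publish how the four signals are combined, so the following is only a minimal sketch of one plausible scheme: a weighted sum of authority, cluster strength, and headline signal, multiplied by an exponential time decay. The function name `brief_score`, the weights, and the 24-hour half-life are all hypothetical, chosen purely for illustration.

```python
# Hypothetical weights: the feed does not document its actual formula.
WEIGHTS = {"authority": 0.4, "cluster_strength": 0.35, "headline_signal": 0.25}
HALF_LIFE_HOURS = 24.0  # assumed half-life for the time-decay factor


def brief_score(authority, cluster_strength, headline_signal, age_hours):
    """Combine three signals in [0, 1] with time decay into a 0-100 score."""
    signals = {
        "authority": authority,
        "cluster_strength": cluster_strength,
        "headline_signal": headline_signal,
    }
    for name, value in signals.items():
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be in [0, 1], got {value}")
    if age_hours < 0:
        raise ValueError("age_hours must be non-negative")

    # Weighted sum of the three signals; the weights sum to 1.
    base = sum(WEIGHTS[name] * value for name, value in signals.items())
    # Exponential decay: the score halves every HALF_LIFE_HOURS.
    decay = 0.5 ** (age_hours / HALF_LIFE_HOURS)
    return round(100 * base * decay)
```

Under these assumed parameters, a story with maximal signals scores 100 when fresh and half that after one half-life; a real ranking system might instead apply decay additively or cap the influence of any single signal.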

RESEARCH · Alignment Forum English(EN) · 18mo · [27 sources]

Natural Language Autoencoders Produce Unsupervised Explanations of LLM Activations

Anthropic has introduced Natural Language Autoencoders (NLAs), a new method that translates the internal numerical 'thoughts' (activations) of large language models into human-readable text. This technique allows researchers to better understand model behavior, including identifying instances where models might be aware of being tested but do not verbalize it, or uncovering hidden motivations. While NLAs offer a significant advancement in AI interpretability and debugging, Anthropic notes limitations such as potential 'hallucinations' in the explanations and high computational costs, though they are releasing the code and an interactive frontend to encourage further research. AI

IMPACT Enables deeper understanding of LLM internal states, potentially improving safety, debugging, and trustworthiness.
RESEARCH · Latent Space Podcast Deutsch(DE) · 18mo

The new Claude 3.5 Sonnet, Computer Use, and Building SOTA Agents — with Erik Schluntz, Anthropic

Anthropic has released an updated version of its Claude 3.5 Sonnet model, demonstrating significant improvements in coding and tool-use benchmarks. The model achieved a 49.0% success rate on the SWE-bench Verified coding task, surpassing other publicly available models. Additionally, it showed gains on the TAU-bench agentic tool use task across different domains. These advancements are offered at the same price and speed as the previous iteration, with new 'Computer Use' tools designed to reduce integration friction for AI agents. AI
RESEARCH · Latent Space Podcast English(EN) · 18mo

Why Compound AI + Open Source will beat Closed AI

Fireworks AI has launched its f1 model, a proprietary replication of OpenAI's o1, positioning itself as a leader in the "Compound AI" movement. This development occurs amidst a competitive landscape with other entities like Nous Forge and DeepSeek also releasing similar models. The company, which has secured significant VC funding, focuses on enabling efficient and affordable inference for a wide range of open-source models across various modalities, serving clients such as Cursor and HubSpot. AI
RESEARCH · OpenAI News English(EN) · 19mo

Building smarter maps with GPT-4o vision fine-tuning

Grab, a major Southeast Asian ride-hailing and delivery service, has enhanced its mapping capabilities by fine-tuning OpenAI's GPT-4o vision model. This AI integration allows GrabMaps to more accurately interpret street-level imagery, improving the localization of traffic signs and road features. The fine-tuning process, which required minimal sample data, led to significant accuracy gains, such as a 13% improvement in speed limit sign localization and a 20% increase in lane count accuracy. These advancements reduce manual effort and operational costs, ultimately providing more reliable mapping data for Grab's services and enterprise clients. AI
RESEARCH · Latent Space Podcast English(EN) · 19mo · [3 sources]

In the Arena: How LMSys changed LLM Benchmarking Forever

The AraGen benchmark, developed by Hugging Face, aims to improve LLM evaluation by addressing limitations of static benchmarks. It introduces a crowdsourced approach similar to LMSys's Chatbot Arena, allowing for more dynamic and user-aligned assessments. This method seeks to capture real-world user preferences and model performance beyond traditional metrics. Additionally, a new open-source OCR model called DharmaOCR has been released, demonstrating strong performance against larger commercial and open-source models. AI

IMPACT New evaluation methods and specialized open-source models offer improved benchmarking and cost-performance for AI operators.
- Claude Opus 4.6
- OlmOCR
- Deepseek-OCR
- GLMOCR
- Qwen3
- Wei-Lin Chiang
- Berkeley
- MMLU
- DharmaOCR
- GPT-5.4
- Gemini 3.1 Pro
- Google Document AI
- Hugging Face
- AraGen
- LMSys
- Anastasios Angelopoulos
- Chatbot Arena
RESEARCH · Bounded Regret (Jacob Steinhardt) English(EN) · 19mo

Introducing Transluce — A Letter from the Founders

Transluce, a new independent research lab announced on Jacob Steinhardt's Bounded Regret blog, is building AI-driven tools designed to analyze and understand complex AI systems. These tools aim to provide scalable and open-source methods for inspecting AI behavior and representations, addressing the opacity of current models. Transluce intends to establish industry standards for trustworthy AI by making these analysis technologies publicly available for vetting and improvement, with initial applications on open-weight models and plans to collaborate with major AI labs and governments. AI
RESEARCH · Hugging Face Blog English(EN) · 19mo

Releasing Outlines-core 0.1.0: structured generation in Rust and Python

Outlines-core 0.1.0, a new library for structured generation with language models, has been released and announced on the Hugging Face blog. The library aims to give developers more control over model output by letting them define specific structures that generated text must follow. It is available for both Rust and Python, facilitating integration into a wide range of applications. AI
RESEARCH · Hugging Face Blog English(EN) · 19mo

Transformers.js v3: WebGPU Support, New Models & Tasks, and More…

Hugging Face has released version 3 of its Transformers.js library, introducing support for WebGPU. This update significantly accelerates model inference directly within web browsers by leveraging the GPU. The new version also incorporates several new models and tasks, expanding its capabilities for on-device AI applications. AI
RESEARCH · Hugging Face Blog English(EN) · 20mo

A Security Review of Gradio 5

Gradio 5, a popular open-source library for building AI model demos, has undergone a comprehensive security review. The review identified and addressed several vulnerabilities, including potential cross-site scripting (XSS) and arbitrary code execution risks. These improvements aim to enhance the safety and reliability of applications built with Gradio, particularly in shared or public environments. AI
RESEARCH · Latent Space Podcast English(EN) · 20mo

Building AGI in Real Time (OpenAI Dev Day 2024)

OpenAI's DevDay 2024 focused on developer-facing API announcements rather than major product reveals, contrasting with the previous year. Key updates included the Realtime API for more natural voice interactions, Vision Finetuning, Prompt Caching, and Model Distillation. The event featured interviews with OpenAI product and API team members, alongside a Q&A with CEO Sam Altman, aiming to provide deeper insights into their latest tools and strategies. AI
RESEARCH · OpenAI News English(EN) · 20mo

Creating agent and human collaboration with GPT 4o

Altera, a research lab founded by former MIT professor Robert Yang, is developing AI
RESEARCH · Hugging Face Blog English(EN) · 20mo

Llama can now see and run on your device - welcome Llama 3.2

Meta has released Llama 3.2, an updated version of its open-source large language model. This new iteration brings enhanced multimodal capabilities, allowing the model to process and understand visual information. A key feature is its improved efficiency, enabling it to run directly on user devices, making it more accessible and private. AI
RESEARCH · Practical AI English(EN) · 21mo

Pausing to think about scikit-learn & OpenAI o1

OpenAI has released a new model called "o1" that exhibits a pausing behavior to "think" through complex tasks. This development is contrasted with the recent seed funding announcement for scikit-learn, a prominent open-source library. The discussion highlights the differing philosophies between proprietary AI development and the open-source data science ecosystem, which emphasizes user ownership. AI
RESEARCH · OpenAI News English(EN) · 21mo

Using GPT-4 to improve teaching and learning in Brazil

Arco Educação, a major Brazilian education company, is partnering with OpenAI to develop AI tools aimed at reducing administrative tasks for teachers. These tools, powered by GPT-4 and other OpenAI models, are designed to help educators save time on tasks like lesson planning and grading, allowing them to focus more on student interaction and personalized learning. Rigorous testing showed GPT-4 performed exceptionally well in Brazilian Portuguese for pedagogical content creation and assessment, leading Arco to choose OpenAI for its superior accuracy and reliability. AI
RESEARCH · OpenAI News English(EN) · 21mo

Decoding genetics with OpenAI o1

OpenAI has introduced o1, a new family of AI models specifically engineered for scientific reasoning. These models are designed to tackle complex problems in fields like genetics, coding, and mathematics by dedicating more processing time to thinking before generating a response. The o1 models aim to assist researchers by managing the vast complexity of scientific data, such as the 20,000 human genes, which is beyond individual human capacity to master. AI
- OpenAI
- Catherine Brownstein
RESEARCH · OpenAI News English(EN) · 21mo · [2 sources]

Using OpenAI o1 for financial analysis

OpenAI has introduced o1, a new series of AI models designed for complex reasoning tasks in areas like science, coding, and economics. These models are built to think more before responding, enabling them to tackle more challenging problems than previous iterations. One notable application is Rogo, an AI finance platform that fine-tunes OpenAI's models, including GPT-4o and o1-mini, to provide real-time financial intelligence to investment banks and private equity firms, saving analysts significant time. AI
RESEARCH · Practical AI English(EN) · 21mo

Metrics Driven Development

The Ragas project is promoting a "Metrics Driven Development" approach for systematically measuring and improving the performance of LLM applications. This open-source effort focuses on specific metrics, distinguishing between model benchmarking and evaluating LLM applications. They also explore techniques like generating synthetic test data to enhance application performance. AI
RESEARCH · Latent Space Podcast English(EN) · 22mo

AI Magic: Shipping 1000s of successful products with no managers and a team of 12 — Jeremy Howard of Answer.ai

Jeremy Howard of Answer.ai discussed his company's approach to practical AI research and development on the Latent Space podcast. Answer.ai focuses on techniques like fine-tuning and optimized inference, catering to those with limited GPU resources. The company has released several tools, including FSDP QDoRA for efficient model training and Cold Compress for KV cache compression, alongside FastHTML for web app development. Howard also teased a new system he's developing called 'AI Magic,' which he describes as 'dialogue engineering' aimed at increasing productivity. AI
RESEARCH · Hugging Face Blog English(EN) · 22mo · [2 sources]

Google releases Gemma 2 2B, ShieldGemma and Gemma Scope

Google has announced updates to its Gemma family of models, including the release of Gemma 2 2B. This new iteration is designed for efficiency and accessibility, aiming to empower developers with powerful yet lightweight AI capabilities. The update also introduces Gemma Scope, a new tool for model evaluation, and ShieldGemma, a safety filtering system to enhance responsible AI deployment. AI
- Google
- Gemma 2 2B
- Gemma Scope
- ShieldGemma
- Gemma
RESEARCH · Practical AI English(EN) · 22mo · [6 sources]

Towards high-quality (maybe synthetic) datasets

Google Research has introduced Simula, a framework that treats synthetic data generation as a mechanism design problem. This approach allows for fine-grained control over dataset characteristics like coverage, complexity, and quality, addressing the scarcity of real-world data for specialized AI applications. Separately, Google also presented CTCL, a privacy-preserving synthetic data generation algorithm that avoids the need to fine-tune large language models, making it suitable for resource-constrained environments. AI

IMPACT New frameworks for synthetic data generation could accelerate AI development in data-scarce domains and improve privacy-preserving techniques.
RESEARCH · Hugging Face Blog English(EN) · 23mo

WWDC 24: Running Mistral 7B with Core ML

Hugging Face has released a guide detailing how to run the Mistral 7B language model on Apple devices using Core ML. This integration allows developers to leverage the model's capabilities directly on iPhones, iPads, and Macs. The process involves converting the model to the Core ML format, enabling on-device AI inference for various applications. AI
RESEARCH · Practical AI English(EN) · 23mo

The first real-time voice assistant

Kyutai has released Moshi, the first real-time AI voice assistant model, coinciding with discussions around OpenAI's GPT-4o voice assistant. This more open approach to voice assistant technology is expected to spur further innovation. The release also touches upon recent shifts in Gartner's GenAI hype cycle rankings. AI
RESEARCH · Hugging Face Blog English(EN) · 23mo

How we leveraged distilabel to create an Argilla 2.0 Chatbot

Hugging Face has detailed how they utilized their distilabel library to develop the Argilla 2.0 Chatbot. This process involved creating a dataset of conversational data and then fine-tuning a language model on this data. The goal was to build a chatbot capable of understanding and responding to user queries within the context of the Argilla platform. AI
RESEARCH · OpenAI News English(EN) · 23mo

OpenAI and Los Alamos National Laboratory announce research partnership

OpenAI has partnered with Los Alamos National Laboratory to research the safe application of multimodal AI in bioscience. This collaboration aims to evaluate how advanced AI models, like GPT-4o, can assist scientists in laboratory settings while identifying and mitigating potential risks. The initiative aligns with a White House directive for national labs to assess frontier AI capabilities, particularly in areas like biosecurity. AI
RESEARCH · Smol AINews English(EN) · 23mo

Mozilla's AI Second Act

Mozilla is launching a new AI initiative focused on developing open-source AI models and tools. The organization aims to provide alternatives to proprietary AI systems by emphasizing transparency and community involvement. This move signifies Mozilla's renewed commitment to shaping the future of AI in a way that aligns with its open-source principles. AI
RESEARCH · Hugging Face Blog English(EN) · 23mo

XLSCOUT Unveils ParaEmbed 2.0: a Powerful Embedding Model Tailored for Patents and IP with Expert Support from Hugging Face

XLSCOUT has released ParaEmbed 2.0, a new embedding model specifically designed for patent and intellectual property analysis. This model aims to improve the understanding and retrieval of information within the complex domain of patents. Hugging Face provided expert support throughout the development process, contributing to the model's capabilities. AI
RESEARCH · OpenAI News English(EN) · 24mo

Empowering defenders through our Cybersecurity Grant Program

OpenAI has announced the progress of its Cybersecurity Grant Program, which aims to equip cyber defenders with advanced AI models and foster research at the intersection of AI and cybersecurity. The program received over 600 applications and has supported diverse projects, including research into defending against prompt-injection attacks, automating the detection of software misconfigurations, and fortifying LLM inference infrastructure with secure enclaves. Other supported initiatives focus on training professionals in AI security, developing defenses against private training data reconstruction attacks, improving LLMs' ability to detect code vulnerabilities, and creating autonomous cyber defense agents. AI
RESEARCH · HN — AI infrastructure stories English(EN) · 24mo

OpenAI Selects Oracle Cloud Infrastructure to Extend Microsoft Azure AI Platform

OpenAI has entered into a new agreement to utilize Oracle Cloud Infrastructure (OCI) for its artificial intelligence workloads. This partnership aims to expand OpenAI's existing AI platform, which is primarily hosted on Microsoft Azure. The collaboration will leverage OCI's high-performance computing capabilities to support OpenAI's growing demand for AI training and inference. AI

IMPACT Expands AI training and inference capacity by diversifying cloud infrastructure providers.
RESEARCH · OpenAI News English(EN) · 24mo

Expanding on how Voice Engine works and our safety research

OpenAI has provided more details on its Voice Engine text-to-speech model, which can generate human-like audio from a 15-second voice sample and text. The company developed the model in late 2022 and has been using it for internal safety research and to inform policymakers about synthetic voice capabilities. While not widely available, Voice Engine powers ChatGPT's Voice Mode and a limited TTS API, with OpenAI exploring policies to protect voice usage and promote AI content tracking. AI
RESEARCH · Practical AI English(EN) · 24mo

Rise of the AI PC & local LLMs

The AI PC market is experiencing a surge in interest, with major players like NVIDIA, Apple, and Intel developing hardware and Microsoft releasing models such as Phi. This trend focuses on local LLMs, aiming to enhance AI adoption by bringing processing closer to the user. Discussions around this niche cover tooling, frameworks, and optimizations for local AI, potentially impacting the future of how AI is integrated into everyday computing. AI
RESEARCH · OpenAI News English(EN) · 25mo

Understanding the source of what we see and hear online

OpenAI is enhancing its efforts in content provenance by developing new tools and joining industry initiatives. The company is researching text watermarking and metadata solutions to identify AI-generated text, though challenges remain regarding circumvention and potential bias against non-native English speakers. Additionally, OpenAI is integrating C2PA metadata into its image generation tools like DALL-E 3 within ChatGPT to track edits and ensure transparency in the origin of visual content. AI
RESEARCH · Hugging Face Blog English(EN) · 25mo · [2 sources]

Launching the Artificial Analysis Text to Image Leaderboard & Arena

Artificial Analysis has launched a new leaderboard and arena for evaluating text-to-image models, announced on the Hugging Face blog. This initiative aims to provide a standardized platform for comparing the performance of various AI image generation models. The platform will allow users to submit their models and participate in blind tests to determine their capabilities. AI
RESEARCH · Hugging Face Blog English(EN) · 26mo · [3 sources]

Introducing the Open FinLLM Leaderboard

Hugging Face has launched two new leaderboards: one for financial language models (FinLLM) and another for models demonstrating chain-of-thought reasoning. These initiatives aim to provide more structured evaluations for specific AI capabilities. Additionally, a new research paper proposes an interactive approach to LLM leaderboard evaluation, allowing users to define their own priorities and explore how rankings change based on different criteria, addressing the limitations of current aggregate scores. AI
RESEARCH · HN — machine learning stories English(EN) · 26mo

USAF Test Pilot School, DARPA announce aerospace machine learning breakthrough

The USAF Test Pilot School and DARPA have announced a significant advancement in aerospace machine learning. This breakthrough involves the development and successful testing of a new AI system designed to enhance the capabilities of military aircraft. The system aims to improve decision-making and operational efficiency in complex aerial environments. AI

IMPACT Potential to enhance military aviation capabilities through advanced AI decision-making.
- USAF Test Pilot School
- DARPA
RESEARCH · Smol AINews English(EN) · 26mo

Music's Dall-E moment

A new AI model called "MusicLM" has been developed by Google Research that can generate music from text descriptions. This model is capable of producing high-fidelity music in various genres and styles, responding to prompts like "calming jazz for studying" or "80s electronic dance music." MusicLM works by converting text prompts into musical pieces, demonstrating a significant advancement in AI-driven music creation. The research paper detailing MusicLM highlights its potential to revolutionize how music is composed and experienced. AI
RESEARCH · Hugging Face Blog English(EN) · 26mo

Text2SQL using Hugging Face Dataset Viewer API and Motherduck DuckDB-NSQL-7B

A Hugging Face blog post demonstrates text-to-SQL with DuckDB-NSQL-7B, MotherDuck's model for generating DuckDB SQL. The workflow combines the model with the Hugging Face Dataset Viewer API and DuckDB, enabling users to query datasets using natural language. The integration aims to simplify data analysis by allowing direct interaction with data through conversational prompts. AI
RESEARCH · OpenAI News English(EN) · 26mo

Navigating the challenges and opportunities of synthetic voices

OpenAI has previewed its Voice Engine model, capable of generating natural-sounding speech from a 15-second audio sample. The technology, developed in late 2022, has been used internally for features like ChatGPT Voice and is being tested with partners for applications in education, content translation, and assistive communication. OpenAI is proceeding cautiously with a broader release due to potential misuse, aiming to foster dialogue on responsible deployment. AI
RESEARCH · HN — AI infrastructure stories English(EN) · 27mo · [2 sources]

Show HN: Tracecat – Open-source security alert automation / SOAR alternative

Tracecat has released an open-source security automation platform designed for teams and AI agents. The platform allows users to build automations using prompts and various AI models, integrate custom Python scripts, and offers features like workflow management, case tracking, and over 100 pre-built connectors. It emphasizes security through sandboxing and durable execution via Temporal, and is available for self-hosting with options for an enterprise license or managed cloud offering. AI

IMPACT Enhances security operations by enabling AI agents to automate complex tasks and integrate with existing systems.
- Claude
- FastAPI
- Temporal
- Postgres
- AI agents
- ChatGPT
- Tracecat
RESEARCH · OpenAI News English(EN) · 27mo

Sora first impressions

OpenAI has shared early impressions of its Sora text-to-video model from visual artists, designers, and filmmakers. These creatives are exploring how Sora can aid their work, enabling them to bring impossible or surreal ideas to life and overcome limitations of time and budget. Artists highlighted Sora's potential for abstract expressionism and visualizing concepts, noting it opens new avenues for storytelling and rapid iteration in their creative processes. AI
RESEARCH · Hugging Face Blog English(EN) · 27mo

Pollen-Vision: Unified interface for Zero-Shot vision models in robotics

Pollen Robotics has introduced Pollen-Vision on the Hugging Face blog, a new unified interface designed to streamline the use of zero-shot vision models in robotics. This development aims to simplify how robots can understand and interact with their environment by leveraging advanced AI capabilities. The interface is expected to accelerate research and development in embodied AI by making these powerful models more accessible and easier to integrate into robotic systems. AI
RESEARCH · Hugging Face Blog English(EN) · 27mo

A Chatbot on your Laptop: Phi-2 on Intel Meteor Lake

Intel and Hugging Face have partnered to enable Microsoft's Phi-2 language model to run efficiently on Intel's Meteor Lake processors. This collaboration allows for on-device AI capabilities, bringing chatbots and other AI applications directly to laptops without relying on cloud servers. The integration leverages Intel's OpenVINO toolkit to optimize the model's performance for local execution. AI
RESEARCH · Smol AINews English(EN) · 27mo

Grok-1 in Bio

The Grok-1 large language model has been made available for biological research applications. This release aims to accelerate discoveries in the life sciences by providing researchers with a powerful AI tool. The model's capabilities are expected to aid in areas such as drug discovery and genomic analysis. AI
- Grok-1
RESEARCH · METR (Model Evaluation & Threat Research) English(EN) · 27mo · [3 sources]

Autonomy Evaluation Resources

METR (Model Evaluation & Threat Research) has released a suite of resources designed to evaluate the dangerous autonomous capabilities of AI models. This includes a task suite with 31 example tasks and summaries for 186 others, along with software tooling and guidelines for accurate measurement. The goal is to provide a practical and cost-effective method for assessing risks from autonomous AI systems, enabling the development of appropriate safety precautions. AI
RESEARCH · Latent Space Podcast English(EN) · 27mo

Making Transformers Sing - with Mikey Shulman of Suno

Suno, a company founded by former Kensho employees who are also musicians, has developed advanced AI models for audio generation, moving beyond traditional text-to-speech. Their initial open-source model, Bark, demonstrated capabilities in generating speech, music, and sound effects by training on broad audio data rather than limited text-to-speech datasets. Suno's subsequent product, which gained significant attention in December 2023, aims to democratize music creation, allowing anyone to become a music maker. AI
RESEARCH · Practical AI English(EN) · 27mo

Generating the future of art & entertainment

Runway, an applied AI research company, is significantly impacting the future of art and entertainment with its advanced text-to-video models. Co-founder and CTO Anastasis Germanidis discussed the company's growth and its role in defining the creative landscape. Runway's work focuses on leveraging AI to enhance human creativity in media production. AI
RESEARCH · Fortune English(EN) · 27mo

The untold story of Kickstarter’s crypto Hail Mary—and the secret $100 million a16z-led investment to save its fading brand

Kickstarter received a $100 million investment led by Andreessen Horowitz in late 2021, valuing the company at approximately $400 million. This investment was intended to revitalize the struggling crowdfunding platform, which had lost its cultural cachet and faced internal turmoil. However, the condition of pivoting towards blockchain technology was met with strong community backlash, leading to significant project losses and reputational damage. AI

IMPACT Minimal direct impact on AI operators; this is primarily a business and strategy story about a crowdfunding platform.
RESEARCH · Smol AINews English(EN) · 27mo

Inflection-2.5 at 94% of GPT4, and Pi at 6m MAU

Inflection AI has released Inflection-2.5, a new model that achieves 94% of the performance of OpenAI's GPT-4. The company also reported that its personal AI chatbot, Pi, has reached 6 million monthly active users. This update signifies a notable step forward for Inflection AI in its competition with other leading AI developers. AI
- Pi
- OpenAI
- Inflection AI
- Inflection-2.5
- GPT-4
RESEARCH · OpenAI News English(EN) · 27mo

Improving health literacy and patient well-being

Rhode Island's Lifespan healthcare system is utilizing OpenAI's GPT-4 to simplify complex surgical consent forms, reducing them from a college reading level to a 6th-grade level. This initiative, led by medical residents, aims to improve patient understanding and outcomes by making critical medical information more accessible. The simplified forms, which are also shorter and have received positive patient feedback, are now being expanded to other medical documentation across the system. AI
RESEARCH · OpenAI News Italiano(IT) · 27mo

Using AI to improve patient access to clinical trials

Paradigm, a healthcare technology company, has partnered with OpenAI to improve patient access to clinical trials. By integrating GPT-4 into their platform, Paradigm has significantly enhanced their ability to match patients with suitable trials, overcoming previous limitations of traditional ML models. This integration has led to a substantial increase in accuracy, a reduction in the time and resources needed for data evaluation, and has accelerated Paradigm's ability to expand its services. AI
RESEARCH · The Gradient English(EN) · 27mo

Do text embeddings perfectly encode text?

A recent paper titled "Text Embeddings Reveal As Much as Text" explores the security implications of using text embeddings in Retrieval Augmented Generation (RAG) systems. The research questions whether embedding vectors, which are numerical representations of text, can be inverted back into their original text form. This is particularly relevant given the rise of vector databases, which store these embeddings and are increasingly used by companies integrating AI into their operations. The study investigates the potential for sensitive information to be exposed if these embedding vectors are compromised, challenging the notion that they are a secure format for data storage and communication. AI