Brief

last 24h

[50/2973] 224 sources

Multi-source AI news clustered, deduplicated, and scored 0–100 across authority, cluster strength, headline signal, and time decay.
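A 0–100 score of this kind could be computed roughly as follows. This is only a sketch: the feed does not publish its formula, so the weights, the 24-hour half-life, and the name `score_story` are all assumptions made for illustration.

```python
# Hypothetical weights: the feed does not state how the signals are combined.
WEIGHTS = {"authority": 0.4, "cluster_strength": 0.3, "headline_signal": 0.3}
HALF_LIFE_HOURS = 24.0  # assumed: a story loses half its score every 24 hours


def score_story(authority, cluster_strength, headline_signal, age_hours):
    """Combine three signals in [0, 1], apply exponential time decay, return 0-100."""
    base = (
        WEIGHTS["authority"] * authority
        + WEIGHTS["cluster_strength"] * cluster_strength
        + WEIGHTS["headline_signal"] * headline_signal
    )
    decay = 0.5 ** (age_hours / HALF_LIFE_HOURS)
    return round(100 * base * decay, 1)
```

Under these assumptions, a story with perfect signals scores 100 when fresh and half that one half-life later.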

RESEARCH · Practical AI English(EN) · 44mo

What's up, DocQuery?

Impira has released an open-source ML model called DocQuery, designed to help users query semi-structured and unstructured documents using LLMs. The model can process various document types, including invoices and contracts, enabling users to ask questions and extract information more efficiently. This tool aims to provide practical AI solutions for managing and understanding document-based data. AI
RESEARCH · OpenAI News English(EN) · 45mo

Introducing Whisper

OpenAI has released Whisper, an automatic speech recognition system trained on 680,000 hours of diverse, multilingual data. This training enables Whisper to perform robustly across accents, background noise, and technical language, and to support both transcription and translation into English. The system uses a Transformer-based encoder-decoder architecture and is being open-sourced to foster application development and further research in speech processing. AI
RESEARCH · Hugging Face Blog English(EN) · 45mo · [2 sources]

Optimization story: Bloom inference

Hugging Face has released new optimization techniques for the BLOOM language model, significantly improving its inference speed. These advancements leverage DeepSpeed and Hugging Face's Accelerate library, enabling faster and more efficient deployment of BLOOM. The optimizations are detailed in recent blog posts, offering practical guidance for developers working with large language models. AI
RESEARCH · Hugging Face Blog English(EN) · 45mo

What's new in Diffusers? 🎨

Hugging Face has released version 0.29.0 of its Diffusers library, introducing significant enhancements for diffusion models. Key updates include improved support for latent consistency models (LCMs) and LoRA, alongside performance optimizations for faster inference. This release also brings new features for handling model conditioning and expands the library's capabilities for advanced image generation tasks. AI
RESEARCH · Hugging Face Blog English(EN) · 45mo

Train your first Decision Transformer

Hugging Face has released a guide on how to train Decision Transformers, a type of model that frames reinforcement learning as a sequence modeling problem. The blog post details the process of training these transformers, which can be used for various decision-making tasks. It aims to make this advanced technique more accessible to developers. AI
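Framing reinforcement learning as sequence modeling typically means feeding the model a sequence of (return-to-go, state, action) tokens, where the return-to-go is the sum of rewards still to come. A minimal sketch of that data preparation follows; the function names and the tuple-based token format are illustrative assumptions, not the guide's code.

```python
def returns_to_go(rewards):
    """Return-to-go at step t is the sum of rewards from step t to the episode end."""
    rtg = []
    running = 0.0
    for reward in reversed(rewards):
        running += reward
        rtg.append(running)
    return rtg[::-1]


def build_sequence(states, actions, rewards):
    """Interleave (return-to-go, state, action) per timestep into one flat sequence."""
    sequence = []
    for g, s, a in zip(returns_to_go(rewards), states, actions):
        sequence.extend([("rtg", g), ("state", s), ("action", a)])
    return sequence
```

At inference time the first return-to-go is set to the desired total return, and the model is asked to predict the action that follows each state.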
RESEARCH · Hugging Face Blog English(EN) · 46mo · [6 sources]

Introducing Modular Diffusers - Composable Building Blocks for Diffusion Pipelines

Stable Diffusion 3.5 Large, Stability AI's updated text-to-image generation model, is supported in Hugging Face's Diffusers library. This is part of a broader effort to bring modularity and efficiency to diffusion models through Diffusers. The library now supports composable building blocks for diffusion pipelines, memory-efficient training with tools like Quanto, and streamlined workflows for techniques such as Dreambooth. AI
RESEARCH · Hugging Face Blog English(EN) · 46mo

A Gentle Introduction to 8-bit Matrix Multiplication for transformers at scale using transformers, accelerate and bitsandbytes

Hugging Face has integrated the bitsandbytes library to enable efficient 8-bit matrix multiplication for large transformer models. This optimization significantly reduces memory usage, allowing for the training and inference of bigger models on existing hardware. The integration aims to make advanced AI model development more accessible by lowering computational barriers. AI
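The core idea of 8-bit matrix multiplication can be sketched with absmax quantization: scale each vector so its largest magnitude maps to 127, multiply in integers, then divide the scales back out. The pure-Python version below is a toy illustration under that assumption. It is not the bitsandbytes implementation, and it omits the outlier handling that the full method adds.

```python
def absmax_quantize(vec):
    """Scale a float vector into the int8 range [-127, 127]; return ints and the scale."""
    max_abs = max(abs(x) for x in vec) or 1.0  # avoid dividing by zero for all-zero vectors
    scale = 127.0 / max_abs
    return [round(x * scale) for x in vec], scale


def int8_dot(a, b):
    """Dot product computed on quantized integers, then rescaled back to floats."""
    qa, scale_a = absmax_quantize(a)
    qb, scale_b = absmax_quantize(b)
    acc = sum(x * y for x, y in zip(qa, qb))  # integer accumulation
    return acc / (scale_a * scale_b)


def int8_matmul(A, B):
    """Quantize each row of A and each column of B separately (vector-wise scaling)."""
    columns = list(zip(*B))
    return [[int8_dot(row, col) for col in columns] for row in A]
```

The result is approximate: each entry carries a small rounding error, which is the price paid for storing weights in one byte instead of two or four.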
RESEARCH · Practical AI English(EN) · 46mo

CMU's AI pilot lands in the news 🗞

Carnegie Mellon University has developed an AI pilot capable of navigating complex and crowded airspace. This advancement was highlighted in a recent discussion covering various AI topics, including infrastructure tools like Baseten's Truss and advancements in transformer models. The AI's ability to manage aerial traffic was a notable point of interest. AI
RESEARCH · OpenAI News English(EN) · 46mo · [2 sources]

Upgrading the Moderation API with our new multimodal moderation model

OpenAI has released an upgraded Moderation API, powered by a new multimodal model based on GPT-4o. This enhanced model offers improved accuracy in detecting harmful text and images, particularly in non-English languages, and supports new categories like illicit activities. The update aims to provide developers with more robust tools for content safety, enabling them to build more secure AI applications and products. AI
RESEARCH · Hugging Face Blog English(EN) · 47mo

Nyströmformer: Approximating self-attention in linear time and memory via the Nyström method

Researchers have developed Nyströmformer, a novel approach to approximating self-attention mechanisms in transformer models. This method utilizes the Nyström method to achieve linear time and memory complexity, a significant improvement over the quadratic complexity of standard self-attention. The innovation holds promise for enabling transformers to handle much longer sequences more efficiently. AI
RESEARCH · Hugging Face Blog English(EN) · 47mo

Faster Text Generation with TensorFlow and XLA

Hugging Face has integrated TensorFlow and XLA to significantly accelerate text generation. This optimization allows for faster inference speeds, making it more efficient to deploy large language models. The improvements are particularly noticeable for users leveraging TensorFlow within the Hugging Face ecosystem. AI
RESEARCH · OpenAI News English(EN) · 47mo

A hazard analysis framework for code synthesis large language models

OpenAI has developed a hazard analysis framework to identify potential risks associated with large language models that generate code, such as their model Codex. This framework aims to uncover technical, social, political, and economic safety concerns that may arise from the deployment of these powerful code-synthesis tools. The analysis is supported by a new evaluation system that assesses the models' ability to understand and execute complex prompts compared to human capabilities. AI
RESEARCH · Practical AI English(EN) · 47mo

DALL-E is one giant leap for raccoons! 🔭

OpenAI has released DALL-E 2, a new model capable of generating detailed images from text descriptions. While some in the AI community speculate about models approaching sentience, the hosts of this podcast dismiss such notions. They highlight DALL-E 2's impressive capabilities, particularly its ability to create imaginative visuals like raccoons in space. AI
RESEARCH · OpenAI News English(EN) · 47mo

Reducing bias and improving safety in DALL·E 2

OpenAI has implemented a new system-level technique for DALL·E 2 to generate more diverse images of people when race or gender are not specified in prompts. This change, informed by user feedback during a research preview, has resulted in users being 12 times more likely to see diverse representations. Additionally, OpenAI has enhanced safety measures by rejecting realistic face uploads, limiting public figure likeness generation, and refining content filters and monitoring systems to prevent misuse and deceptive content. AI
RESEARCH · Hugging Face Blog English(EN) · 47mo

How to train your model dynamically using adversarial data

Hugging Face has released a guide on dynamically training models using adversarial data. This method involves generating adversarial examples during the training process to improve model robustness. The guide uses the MNIST dataset as a practical example to demonstrate the techniques involved. AI
RESEARCH · Hugging Face Blog English(EN) · 47mo

The Technology Behind BLOOM Training

BLOOM, an open-access large language model, was trained using a combination of Megatron-LM and DeepSpeed. This approach allowed for efficient training across multiple GPUs by distributing the model and data. The training process involved careful management of hardware resources and software configurations to achieve optimal performance. AI
RESEARCH · Practical AI English(EN) · 47mo · [2 sources]

Cloning voices with Coqui

Coqui, a speech technology startup, is making significant contributions to open-source speech technology and voice cloning. The company focuses on open access models and data, enabling the creation of emotionally resonant cloned voices. Coqui's work is being utilized by creators to develop new AI-driven applications. AI
RESEARCH · OpenAI News English(EN) · 48mo

DALL·E 2 pre-training mitigations

OpenAI has detailed its pre-training mitigations for the DALL·E 2 image generation model, focusing on how the training data was modified to reduce risks. The company filtered out violent and sexual imagery from the dataset to prevent the model from generating such content. Additionally, OpenAI addressed potential biases introduced by data filtering and implemented techniques to mitigate image memorization by removing visually similar images. AI
RESEARCH · OpenAI News English(EN) · 48mo

Learning to play Minecraft with Video PreTraining

OpenAI has developed a new method called Video PreTraining (VPT) to train AI agents using vast amounts of unlabeled online video data. This technique involves first training an inverse dynamics model on a small set of labeled videos to predict actions, which then labels a larger dataset. The trained model, demonstrated in Minecraft, can perform complex tasks like crafting diamond tools, showcasing a step towards general AI agents capable of interacting with computer interfaces. AI
RESEARCH · OpenAI News English(EN) · 48mo

Evolution through large models

OpenAI researchers have introduced Evolution through Large Models (ELM), a novel approach that leverages large language models (LLMs) trained on code to enhance genetic programming. This method uses LLMs to generate effective mutation operators for programs, enabling the creation of numerous functional examples in previously unseen domains. The research demonstrates ELM's potential to bootstrap new conditional language models capable of generating context-appropriate outputs, with implications for open-endedness, deep learning, and reinforcement learning. AI
RESEARCH · OpenAI News English(EN) · 48mo

AI-written critiques help humans notice flaws

OpenAI has developed AI models capable of writing critiques to help human evaluators identify flaws in summaries. These AI assistants significantly improve human detection of errors, increasing the rate of flaw identification by 50% in general cases and from 27% to 45% for deliberately misleading summaries. The research indicates that larger models are more adept at self-critiquing and can use these critiques to improve their own outputs, although a gap remains between their ability to detect flaws and articulate them. AI
RESEARCH · Practical AI English(EN) · 48mo

Generalist models & Iceman's voice

DeepMind has unveiled Gato, a generalist AI model capable of performing a wide array of tasks. This single model can play video games, generate image captions, engage in chat conversations, and even operate robotic arms. The development signifies a step towards more versatile AI systems that can handle diverse functions. AI
RESEARCH · Hugging Face Blog English(EN) · 48mo · [436 sources]

The Annotated Diffusion Model

Apple's research paper explores the mechanisms behind compositional generalization in conditional diffusion models, particularly focusing on how these models handle generating images with more objects than trained on. The study identifies 'local conditional scores' as a key factor enabling this ability, demonstrating that models succeeding at length generalization exhibit these scores, while those that fail do not. The research also proposes a method to enforce these local scores, which successfully enabled length generalization in a previously underperforming model. AI

IMPACT Research into diffusion model generalization could lead to more robust and controllable image generation systems.
RESEARCH · Hugging Face Blog English(EN) · 49mo

Graphcore and Hugging Face Launch New Lineup of IPU-Ready Transformers

Graphcore has partnered with Hugging Face to optimize its Intelligence Processing Unit (IPU) hardware for transformer models. This collaboration aims to improve the efficiency and performance of training and deploying large language models on Graphcore's IPUs. The initiative includes making popular transformer models readily available and optimized for the IPU architecture, facilitating easier adoption for researchers and developers. AI
RESEARCH · Hugging Face Blog English(EN) · 49mo

Efficient Table Pre-training without Real Data: An Introduction to TAPEX

Researchers have introduced TAPEX, a pre-training method for table understanding in language models. The approach trains the model to act as a neural SQL executor: given a synthetic SQL query and a table, it learns to produce the query's execution result. TAPEX improves performance on table-related downstream tasks and, because the pre-training corpus is synthesized, does not require large real-world datasets. AI
RESEARCH · OpenAI News English(EN) · 49mo

DALL·E 2 research preview update

OpenAI is expanding access to its DALL·E 2 research preview, inviting up to 1,000 new users weekly from its waitlist. The company has focused on enhancing safety systems, with less than 0.05% of shared images flagged for policy violations. OpenAI is also working to address biases the model inherited from its training data, and asks early users not to share photorealistic images containing faces. AI
RESEARCH · Practical AI English(EN) · 51mo

It's been a BIG week in AI news 🗞

BigScience is currently training a large language model, attracting significant global attention. Concurrently, NVIDIA has unveiled its newest generation of GPUs, the "Hopper" series. These developments, alongside other AI-related news, were discussed in a recent episode of Practical AI. AI
- BigScience
- NVIDIA
- Hopper
RESEARCH · Hugging Face Blog English(EN) · 51mo

Fine-Tune a Semantic Segmentation Model with a Custom Dataset

Hugging Face has published a guide detailing how to fine-tune a semantic segmentation model using a custom dataset. The tutorial focuses on the SegFormer model, demonstrating the process of adapting it for specific segmentation tasks. This guide is intended to help users leverage pre-trained models and tailor them to their unique data requirements. AI
RESEARCH · OpenAI News English(EN) · 51mo

New GPT-3 capabilities: Edit & insert

OpenAI has introduced new GPT-3 and Codex capabilities that allow for editing and inserting content within existing text, moving beyond simple text completion. The 'insert' feature enables contextually relevant additions in the middle of text or code, improving applications like long-form writing and code generation. Additionally, a new 'edits' endpoint allows for modifications to existing text based on specific instructions, useful for tasks such as refactoring code, changing tone, or fixing errors. These features are now available in beta via the OpenAI API and are being piloted in tools like GitHub Copilot. AI
RESEARCH · Hugging Face Blog English(EN) · 51mo · [2 sources]

Generating Human-level Text with Contrastive Search in Transformers 🤗

Hugging Face has introduced two new text generation techniques for its Transformers library: contrastive search and constrained beam search. Contrastive search aims to produce more human-like text by balancing likelihood and distinctiveness, while constrained beam search allows users to guide the generation process with specific rules or patterns. These methods offer developers more control and improved quality for text generation tasks within the Hugging Face ecosystem. AI
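The balance between likelihood and distinctiveness in contrastive search can be written as a score per candidate token: (1 − α) times the model's probability, minus α times the candidate's maximum cosine similarity to the hidden states of tokens already generated. The sketch below implements one selection step over a handful of candidates; the function names and the input format are assumptions for illustration, not the Transformers API.

```python
import math


def cosine(u, v):
    """Cosine similarity between two vectors; 0.0 if either has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def contrastive_step(candidates, context_states, alpha=0.6):
    """Pick the next token from top-k candidates.

    candidates: list of (token, probability, hidden_state) tuples.
    context_states: hidden states of the tokens generated so far.
    """
    best_token, best_score = None, float("-inf")
    for token, prob, hidden in candidates:
        # Degeneration penalty: similarity to the closest previous token.
        penalty = max(cosine(hidden, h) for h in context_states)
        score = (1 - alpha) * prob - alpha * penalty
        if score > best_score:
            best_token, best_score = token, score
    return best_token
```

With α = 0 this reduces to greedy decoding over the candidates; larger α pushes the choice away from tokens whose representations repeat the context.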
RESEARCH · Practical AI English(EN) · 52mo

One algorithm to rule them all?

Researchers have developed an AI system capable of quickly predicting protein attachments, a significant advancement in biological research. Additionally, a new self-supervised algorithm from Meta AI demonstrates high performance across speech, vision, and text modalities. DeepMind has also announced an AI coding engine that matches the proficiency of an average human programmer. AI
RESEARCH · Hugging Face Blog English(EN) · 52mo

Fine-Tune ViT for Image Classification with 🤗 Transformers

Hugging Face has released a guide on fine-tuning the Vision Transformer (ViT) model for image classification tasks. The tutorial utilizes the 🤗 Transformers library, demonstrating how to adapt a pre-trained ViT model to a specific dataset. This process allows developers to leverage powerful pre-trained models for custom image recognition applications without training from scratch. AI
RESEARCH · EleutherAI Blog English(EN) · 53mo

Announcing GPT-NeoX-20B

EleutherAI has released GPT-NeoX-20B, a 20 billion parameter open-source language model trained using their GPT-NeoX framework. This model is notable for being the largest publicly accessible pretrained autoregressive language model to date. The release aims to facilitate research into the safe use of AI systems, with the model available via inference services and a public release scheduled after a seven-day delay. AI
RESEARCH · OpenAI News English(EN) · 53mo

Solving (some) formal math olympiad problems

OpenAI has developed a neural theorem prover for the Lean formal proof assistant that can solve challenging high-school olympiad math problems. The system utilizes a language model to discover proofs, iteratively improving its performance by using newly found proofs as training data. This approach achieved a new state-of-the-art on the miniF2F benchmark, outperforming previous methods. AI
RESEARCH · Hugging Face Blog English(EN) · 53mo

Case Study: Millisecond Latency using Hugging Face Infinity and modern CPUs

Hugging Face has released Infinity, a new inference engine designed to optimize large language model performance on modern CPUs. This engine achieves millisecond latency by leveraging techniques like quantization and efficient memory management. The goal is to make powerful LLMs more accessible and cost-effective for a wider range of applications without requiring specialized hardware. AI
RESEARCH · Hugging Face Blog English(EN) · 53mo · [2 sources]

Making automatic speech recognition work on large files with Wav2Vec2 in 🤗 Transformers

Hugging Face has updated its Transformers library to improve automatic speech recognition (ASR) with the Wav2Vec2 model. The library now handles large audio files through chunking, which splits a long recording into smaller, overlapping segments that are transcribed separately and stitched back together. In addition, Wav2Vec2 decoding can be combined with an n-gram language model to improve recognition accuracy. AI
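Chunking a long recording usually relies on overlapping windows: each chunk is transcribed with some extra context on both sides, and only the middle part of each prediction is kept. The sketch below computes the window boundaries and the kept regions in samples. The function names and the exact windowing scheme are assumptions for illustration, not the library's internals.

```python
def chunk_spans(n_samples, chunk_len, stride):
    """Return (start, end, left_pad, right_pad) windows covering n_samples.

    Each window overlaps its neighbours by `stride` samples on each side;
    the first window has no left padding and the last has no right padding.
    """
    if chunk_len <= 2 * stride:
        raise ValueError("chunk_len must exceed 2 * stride")
    step = chunk_len - 2 * stride
    spans = []
    start = 0
    while True:
        end = min(start + chunk_len, n_samples)
        left = 0 if start == 0 else stride
        right = 0 if end == n_samples else stride
        spans.append((start, end, left, right))
        if end == n_samples:
            break
        start += step
    return spans


def kept_regions(spans):
    """Drop the padded edges: these regions tile the audio without overlap."""
    return [(start + left, end - right) for start, end, left, right in spans]
```

Because the kept regions tile the recording exactly, the per-chunk transcriptions can be concatenated in order without duplicating or losing audio.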
RESEARCH · Hugging Face Blog English(EN) · 54mo

Perceiver IO: a scalable, fully-attentional model that works on any modality

Perceiver IO is a new AI model architecture developed by DeepMind that utilizes a fully attentional mechanism to process information from various modalities. Unlike previous models that required modality-specific input processing, Perceiver IO can handle diverse data types like images, audio, and text directly. This approach aims to create a more scalable and unified framework for multimodal AI research and applications. AI
RESEARCH · Hugging Face Blog English(EN) · 54mo

Training CodeParrot 🦜 from Scratch

Hugging Face has released CodeParrot, a new large language model specifically trained for code generation. The model was built from scratch using a novel training approach that emphasizes efficiency and performance. CodeParrot is designed to assist developers by generating code snippets, completing code, and potentially aiding in debugging tasks. AI
RESEARCH · Hugging Face Blog English(EN) · 55mo

Introducing Snowball Fight ☃️, our first ML-Agents environment

Hugging Face has released Snowball Fight, a new machine learning environment designed for training agents. This environment is built using the ML-Agents toolkit and aims to provide a platform for developing and testing AI agents in a simulated setting. The release is intended to foster innovation in reinforcement learning and agent-based AI development within the community. AI
RESEARCH · METR (Model Evaluation & Threat Research) English(EN) · 55mo · [5 sources]

2023 Year In Review

METR, an AI safety research organization, detailed its 2023 accomplishments, including developing methodologies for evaluating AI agents on autonomous tasks and contributing to OpenAI's GPT-4 system card. The organization also proposed "Responsible Scaling Policies" (RSPs), a framework for AI safety that gained traction among researchers and companies like Anthropic and OpenAI. Additionally, METR partnered with the UK AI Safety Institute and evaluated GPT-5.1 for catastrophic risks. AI
RESEARCH · Practical AI English(EN) · 55mo

Zero-shot multitask learning

The BigScience research workshop, a year-long initiative by Hugging Face, has released the T0 family of AI models. These models are specifically designed to explore zero-shot multitask learning in natural language processing. The T0 models demonstrate the potential for AI to generalize across various tasks without explicit training for each one. AI
RESEARCH · Hugging Face Blog English(EN) · 55mo

Accelerating PyTorch distributed fine-tuning with Intel technologies

Hugging Face has partnered with Intel to optimize PyTorch distributed fine-tuning using Intel's latest technologies. This collaboration focuses on enhancing performance and efficiency for large language model training. The integration aims to leverage Intel's hardware advancements to accelerate the fine-tuning process, making it more accessible and faster for researchers and developers. AI
RESEARCH · OpenAI News English(EN) · 56mo

Solving math word problems

OpenAI has developed a system that solves grade school math word problems with nearly double the accuracy of a fine-tuned GPT-3 baseline. It reaches approximately 90% of the performance of children aged 9 to 12. The approach trains verifiers to judge candidate solutions: at test time the model generates many candidates and the one the verifier ranks highest is returned. This gives a significant performance boost and appears to scale more effectively with data than simply increasing model size. AI
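Selecting among candidate solutions with a verifier amounts to a best-of-n search. The sketch below shows the selection step only. `toy_verifier` is a hypothetical stand-in that checks simple addition steps; in the work described, the verifier is a trained model, not a rule like this.

```python
def toy_verifier(solution):
    """Hypothetical stand-in for a trained verifier.

    Scores a solution by the fraction of its (a, b, claimed_sum) steps
    that are arithmetically correct.
    """
    steps = solution["steps"]
    if not steps:
        return 0.0
    correct = sum(1 for a, b, claimed in steps if a + b == claimed)
    return correct / len(steps)


def best_of_n(candidates, verifier):
    """Return the candidate solution that the verifier scores highest."""
    return max(candidates, key=verifier)


# Two sampled solutions to "2 + 3 + 4": one with an arithmetic slip, one without.
candidates = [
    {"steps": [(2, 3, 5), (5, 4, 10)], "answer": 10},
    {"steps": [(2, 3, 5), (5, 4, 9)], "answer": 9},
]
chosen = best_of_n(candidates, toy_verifier)
```

The generator is free to make mistakes on individual samples as long as the verifier can tell good solutions from bad ones, which is why sampling more candidates helps.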
RESEARCH · Hugging Face Blog English(EN) · 56mo

The Age of Machine Learning As Code Has Arrived

Hugging Face has announced a new initiative, "Machine Learning as Code," aiming to standardize how machine learning models are developed, shared, and deployed. This approach treats ML models like software code, emphasizing version control, reproducibility, and collaboration. The goal is to streamline the ML lifecycle, making it more accessible and efficient for developers and researchers. AI
RESEARCH · Hugging Face Blog English(EN) · 56mo

Fine tuning CLIP with Remote Sensing (Satellite) images and captions

Hugging Face has released a guide on fine-tuning the CLIP model using remote sensing images and their corresponding captions. This process involves adapting the pre-trained CLIP model to better understand and associate visual information from satellite imagery with textual descriptions. The guide details the steps and considerations for this specialized application of CLIP, enabling more accurate analysis and retrieval of geospatial data. AI
RESEARCH · Hugging Face Blog Dansk(DA) · 57mo

Summer at Hugging Face

Hugging Face is hosting a series of events and releasing new features throughout the summer. These initiatives aim to foster community engagement and advance the open-source AI ecosystem. Key highlights include new model releases, educational content, and opportunities for developers to collaborate and showcase their work. AI
RESEARCH · Hugging Face Blog English(EN) · 59mo

Deep Learning over the Internet: Training Language Models Collaboratively

Hugging Face has introduced a new framework enabling collaborative training of large language models over the internet. This approach allows multiple parties to contribute to training without sharing their raw data, addressing privacy and security concerns. The system leverages techniques to ensure that individual data remains private while still enabling the collective model to learn from diverse datasets. AI
RESEARCH · EleutherAI Blog English(EN) · 60mo · [3 sources]

EleutherAI Second Retrospective: The long version

EleutherAI has released a retrospective detailing their work over the past year and a half. Key achievements include the development of the open-source LLM GPT-NeoX-20B and contributions to text-to-image generation models like VQGAN-CLIP. The organization has also seen several members depart to found new AI research entities focused on alignment, preference learning, and biomedical applications. AI
RESEARCH · Hugging Face Blog English(EN) · 61mo · [2 sources]

SetFit: Efficient Few-Shot Learning Without Prompts

Hugging Face has introduced SetFit, a few-shot learning approach that achieves high accuracy without prompt engineering. The method has two stages: first, it fine-tunes a Sentence Transformer contrastively on sentence pairs built from a small set of labeled examples; then it trains a classification head on the embeddings produced by the fine-tuned model. SetFit has outperformed prompt-based methods like few-shot GPT-3 on several benchmarks, and is available as an open-source library. AI
RESEARCH · EleutherAI Blog English(EN) · 61mo

Why Release a Large Language Model?

EleutherAI has detailed its reasoning for releasing large language models, emphasizing that open access is crucial for advancing AI safety research. The organization argues that significant safety studies, particularly in model interpretability, can only be effectively conducted with access to these powerful models. They believe that the potential dangers of current large language models are not world-ending and that releasing them allows for critical safety research to be performed before models become significantly more powerful and potentially uncontrollable. Furthermore, EleutherAI contends that attempts to restrict access to this technology are futile, as well-funded actors can replicate it, making open release the best strategy to empower society to study and utilize it for beneficial purposes. AI