PulseAugur / Brief

last 24h
[50/2973] 224 sources

Multi-source AI news clustered, deduplicated, and scored 0–100 across authority, cluster strength, headline signal, and time decay.

  1. What's up, DocQuery?

    Impira has released DocQuery, an open-source ML model for querying semi-structured and unstructured documents using LLMs. It handles document types such as invoices and contracts, letting users ask questions and extract information directly from them.

  2. Introducing Whisper

    OpenAI has released Whisper, an automatic speech recognition system trained on 680,000 hours of diverse, multilingual data. This training makes Whisper robust to accents, background noise, and technical language, and it supports both transcription and translation into English. The system uses a Transformer-based encoder-decoder architecture and is being open-sourced to foster application development and further research in speech processing.

  3. Optimization story: Bloom inference

    Hugging Face has released new optimization techniques that improve inference speed for the BLOOM language model. The work draws on DeepSpeed and Hugging Face's Accelerate library, and the accompanying blog posts offer practical guidance for developers deploying large language models.

  4. What's new in Diffusers? 🎨

    Hugging Face has released version 0.29.0 of its Diffusers library for diffusion models. Key updates include improved support for latent consistency models (LCMs) and LoRA, performance optimizations for faster inference, and new features for handling model conditioning in advanced image generation tasks.

  5. Train your first Decision Transformer

    Hugging Face has released a guide to training Decision Transformers, models that frame reinforcement learning as a sequence modeling problem. The blog post walks through the training process and aims to make the technique more accessible to developers.

  6. Introducing Modular Diffusers - Composable Building Blocks for Diffusion Pipelines

    Hugging Face has introduced Modular Diffusers, composable building blocks for diffusion pipelines in its Diffusers library, as part of a broader effort to bring modularity and efficiency to diffusion models. Related coverage includes Stable Diffusion 3.5 Large, Stability AI's updated text-to-image model, memory-efficient training with technologies like Quanto, and streamlined workflows for techniques such as Dreambooth.

  7. A Gentle Introduction to 8-bit Matrix Multiplication for transformers at scale using transformers, accelerate and bitsandbytes

    Hugging Face has integrated the bitsandbytes library to enable 8-bit matrix multiplication for large transformer models. The optimization reduces memory usage, allowing training and inference of bigger models on existing hardware and lowering the computational barrier to working with them.

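    As a rough illustration of what 8-bit matrix multiplication involves, the sketch below applies simple absmax quantization to a matrix-vector product in plain Python. It is not the bitsandbytes implementation: the function names `quantize_absmax` and `int8_matvec` are hypothetical, and per-row scaling is an assumption chosen for brevity.

```python
def quantize_absmax(values):
    """Map floats to integers in [-127, 127] using the absolute maximum as the scale."""
    scale = max(abs(x) for x in values) or 1.0  # avoid dividing by zero for all-zero input
    return [round(x / scale * 127) for x in values], scale


def int8_matvec(matrix, vector):
    """Multiply matrix by vector with 8-bit-range integers, then rescale to floats."""
    q_vec, v_scale = quantize_absmax(vector)
    result = []
    for row in matrix:
        q_row, r_scale = quantize_absmax(row)
        # Accumulate in integers, as an int8 kernel would (with a wider accumulator).
        acc = sum(a * b for a, b in zip(q_row, q_vec))
        # Undo both scales to recover an approximate floating-point result.
        result.append(acc * r_scale * v_scale / (127 * 127))
    return result


matrix = [[1.0, -2.0], [0.5, 0.25]]
vector = [3.0, 4.0]
approx = int8_matvec(matrix, vector)
exact = [sum(a * b for a, b in zip(row, vector)) for row in matrix]
```

    The quantized result stays close to the exact product while each stored value needs only one byte, which is where the memory saving comes from.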
  8. CMU's AI pilot lands in the news 🗞

    Carnegie Mellon University has developed an AI pilot capable of navigating complex and crowded airspace. It was highlighted in a recent discussion that also covered infrastructure tools like Baseten's Truss and advances in transformer models.

  9. Upgrading the Moderation API with our new multimodal moderation model

    OpenAI has released an upgraded Moderation API, powered by a new multimodal model based on GPT-4o. The model detects harmful text and images more accurately, particularly in non-English languages, and supports new categories such as illicit activities. The update gives developers more robust content-safety tools for building AI applications and products.

  10. Nyströmformer: Approximating self-attention in linear time and memory via the Nyström method

    Researchers have developed Nyströmformer, which uses the Nyström method to approximate self-attention in transformer models. The approximation runs in linear time and memory, compared with the quadratic complexity of standard self-attention, which could let transformers handle much longer sequences.

  11. Faster Text Generation with TensorFlow and XLA

    Hugging Face has integrated TensorFlow and XLA to accelerate text generation. The faster inference makes large language models more efficient to deploy, and the gains are most noticeable for users running TensorFlow within the Hugging Face ecosystem.

  12. A hazard analysis framework for code synthesis large language models

    OpenAI has developed a hazard analysis framework to identify risks associated with large language models that generate code, such as its model Codex. The framework aims to uncover technical, social, political, and economic safety concerns arising from the deployment of these code-synthesis tools. The analysis is supported by a new evaluation system that assesses the models' ability to understand and execute complex prompts relative to human ability.

  13. DALL-E is one giant leap for raccoons! 🔭

    OpenAI has released DALL-E 2, a model that generates detailed images from text descriptions. While some in the AI community speculate about models approaching sentience, the hosts of this podcast dismiss such notions. They highlight DALL-E 2's capabilities, particularly its ability to create imaginative visuals like raccoons in space.

  14. Reducing bias and improving safety in DALL·E 2

    OpenAI has implemented a new system-level technique for DALL·E 2 to generate more diverse images of people when race or gender are not specified in prompts. This change, informed by user feedback during a research preview, has resulted in users being 12 times more likely to see diverse representations. Additionally, OpenAI has enhanced safety measures by rejecting realistic face uploads, limiting public figure likeness generation, and refining content filters and monitoring systems to prevent misuse and deceptive content.

  15. How to train your model dynamically using adversarial data

    Hugging Face has released a guide on dynamically training models using adversarial data. The method generates adversarial examples during training to improve model robustness, and the guide uses the MNIST dataset as a practical example.

  16. The Technology Behind BLOOM Training

    BLOOM, an open-access large language model, was trained using a combination of Megatron-LM and DeepSpeed. This approach distributed the model and data across multiple GPUs, and required careful management of hardware resources and software configurations.

  17. DALL·E 2 pre-training mitigations

    OpenAI has detailed its pre-training mitigations for the DALL·E 2 image generation model, focusing on how the training data was modified to reduce risks. The company filtered out violent and sexual imagery from the dataset to prevent the model from generating such content. Additionally, OpenAI addressed potential biases introduced by data filtering and implemented techniques to mitigate image memorization by removing visually similar images.

  18. Learning to play Minecraft with Video PreTraining

    OpenAI has developed a new method called Video PreTraining (VPT) to train AI agents using vast amounts of unlabeled online video data. The technique first trains an inverse dynamics model on a small set of labeled videos to predict actions, then uses that model to label a much larger dataset. The trained model, demonstrated in Minecraft, can perform complex tasks like crafting diamond tools, a step towards general AI agents capable of interacting with computer interfaces.

  19. Evolution through large models

    OpenAI researchers have introduced Evolution through Large Models (ELM), a novel approach that leverages large language models (LLMs) trained on code to enhance genetic programming. This method uses LLMs to generate effective mutation operators for programs, enabling the creation of numerous functional examples in previously unseen domains. The research demonstrates ELM's potential to bootstrap new conditional language models capable of generating context-appropriate outputs, with implications for open-endedness, deep learning, and reinforcement learning.

  20. AI-written critiques help humans notice flaws

    OpenAI has developed AI models capable of writing critiques to help human evaluators identify flaws in summaries. These AI assistants improve human detection of errors, increasing the rate of flaw identification by 50% in general cases and from 27% to 45% for deliberately misleading summaries. The research indicates that larger models are more adept at self-critiquing and can use these critiques to improve their own outputs, although a gap remains between their ability to detect flaws and articulate them.

  21. Generalist models & Iceman's voice

    DeepMind has unveiled Gato, a generalist AI model capable of performing a wide array of tasks. This single model can play video games, generate image captions, engage in chat conversations, and operate robotic arms. The development is a step towards more versatile AI systems that can handle diverse functions.

  22. The Annotated Diffusion Model

    Apple's research paper explores the mechanisms behind compositional generalization in conditional diffusion models, particularly focusing on how these models handle generating images with more objects than trained on. The study identifies 'local conditional scores' as a key factor enabling this ability, demonstrating that models succeeding at length generalization exhibit these scores, while those that fail do not. The research also proposes a method to enforce these local scores, which successfully enabled length generalization in a previously underperforming model.


    IMPACT Research into diffusion model generalization could lead to more robust and controllable image generation systems.

  23. Graphcore and Hugging Face Launch New Lineup of IPU-Ready Transformers

    Graphcore has partnered with Hugging Face to optimize transformer models for its Intelligence Processing Unit (IPU) hardware. The collaboration aims to improve the efficiency and performance of training and deploying large language models on IPUs. It includes making popular transformer models readily available and optimized for the IPU architecture, easing adoption for researchers and developers.

  24. Efficient Table Pre-training without Real Data: An Introduction to TAPEX

    Researchers have introduced TAPEX, a pre-training method for improving table understanding in language models. It pre-trains the model as a neural SQL executor on a synthetic corpus of automatically generated SQL queries and their execution results over tables. TAPEX improves performance on several table-related downstream tasks, offering a more efficient way to train models on structured information without requiring extensive real-world datasets.

  25. DALL·E 2 research preview update

    OpenAI is expanding access to its DALL·E 2 research preview, inviting up to 1,000 new users weekly from its waitlist. The company has focused on enhancing safety systems, with less than 0.05% of shared images flagged for policy violations. OpenAI is also working to address biases the model inherited from its training data, and asks early users to avoid sharing photorealistic images with faces.

  26. It's been a BIG week in AI news 🗞

    BigScience is currently training a large language model, attracting significant global attention. Concurrently, NVIDIA has unveiled its newest generation of GPUs, the "Hopper" series. These developments, alongside other AI-related news, were discussed in a recent episode of Practical AI.

  27. Fine-Tune a Semantic Segmentation Model with a Custom Dataset

    Hugging Face has published a guide to fine-tuning a semantic segmentation model on a custom dataset. The tutorial focuses on the SegFormer model and shows how to adapt a pre-trained checkpoint to a specific segmentation task and dataset.

  28. New GPT-3 capabilities: Edit & insert

    OpenAI has introduced new GPT-3 and Codex capabilities that allow for editing and inserting content within existing text, moving beyond simple text completion. The 'insert' feature enables contextually relevant additions in the middle of text or code, improving applications like long-form writing and code generation. Additionally, a new 'edits' endpoint allows for modifications to existing text based on specific instructions, useful for tasks such as refactoring code, changing tone, or fixing errors. These features are now available in beta via the OpenAI API and are being piloted in tools like GitHub Copilot.

  29. Generating Human-level Text with Contrastive Search in Transformers 🤗

    Hugging Face has introduced two new text generation techniques for its Transformers library: contrastive search and constrained beam search. Contrastive search aims to produce more human-like text by balancing likelihood and distinctiveness, while constrained beam search allows users to guide the generation process with specific rules or patterns. These methods offer developers more control and improved quality for text generation tasks within the Hugging Face ecosystem.

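    The balance between likelihood and distinctiveness in contrastive search can be sketched as a scoring rule: each candidate token's model probability is weighed against its highest cosine similarity to the tokens already generated. The snippet below is a toy illustration in plain Python, not the Transformers implementation; the names `cosine` and `contrastive_pick`, the two-dimensional hidden states, and the default `alpha=0.6` are all assumptions made for the example.

```python
import math


def cosine(u, v):
    """Cosine similarity between two vectors (0.0 if either has zero length)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def contrastive_pick(candidates, context_states, alpha=0.6):
    """Pick the candidate maximizing (1 - alpha) * prob - alpha * max similarity.

    candidates: list of (token, probability, hidden_state) tuples.
    context_states: non-empty list of hidden states for tokens generated so far.
    """
    def score(candidate):
        _, prob, state = candidate
        penalty = max(cosine(state, h) for h in context_states)
        return (1 - alpha) * prob - alpha * penalty

    return max(candidates, key=score)[0]


# Toy data: "the" is more probable but nearly repeats the context representation.
context = [[1.0, 0.0]]
options = [("the", 0.6, [1.0, 0.0]), ("cat", 0.4, [0.0, 1.0])]
chosen = contrastive_pick(options, context)
greedy = contrastive_pick(options, context, alpha=0.0)
```

    With `alpha=0.0` the rule reduces to greedy selection by probability; raising `alpha` increasingly favors tokens whose representations differ from the existing context.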
  30. One algorithm to rule them all?

    Researchers have developed an AI system capable of quickly predicting protein attachments, an advance for biological research. Additionally, a new self-supervised algorithm from Meta AI demonstrates high performance across speech, vision, and text modalities. DeepMind has also announced an AI coding engine that matches the proficiency of an average human programmer.

  31. Fine-Tune ViT for Image Classification with 🤗 Transformers

    Hugging Face has released a guide on fine-tuning the Vision Transformer (ViT) model for image classification. The tutorial uses the 🤗 Transformers library to adapt a pre-trained ViT model to a specific dataset, so developers can build custom image recognition applications without training from scratch.

  32. Announcing GPT-NeoX-20B

    EleutherAI has released GPT-NeoX-20B, a 20 billion parameter open-source language model trained using their GPT-NeoX framework. This model is notable for being the largest publicly accessible pretrained autoregressive language model to date. The release aims to facilitate research into the safe use of AI systems, with the model available via inference services and a public release scheduled after a seven-day delay.

  33. Solving (some) formal math olympiad problems

    OpenAI has developed a neural theorem prover for the Lean formal proof assistant that can solve challenging high-school olympiad math problems. The system uses a language model to search for proofs and improves iteratively by adding newly found proofs to its training data. This approach achieved a new state of the art on the miniF2F benchmark.

  34. Case Study: Millisecond Latency using Hugging Face Infinity and modern CPUs

    Hugging Face has released Infinity, a new inference engine designed to optimize large language model performance on modern CPUs. This engine achieves millisecond latency by leveraging techniques like quantization and efficient memory management. The goal is to make powerful LLMs more accessible and cost-effective for a wider range of applications without requiring specialized hardware.

  35. Making automatic speech recognition work on large files with Wav2Vec2 in 🤗 Transformers

    Hugging Face has released updates to its Transformers library that extend Wav2Vec2 automatic speech recognition (ASR) to large audio files. The library now supports chunking, which breaks a large file into smaller, manageable segments. Additionally, integrating n-gram language models improves the accuracy of the recognized text.

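    The chunking idea can be sketched as computing window boundaries over the audio samples, with a small overlap so that words cut at a boundary appear whole in at least one window. This is a generic illustration in plain Python rather than the Transformers code; the function name `chunk_bounds` and its parameters are hypothetical.

```python
def chunk_bounds(n_samples, chunk_len, overlap):
    """Return (start, end) sample indices of overlapping windows covering the audio."""
    if chunk_len <= 0 or not 0 <= overlap < chunk_len:
        raise ValueError("need chunk_len > 0 and 0 <= overlap < chunk_len")
    if n_samples <= 0:
        return []
    step = chunk_len - overlap  # how far each new window advances
    bounds = []
    start = 0
    while True:
        end = min(start + chunk_len, n_samples)
        bounds.append((start, end))
        if end == n_samples:  # the last window reached the end of the audio
            break
        start += step
    return bounds


# Ten samples, windows of four samples, one sample shared between neighbours.
windows = chunk_bounds(10, 4, 1)
```

    Each window would then be transcribed separately and the overlapping regions reconciled when the partial transcripts are merged.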
  36. Perceiver IO: a scalable, fully-attentional model that works on any modality

    Perceiver IO is a new AI model architecture developed by DeepMind that uses a fully attentional mechanism to process information from various modalities. Unlike previous models that required modality-specific input processing, Perceiver IO can handle diverse data types like images, audio, and text directly. The approach aims to provide a more scalable and unified framework for multimodal AI research and applications.

  37. Training CodeParrot 🦜 from Scratch

    Hugging Face has released CodeParrot, a new large language model specifically trained for code generation. The model was built from scratch using a novel training approach that emphasizes efficiency and performance. CodeParrot is designed to assist developers by generating code snippets, completing code, and potentially aiding in debugging tasks.

  38. Introducing Snowball Fight ☃️, our first ML-Agents environment

    Hugging Face has released Snowball Fight, a new machine learning environment for training agents. Built with the ML-Agents toolkit, it provides a simulated setting for developing and testing AI agents. The release is intended to foster innovation in reinforcement learning and agent-based AI development within the community.

  39. 2023 Year In Review

    METR, an AI safety research organization, detailed its 2023 accomplishments, including developing methodologies for evaluating AI agents on autonomous tasks and contributing to OpenAI's GPT-4 system card. The organization also proposed "Responsible Scaling Policies" (RSPs), a framework for AI safety that gained traction among researchers and companies like Anthropic and OpenAI. Additionally, METR partnered with the UK AI Safety Institute and evaluated GPT-5.1 for catastrophic risks.

  40. Zero-shot multitask learning

    The BigScience research workshop, a year-long initiative by Hugging Face, has released the T0 family of AI models. These models are designed to explore zero-shot multitask learning in natural language processing, and they demonstrate the potential for AI to generalize across tasks without explicit training for each one.

  41. Accelerating PyTorch distributed fine-tuning with Intel technologies

    Hugging Face has partnered with Intel to optimize PyTorch distributed fine-tuning using Intel's latest technologies. The collaboration focuses on performance and efficiency for large language model training, using Intel's hardware advances to make fine-tuning faster and more accessible for researchers and developers.

  42. Solving math word problems

    OpenAI has developed a new system capable of solving grade school math word problems with nearly double the accuracy of previous GPT-3 models. The system achieves approximately 90% of the performance of real children in the 9-12 age range by training the model to recognize and correct its own errors through repeated attempts. The approach uses verifiers to evaluate multiple candidate solutions and select the best one, which offers a significant performance boost and appears to scale more effectively with data than simply increasing model size.

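    The verifier step described above amounts to scoring several candidate solutions and keeping the highest-scoring one. The sketch below shows that selection pattern in plain Python under loud assumptions: `best_of_n` and `toy_verifier` are hypothetical names, and the toy verifier simply re-checks the arithmetic, whereas the system in the summary uses a trained model to score candidates.

```python
def best_of_n(candidates, verifier):
    """Return the candidate solution that the verifier scores highest."""
    return max(candidates, key=verifier)


def toy_verifier(solution):
    """Hypothetical stand-in for a learned verifier.

    Scores 1.0 if a string of the form "a + b = c" is arithmetically
    correct, and 0.0 otherwise (including malformed input).
    """
    expression, _, claimed = solution.partition("=")
    left, _, right = expression.partition("+")
    try:
        return 1.0 if int(left) + int(right) == int(claimed) else 0.0
    except ValueError:
        return 0.0


# Several sampled answers to the same problem; only one is correct.
candidates = ["3 + 4 = 6", "3 + 4 = 7", "3 + 4 = 8"]
best = best_of_n(candidates, toy_verifier)
```

    Sampling more candidates raises the chance that at least one is correct, and the verifier's job is to recognize it.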
  43. The Age of Machine Learning As Code Has Arrived

    Hugging Face has announced a new initiative, "Machine Learning as Code," aiming to standardize how machine learning models are developed, shared, and deployed. This approach treats ML models like software code, emphasizing version control, reproducibility, and collaboration. The goal is to streamline the ML lifecycle, making it more accessible and efficient for developers and researchers.

  44. Fine tuning CLIP with Remote Sensing (Satellite) images and captions

    Hugging Face has released a guide on fine-tuning the CLIP model using remote sensing images and their captions. The process adapts the pre-trained model to associate visual information from satellite imagery with textual descriptions. The guide details the steps and considerations for this specialized application of CLIP, which supports more accurate analysis and retrieval of geospatial data.

  45. Summer at Hugging Face

    Hugging Face is hosting a series of events and releasing new features throughout the summer. These initiatives aim to foster community engagement and advance the open-source AI ecosystem. Key highlights include new model releases, educational content, and opportunities for developers to collaborate and showcase their work.

  46. Deep Learning over the Internet: Training Language Models Collaboratively

    Hugging Face has introduced a new framework enabling collaborative training of large language models over the internet. This approach allows multiple parties to contribute to training without sharing their raw data, addressing privacy and security concerns. The system leverages techniques to ensure that individual data remains private while still enabling the collective model to learn from diverse datasets.

  47. EleutherAI Second Retrospective: The long version

    EleutherAI has released a retrospective detailing their work over the past year and a half. Key achievements include the development of the open-source LLM GPT-NeoX-20B and contributions to text-to-image generation models like VQGAN-CLIP. The organization has also seen several members depart to found new AI research entities focused on alignment, preference learning, and biomedical applications.

  48. SetFit: Efficient Few-Shot Learning Without Prompts

    Hugging Face has introduced SetFit, a few-shot learning approach that achieves state-of-the-art performance without requiring prompt engineering. The method uses a two-stage process: it first fine-tunes a Sentence Transformer with contrastive learning on pairs built from a small set of labeled examples, then trains a classification head on the resulting embeddings. SetFit has outperformed prompt-based methods like few-shot GPT-3 on several benchmarks, and is available as an open-source library.

  49. Why Release a Large Language Model?

    EleutherAI has detailed its reasoning for releasing large language models, emphasizing that open access is crucial for advancing AI safety research. The organization argues that significant safety studies, particularly in model interpretability, can only be effectively conducted with access to these powerful models. They believe that the potential dangers of current large language models are not world-ending and that releasing them allows for critical safety research to be performed before models become significantly more powerful and potentially uncontrollable. Furthermore, EleutherAI contends that attempts to restrict access to this technology are futile, as well-funded actors can replicate it, making open release the best strategy to empower society to study and utilize it for beneficial purposes.
