Brief

last 24h

[50/8376] 224 sources

Multi-source AI news clustered, deduplicated, and scored 0–100 across authority, cluster strength, headline signal, and time decay.
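
As a rough illustration of how four such signals could be folded into one 0–100 number, the sketch below combines them with a weighted sum and an exponential time decay. The weights, the 24-hour half-life, and the function name `brief_score` are all assumptions made for this example; the feed does not document its actual formula.

```python
def brief_score(authority, cluster_strength, headline_signal, age_hours,
                half_life_hours=24.0):
    """Combine three signals in [0, 1] with a time decay into a 0-100 score.

    Hypothetical weights (0.5 / 0.25 / 0.25) and half-life; not the feed's
    real formula.
    """
    def clamp(x):
        # Keep each input signal inside [0, 1].
        return max(0.0, min(1.0, x))

    weighted = (0.5 * clamp(authority)
                + 0.25 * clamp(cluster_strength)
                + 0.25 * clamp(headline_signal))
    # Exponential decay: the score halves every `half_life_hours`.
    decay = 0.5 ** (max(0.0, age_hours) / half_life_hours)
    return round(100.0 * weighted * decay, 1)


if __name__ == "__main__":
    print(brief_score(0.8, 0.5, 0.5, age_hours=24))
```

With this shape, a fresh item from a top-authority source in a strong cluster scores 100, and the same item scores 50 one half-life later.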

RESEARCH · Hugging Face Blog English(EN) · 49mo

Efficient Table Pre-training without Real Data: An Introduction to TAPEX

Researchers have introduced TAPEX, a pre-training method for improving table understanding in language models. Instead of relying on real tabular data, the approach trains a model to mimic a SQL executor over a synthetic corpus of automatically generated SQL queries and their execution results on tables. TAPEX demonstrates improved performance on various table-related downstream tasks, offering a more efficient way to train models on structured information without requiring extensive real-world datasets. AI
RESEARCH · OpenAI News English(EN) · 49mo

DALL·E 2 research preview update

OpenAI is expanding access to its DALL·E 2 research preview, inviting up to 1,000 new users weekly from its waitlist. The company has focused on enhancing safety systems, with less than 0.05% of shared images flagged for policy violations. OpenAI is also working to address biases the model inherited from its training data, and is asking early users not to share photorealistic images with faces. AI
SIGNIFICANT · Hugging Face Blog English(EN) · 49mo

We Raised $100 Million for Open & Collaborative Machine Learning 🚀

Hugging Face has secured $100 million in a Series C funding round, led by Salesforce Ventures. This investment aims to accelerate the development of open-source machine learning technologies and foster a more collaborative AI ecosystem. The company plans to expand its platform and community initiatives to support researchers and developers globally. AI
TOOL · Hugging Face Blog English(EN) · 50mo · [2 sources]

Accelerate Large Model Training using DeepSpeed

Hugging Face has released new guides detailing how to accelerate the training of large AI models. The guides focus on two key technologies: DeepSpeed and PyTorch's Fully Sharded Data Parallel (FSDP). By implementing these techniques, developers can more efficiently train complex models, potentially reducing computational costs and time. AI
COMMENTARY · Hugging Face Blog English(EN) · 50mo

Machine Learning Experts - Lewis Tunstall

Lewis Tunstall, a prominent figure in the machine learning community and an engineer at Hugging Face, shared insights during an interview. He discussed the evolving landscape of open-source AI models and the importance of community-driven development. Tunstall also touched upon the practical applications and future directions of large language models. AI
TOOL · Hugging Face Blog English(EN) · 50mo

Habana Labs and Hugging Face Partner to Accelerate Transformer Model Training

Habana Labs and Hugging Face have announced a partnership aimed at speeding up the training of transformer models. This collaboration will integrate Habana's Gaudi deep learning accelerators with Hugging Face's popular libraries and platforms. The goal is to provide developers with more efficient tools for training large AI models. AI
RESEARCH · Practical AI English(EN) · 51mo

It's been a BIG week in AI news 🗞

BigScience is currently training a large language model, attracting significant global attention. Concurrently, NVIDIA has unveiled its newest generation of GPUs, the "Hopper" series. These developments, alongside other AI-related news, were discussed in a recent episode of Practical AI. AI
- BigScience
- Hopper
- NVIDIA
RESEARCH · Hugging Face Blog English(EN) · 51mo

Fine-Tune a Semantic Segmentation Model with a Custom Dataset

Hugging Face has published a guide detailing how to fine-tune a semantic segmentation model using a custom dataset. The tutorial focuses on the SegFormer model, demonstrating the process of adapting it for specific segmentation tasks. This guide is intended to help users leverage pre-trained models and tailor them to their unique data requirements. AI
RESEARCH · OpenAI News English(EN) · 51mo

New GPT-3 capabilities: Edit & insert

OpenAI has introduced new GPT-3 and Codex capabilities that allow for editing and inserting content within existing text, moving beyond simple text completion. The 'insert' feature enables contextually relevant additions in the middle of text or code, improving applications like long-form writing and code generation. Additionally, a new 'edits' endpoint allows for modifications to existing text based on specific instructions, useful for tasks such as refactoring code, changing tone, or fixing errors. These features are now available in beta via the OpenAI API and are being piloted in tools like GitHub Copilot. AI
RESEARCH · Hugging Face Blog English(EN) · 51mo · [2 sources]

Generating Human-level Text with Contrastive Search in Transformers 🤗

Hugging Face has introduced two new text generation techniques for its Transformers library: contrastive search and constrained beam search. Contrastive search aims to produce more human-like text by balancing likelihood and distinctiveness, while constrained beam search allows users to guide the generation process with specific rules or patterns. These methods offer developers more control and improved quality for text generation tasks within the Hugging Face ecosystem. AI
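
The trade-off between likelihood and distinctiveness in contrastive search can be sketched as a single selection step. The snippet below is a minimal pure-Python illustration, assuming each candidate token comes with a model probability and a hidden-state vector; the function names and toy vectors are hypothetical, and this is not the Transformers implementation (in that library, contrastive search is enabled through the `penalty_alpha` and `top_k` arguments of `generate`).

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def contrastive_pick(candidates, context_states, alpha=0.6):
    """Pick one token from (token, probability, hidden_state) candidates.

    score = (1 - alpha) * probability
            - alpha * max cosine similarity to any earlier hidden state
    """
    def score(candidate):
        _, prob, state = candidate
        penalty = max((cosine(state, h) for h in context_states), default=0.0)
        return (1 - alpha) * prob - alpha * penalty

    return max(candidates, key=score)[0]


if __name__ == "__main__":
    # Toy data: "the" is more likely but its state duplicates the context.
    context = [[1.0, 0.0]]
    cands = [("the", 0.6, [1.0, 0.0]), ("cat", 0.4, [0.0, 1.0])]
    print(contrastive_pick(cands, context, alpha=0.6))  # prints "cat"
    print(contrastive_pick(cands, context, alpha=0.0))  # prints "the"
```

Setting `alpha` to 0 reduces the step to greedy selection, while larger values penalize candidates whose representation repeats what is already in the context.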
RESEARCH · Practical AI English(EN) · 52mo

One algorithm to rule them all?

Researchers have developed an AI system capable of quickly predicting protein attachments, a significant advancement in biological research. Additionally, a new self-supervised algorithm from Meta AI demonstrates high performance across speech, vision, and text modalities. DeepMind has also announced an AI coding engine that matches the proficiency of an average human programmer. AI
RESEARCH · Hugging Face Blog English(EN) · 52mo

Fine-Tune ViT for Image Classification with 🤗 Transformers

Hugging Face has released a guide on fine-tuning the Vision Transformer (ViT) model for image classification tasks. The tutorial utilizes the 🤗 Transformers library, demonstrating how to adapt a pre-trained ViT model to a specific dataset. This process allows developers to leverage powerful pre-trained models for custom image recognition applications without training from scratch. AI
RESEARCH · EleutherAI Blog English(EN) · 53mo

Announcing GPT-NeoX-20B

EleutherAI has released GPT-NeoX-20B, a 20 billion parameter open-source language model trained using their GPT-NeoX framework. This model is notable for being the largest publicly accessible pretrained autoregressive language model to date. The release aims to facilitate research into the safe use of AI systems, with the model available via inference services and a public release scheduled after a seven-day delay. AI
RESEARCH · OpenAI News English(EN) · 53mo

Solving (some) formal math olympiad problems

OpenAI has developed a neural theorem prover for the Lean formal proof assistant that can solve challenging high-school olympiad math problems. The system utilizes a language model to discover proofs, iteratively improving its performance by using newly found proofs as training data. This approach achieved a new state-of-the-art on the miniF2F benchmark, outperforming previous methods. AI
SIGNIFICANT · Hugging Face Blog English(EN) · 53mo · [5 sources]

Getting Started With Embeddings

OpenAI has released new embedding models, text-embedding-3-small and text-embedding-3-large, offering significant improvements in performance and efficiency over previous models like text-embedding-ada-002. These new models are designed to better understand relationships between concepts in text and code, powering applications such as semantic search and retrieval-augmented generation. OpenAI is also reducing prices for GPT-3.5 Turbo and updating its GPT-4 Turbo preview model, while also enhancing API key management and usage transparency for developers. AI
RESEARCH · Hugging Face Blog English(EN) · 53mo

Case Study: Millisecond Latency using Hugging Face Infinity and modern CPUs

Hugging Face has released Infinity, a new inference engine designed to optimize large language model performance on modern CPUs. This engine achieves millisecond latency by leveraging techniques like quantization and efficient memory management. The goal is to make powerful LLMs more accessible and cost-effective for a wider range of applications without requiring specialized hardware. AI
RESEARCH · Hugging Face Blog English(EN) · 53mo · [2 sources]

Making automatic speech recognition work on large files with Wav2Vec2 in 🤗 Transformers

Hugging Face has released updates to its Transformers library that extend the Wav2Vec2 model for automatic speech recognition (ASR). The library now supports processing large audio files through chunking, which splits a long recording into smaller segments that are transcribed separately. Additionally, recognition accuracy is improved by decoding with an n-gram language model. AI
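
The chunking idea can be sketched as computing overlapping windows over the audio samples. This is a minimal illustration under stated assumptions: the function name `chunk_spans` is hypothetical, and the overlap on each side mirrors the stride concept that the Transformers ASR pipeline exposes through `chunk_length_s` and `stride_length_s`, without reproducing its implementation.

```python
def chunk_spans(total, chunk_len, stride):
    """Return (start, end) sample spans covering `total` samples.

    Consecutive chunks overlap so that each chunk has `stride` samples of
    extra context on both sides; the step between chunk starts is therefore
    chunk_len - 2 * stride.
    """
    if chunk_len <= 2 * stride:
        raise ValueError("chunk_len must be larger than 2 * stride")
    if total <= 0:
        return []
    step = chunk_len - 2 * stride
    spans = []
    start = 0
    while True:
        end = min(start + chunk_len, total)
        spans.append((start, end))
        if end == total:
            break
        start += step
    return spans


if __name__ == "__main__":
    print(chunk_spans(10, 4, 1))  # [(0, 4), (2, 6), (4, 8), (6, 10)]
```

Each span would be transcribed on its own, and the overlapping edges discarded when the per-chunk outputs are stitched back together.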
RESEARCH · Hugging Face Blog English(EN) · 54mo

Perceiver IO: a scalable, fully-attentional model that works on any modality

Perceiver IO is a new AI model architecture developed by DeepMind that utilizes a fully attentional mechanism to process information from various modalities. Unlike previous models that required modality-specific input processing, Perceiver IO can handle diverse data types like images, audio, and text directly. This approach aims to create a more scalable and unified framework for multimodal AI research and applications. AI
SIGNIFICANT · OpenAI News English(EN) · 54mo · [3 sources]

Using custom GPTs

OpenAI has introduced GPTs, a new feature allowing users to create custom versions of ChatGPT tailored for specific tasks or workflows. These custom GPTs can be built without coding by providing instructions, additional knowledge, and defining capabilities like web search or image generation. The company plans to launch a GPT Store later this month, where creators can share their GPTs and potentially monetize them, while also implementing safety measures to review submissions. AI
- OpenAI
- ChatGPT
- GPTs
- GPT Store
- Canva
- Zapier AI Actions
- GPT-3
RESEARCH · Hugging Face Blog English(EN) · 54mo

Training CodeParrot 🦜 from Scratch

Hugging Face has released CodeParrot, a new large language model specifically trained for code generation. The model was built from scratch using a novel training approach that emphasizes efficiency and performance. CodeParrot is designed to assist developers by generating code snippets, completing code, and potentially aiding in debugging tasks. AI
RESEARCH · Hugging Face Blog English(EN) · 55mo

Introducing Snowball Fight ☃️, our first ML-Agents environment

Hugging Face has released Snowball Fight, a new machine learning environment designed for training agents. This environment is built using the ML-Agents toolkit and aims to provide a platform for developing and testing AI agents in a simulated setting. The release is intended to foster innovation in reinforcement learning and agent-based AI development within the community. AI
RESEARCH · METR (Model Evaluation & Threat Research) English(EN) · 55mo · [5 sources]

2023 Year In Review

METR, an AI safety research organization, detailed its 2023 accomplishments, including developing methodologies for evaluating AI agents on autonomous tasks and contributing to OpenAI's GPT-4 system card. The organization also proposed "Responsible Scaling Policies" (RSPs), a framework for AI safety that gained traction among researchers and companies like Anthropic and OpenAI. Additionally, METR partnered with the UK AI Safety Institute and evaluated GPT-5.1 for catastrophic risks. AI
RESEARCH · Practical AI English(EN) · 55mo

Zero-shot multitask learning

The BigScience research workshop, a year-long initiative by Hugging Face, has released the T0 family of AI models. These models are specifically designed to explore zero-shot multitask learning in natural language processing. The T0 models demonstrate the potential for AI to generalize across various tasks without explicit training for each one. AI
RESEARCH · Hugging Face Blog English(EN) · 55mo

Accelerating PyTorch distributed fine-tuning with Intel technologies

Hugging Face has partnered with Intel to optimize PyTorch distributed fine-tuning using Intel's latest technologies. This collaboration focuses on enhancing performance and efficiency for large language model training. The integration aims to leverage Intel's hardware advancements to accelerate the fine-tuning process, making it more accessible and faster for researchers and developers. AI
RESEARCH · OpenAI News English(EN) · 56mo

Solving math word problems

OpenAI has developed a system that solves grade school math word problems with nearly double the accuracy of previous GPT-3 models. It reaches approximately 90% of the performance of real children in the 9-12 age range. Rather than relying on a single generated answer, the approach trains verifiers to evaluate multiple candidate solutions and selects the highest-ranked one, which offers a significant performance boost and appears to scale more effectively with data than simply increasing model size. AI
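
The verifier step amounts to a best-of-n selection: generate several candidate solutions, score each one, and keep the top-scoring candidate. The sketch below assumes the verifier is any callable returning a correctness score; `best_of_n` and the toy score table are hypothetical stand-ins, not OpenAI's models or code.

```python
def best_of_n(candidates, verifier):
    """Return the candidate solution that the verifier scores highest."""
    if not candidates:
        raise ValueError("need at least one candidate solution")
    return max(candidates, key=verifier)


if __name__ == "__main__":
    # Hypothetical verifier scores for three sampled solutions.
    toy_scores = {"answer: 12": 0.15, "answer: 18": 0.92, "answer: 21": 0.40}
    print(best_of_n(list(toy_scores), toy_scores.get))  # prints "answer: 18"
```

Because the generator only needs to produce one good candidate among n, the quality of the verifier largely determines the final accuracy.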
RESEARCH · Hugging Face Blog English(EN) · 56mo

The Age of Machine Learning As Code Has Arrived

Hugging Face has announced a new initiative, "Machine Learning as Code," aiming to standardize how machine learning models are developed, shared, and deployed. This approach treats ML models like software code, emphasizing version control, reproducibility, and collaboration. The goal is to streamline the ML lifecycle, making it more accessible and efficient for developers and researchers. AI
RESEARCH · Hugging Face Blog English(EN) · 56mo

Fine tuning CLIP with Remote Sensing (Satellite) images and captions

Hugging Face has released a guide on fine-tuning the CLIP model using remote sensing images and their corresponding captions. This process involves adapting the pre-trained CLIP model to better understand and associate visual information from satellite imagery with textual descriptions. The guide details the steps and considerations for this specialized application of CLIP, enabling more accurate analysis and retrieval of geospatial data. AI
RESEARCH · Hugging Face Blog English(EN) · 57mo

Summer at Hugging Face

Hugging Face is hosting a series of events and releasing new features throughout the summer. These initiatives aim to foster community engagement and advance the open-source AI ecosystem. Key highlights include new model releases, educational content, and opportunities for developers to collaborate and showcase their work. AI
TOOL · Hugging Face Blog English(EN) · 57mo · [4 sources]

Convert Transformers to ONNX with Hugging Face Optimum

Hugging Face has released Optimum, a new toolkit designed to optimize Transformer models for various hardware accelerators. This initiative includes partnerships with hardware vendors like Graphcore, enabling users to run models more efficiently on specialized hardware such as IPUs. The toolkit supports conversion to ONNX format, further enhancing model performance and deployment flexibility across different platforms. AI
RESEARCH · Hugging Face Blog English(EN) · 59mo

Deep Learning over the Internet: Training Language Models Collaboratively

Hugging Face has introduced a new framework enabling collaborative training of large language models over the internet. This approach allows multiple parties to contribute to training without sharing their raw data, addressing privacy and security concerns. The system leverages techniques to ensure that individual data remains private while still enabling the collective model to learn from diverse datasets. AI
RESEARCH · EleutherAI Blog English(EN) · 60mo · [3 sources]

EleutherAI Second Retrospective: The long version

EleutherAI has released a retrospective detailing their work over the past year and a half. Key achievements include the development of the open-source LLM GPT-NeoX-20B and contributions to text-to-image generation models like VQGAN-CLIP. The organization has also seen several members depart to found new AI research entities focused on alignment, preference learning, and biomedical applications. AI
RESEARCH · Hugging Face Blog English(EN) · 61mo · [2 sources]

SetFit: Efficient Few-Shot Learning Without Prompts

Hugging Face has introduced SetFit, a few-shot learning approach that achieves state-of-the-art performance without requiring prompt engineering. This method uses a two-stage process: first, it fine-tunes a pre-trained Sentence Transformer on a small number of labeled examples using contrastive text pairs, and then it trains a classification head on the embeddings produced by the fine-tuned model. SetFit has demonstrated impressive results, outperforming prompt-based methods like few-shot GPT-3 on several benchmarks, and is available as an open-source library. AI
RESEARCH · EleutherAI Blog English(EN) · 61mo

Why Release a Large Language Model?

EleutherAI has detailed its reasoning for releasing large language models, emphasizing that open access is crucial for advancing AI safety research. The organization argues that significant safety studies, particularly in model interpretability, can only be effectively conducted with access to these powerful models. They believe that the potential dangers of current large language models are not world-ending and that releasing them allows for critical safety research to be performed before models become significantly more powerful and potentially uncontrollable. Furthermore, EleutherAI contends that attempts to restrict access to this technology are futile, as well-funded actors can replicate it, making open release the best strategy to empower society to study and utilize it for beneficial purposes. AI
RESEARCH · EleutherAI Blog English(EN) · 61mo

On the Sizes of OpenAI API Models

EleutherAI has estimated the parameter counts of OpenAI's API models by comparing their performance on various tasks to known benchmarks. Their analysis suggests that models like Ada, Babbage, Curie, and Davinci correspond to approximately 350 million, 1.3 billion, 6.7 billion, and 175 billion parameters, respectively. While not official figures, these estimates provide a strong indication of the scale of OpenAI's deployed models. AI
RESEARCH · EleutherAI Blog English(EN) · 61mo

Evaluating Different Fewshot Description Prompts on GPT-3

Researchers at EleutherAI investigated how different few-shot description prompts affect GPT-3's performance on the SST benchmark. Their experiments revealed that smaller GPT-2 models performed poorly and inconsistently, with performance not strictly increasing with model size. Surprisingly, the study found no correlation between different GPT models regarding which prompts yielded the best results, challenging the expectation that similar models would favor similar prompting strategies. AI
RESEARCH · EleutherAI Blog English(EN) · 61mo

Finetuning Models on Downstream Tasks

Researchers at EleutherAI explored the impact of fine-tuning the GPT-Neo 2.7B model on a diverse set of downstream tasks. They observed that while the fine-tuned model did not universally outperform the base model, it showed significant improvements on certain tasks like ANLI. However, this specialization came at the cost of performance degradation on tasks not included in the fine-tuning set, such as LAMBADA and PubMedQA, indicating a potential for catastrophic forgetting. AI
TOOL · Practical AI English(EN) · 62mo

Generating "hunches" using smart home data 🏠

Amazon has developed a new feature for Alexa called "hunches" that uses complex smart home data to anticipate user needs. This system synthesizes disparate data from various devices and configurations, even amidst anomalies like those seen during a pandemic. The goal is to create a more intuitive and proactive smart home experience for users. AI
RESEARCH · Practical AI English(EN) · 63mo

Next-gen voice assistants

PolyAI CEO Nikola Mrkšić discussed advancements in conversational AI and the development of next-generation voice assistants capable of human-level conversations. The company's ConveRT model has demonstrated superior performance compared to BERT and GPT-based models in evaluations, particularly in understanding various languages and accents. PolyAI's technology aims to enhance customer service interactions through more sophisticated voice assistant capabilities. AI
RESEARCH · Hugging Face Blog English(EN) · 63mo

Understanding BigBird's Block Sparse Attention

BigBird is a novel attention mechanism designed to address the quadratic complexity of standard Transformer models. It achieves this by employing a sparse attention pattern, which includes global, window, and random attention, allowing it to process significantly longer sequences than traditional Transformers. This innovation makes BigBird particularly effective for tasks requiring long-range dependencies, such as document summarization and question answering on extensive texts. AI
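
The combination of global, window, and random attention can be pictured as a boolean mask over query/key positions. The sketch below builds such a mask at the token level; the function name and default sizes are assumptions for illustration, and the actual BigBird implementation works on blocks of tokens rather than single positions.

```python
import random


def sparse_attention_mask(seq_len, window=1, num_global=1, num_random=1, seed=0):
    """Build a seq_len x seq_len mask; mask[i][j] is True if i may attend to j."""
    rng = random.Random(seed)
    mask = [[False] * seq_len for _ in range(seq_len)]
    for i in range(seq_len):
        # Window attention: each position sees its local neighbourhood.
        for j in range(max(0, i - window), min(seq_len, i + window + 1)):
            mask[i][j] = True
        # Global attention: the first `num_global` positions see everything
        # and are seen by every position.
        for g in range(min(num_global, seq_len)):
            mask[i][g] = True
            mask[g][i] = True
        # Random attention: a few extra keys sampled for each query.
        for j in rng.sample(range(seq_len), min(num_random, seq_len)):
            mask[i][j] = True
    return mask


if __name__ == "__main__":
    m = sparse_attention_mask(16, window=1, num_global=2, num_random=2)
    allowed = sum(sum(row) for row in m)
    print(f"{allowed} of {16 * 16} query/key pairs are attended")
```

The number of attended pairs grows roughly linearly with sequence length for fixed window, global, and random sizes, which is what avoids the quadratic cost of full attention.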
TOOL · OpenAI News English(EN) · 63mo

GPT-3 powers the next generation of apps

OpenAI has announced that over 300 applications are now leveraging its GPT-3 API to provide advanced AI features. These applications span various sectors, including productivity, education, and gaming, demonstrating GPT-3's versatility in tasks like search, conversation, and text completion. Companies such as Viable, Fable Studio, and Algolia are highlighted for their innovative uses of GPT-3, with Algolia reporting significant improvements in search accuracy compared to previous models. AI
COMMENTARY · Lil'Log (Lilian Weng) English(EN) · 63mo · [2 sources]

Reducing Toxicity in Language Models

OpenAI has shared insights gained from deploying its language models, highlighting that real-world misuse often differs from initial fears. The company emphasized the limitations of current evaluation methods and the need for novel benchmarks to address safety concerns. OpenAI also noted that basic safety research significantly enhances the commercial utility of AI systems. AI
- Codex
- OpenAI
- GPT-3
- InstructGPT
- Lilian Weng
RESEARCH · Hugging Face Blog English(EN) · 63mo · [4 sources]

Fine-Tune W2V2-Bert for low-resource ASR with 🤗 Transformers

Hugging Face has released a series of blog posts detailing how to fine-tune various Wav2Vec2 and Whisper models for Automatic Speech Recognition (ASR) tasks using their Transformers library. These guides cover adapting models for low-resource scenarios, multilingual applications, and specific languages like English. The tutorials emphasize practical implementation for researchers and developers working with speech data. AI
TOOL · Replit blog English(EN) · 64mo · [7 sources]

Replit Case Study - Catalyst Coding Club

Replit has launched Agent v2, an enhanced AI coding assistant that offers greater autonomy and a real-time application design preview. This new version is designed to be less prone to errors and more efficient in generating user interfaces. The update is available to paid Replit users through an early access program, with further features planned for release in the coming weeks. Replit also introduced Replit Projects, a beta feature for teams to collaborate on codebases with version control and merging capabilities, aiming to streamline the development process. AI

IMPACT Enhances developer productivity and collaboration through AI-powered coding assistance and project management tools.
RESEARCH · Hugging Face Blog English(EN) · 64mo

Hugging Face Reads, Feb. 2021 - Long-range Transformers

This blog post from Hugging Face discusses the advancements in long-range Transformers, a type of neural network architecture. It explores how these models are being developed to handle longer sequences of text, overcoming previous limitations. The post likely delves into the technical aspects and potential applications of these more capable Transformer models. AI
RESEARCH · OpenAI News English(EN) · 64mo

Multimodal neurons in artificial neural networks

OpenAI researchers have identified "multimodal neurons" within their CLIP model, which respond to concepts regardless of whether they are presented visually, symbolically, or textually. This discovery offers insight into how CLIP achieves high accuracy on challenging datasets by abstracting concepts, similar to how neurons in the human brain function. The findings suggest a common mechanism for abstraction in both artificial and natural vision systems, potentially explaining model versatility and compactness. AI
RESEARCH · Practical AI English(EN) · 65mo · [8 sources]

Quick, beautiful web UIs for ML apps

The Machine Learning Compilation (MLC) group, led by Tianqi Chen at CMU, is developing frameworks like MLC Chat and Web LLM to enable running large language models on consumer hardware, including iPhones and web browsers. This initiative aims to mitigate the current GPU shortage by allowing models to run locally on devices with AMD cards or even just CPUs. Projects like Hugging Face's text-to-webapp generator and Gradio are also contributing to easier deployment and accessibility of ML models for developers and end-users. AI
- MLCommons
- MLPerf
- MLC
- Tianqi Chen
- CMU
- MLC Chat
- Web LLM
- LLaMA-70B
- AMD
- NVIDIA
- XGBoost
- Apache TVM
- OctoML
- Hugging Face
- Gradio
RESEARCH · Hugging Face Blog English(EN) · 65mo

Fit More and Train Faster With ZeRO via DeepSpeed and FairScale

Hugging Face has integrated ZeRO (Zero Redundancy Optimizer) into its libraries, leveraging DeepSpeed and FairScale. This enhancement allows for more efficient training of large language models by reducing memory redundancy across distributed training setups. The optimization enables fitting larger models into memory and accelerating the training process. AI
RESEARCH · Eugene Yan English(EN) · 66mo · [9 sources]

Improving Recommendation Systems & Search in the Age of LLMs

A new paper explores the critical role of user state representation in contextual multi-armed bandit (CMAB) recommender systems, finding that variations in state representation can yield greater performance improvements than changes to the bandit algorithm itself. The research highlights that no single embedding or aggregation strategy is universally superior, emphasizing the need for domain-specific evaluations. Another study introduces BEAR, a novel fine-tuning objective for Large Language Models (LLMs) in recommendation tasks that explicitly accounts for beam search behavior during training to address inconsistencies between training and inference. Additionally, a paper proposes a methodology to measure the stability and plasticity of recommender systems, evaluating how models adapt to retraining and changes in data patterns. AI

IMPACT Advances in user state representation and LLM fine-tuning for recommendations could lead to more personalized and effective user experiences.
- arXiv
- BEAR
- LLMs
- GoodReads
- Netflix
- YouTube
- BERT
- Transformer
- Word2vec
- Hugging Face
- DagsHub
RESEARCH · OpenAI News English(EN) · 66mo

CLIP: Connecting text and images

OpenAI has introduced CLIP, a neural network designed to learn visual concepts from natural language supervision. This model can perform a wide range of image classification tasks without specific training for each benchmark, leveraging the vast amount of text paired with images available online. CLIP aims to overcome limitations of traditional computer vision models, such as the cost of creating datasets and the narrow focus of task-specific training, by achieving robust performance across various benchmarks with zero-shot capabilities. AI
RESEARCH · Hugging Face Blog English(EN) · 66mo · [4 sources]

Open Preference Dataset for Text-to-Image Generation by the 🤗 Community

OpenAI has detailed a new method for generating images from text using CLIP latents, employing a two-stage process with a prior and a decoder. This approach enhances image diversity while maintaining photorealism and caption similarity, and allows for language-guided image manipulations. Separately, OpenAI also introduced DALL-E, a 12-billion parameter GPT-3 variant capable of creating images from text descriptions, demonstrating abilities like combining concepts and rendering text. AI

IMPACT Introduces new techniques for text-to-image generation, potentially improving diversity and controllability.
- DALL-E
- GPT-3
- DALL-E 2
- Hugging Face
- OpenAI