Brief

[50/9095] 224 sources

Multi-source AI news clustered, deduplicated, and scored 0–100 across authority, cluster strength, headline signal, and time decay.

RESEARCH · Hugging Face Blog English(EN) · 48mo · [436 sources]

The Annotated Diffusion Model

Apple's research paper explores the mechanisms behind compositional generalization in conditional diffusion models, particularly focusing on how these models handle generating images with more objects than trained on. The study identifies 'local conditional scores' as a key factor enabling this ability, demonstrating that models succeeding at length generalization exhibit these scores, while those that fail do not. The research also proposes a method to enforce these local scores, which successfully enabled length generalization in a previously underperforming model. AI

IMPACT Research into diffusion model generalization could lead to more robust and controllable image generation systems.
RESEARCH · Hugging Face Blog English(EN) · 48mo

Deep Q-Learning with Space Invaders

Hugging Face has released a new blog post detailing how to implement Deep Q-Learning (DQN) for the classic game Space Invaders. The post provides a practical guide, including code examples, to help developers understand and apply reinforcement learning techniques. This resource aims to make advanced RL concepts more accessible to a wider audience. AI
RESEARCH · OpenAI News English(EN) · 49mo

Teaching models to express their uncertainty in words

OpenAI has demonstrated a GPT-3 model capable of expressing its uncertainty in natural language, without relying on internal model probabilities. The model can generate both an answer and a confidence level, such as "90% confidence," which are well-calibrated and maintain moderate calibration even when faced with shifts in data distribution. This research marks the first instance of a model verbally communicating calibrated uncertainty about its own responses and introduces a new testing suite called CalibratedMath. AI
RESEARCH · Hugging Face Blog English(EN) · 49mo

Efficient Table Pre-training without Real Data: An Introduction to TAPEX

Researchers have introduced TAPEX, a pre-training method for improving table understanding in language models. Rather than relying on real tabular data, the approach pre-trains the model on a synthetic corpus of SQL queries and their execution results, so that the model learns to behave like a neural SQL executor over tables. TAPEX demonstrates improved performance on various table-related downstream tasks, offering a more efficient way to train models on structured information without requiring extensive real-world datasets. AI
RESEARCH · Hugging Face Blog English(EN) · 49mo

Putting ethical principles at the core of the research lifecycle

Hugging Face has introduced a new ethical charter designed to guide the development and deployment of AI models. This charter emphasizes responsible AI practices throughout the entire research lifecycle, from initial conception to final release. The initiative aims to foster a more ethical and trustworthy AI ecosystem by providing clear principles for researchers and developers. AI
RESEARCH · Practical AI English(EN) · 49mo

Active learning & endangered languages

Sarah Moeller from the University of Florida discussed how AI, specifically active learning methods, can assist in documenting and revitalizing endangered languages, even with limited data. She shared personal experiences working with low-resource languages, highlighting the potential for AI to support linguistic diversity. The conversation explored the necessity of data for AI and practical applications in the field. AI
RESEARCH · Hugging Face Blog English(EN) · 50mo

Opinion Classification with Kili and HuggingFace AutoTrain

This blog post details how to perform opinion classification using Hugging Face's AutoTrain and Kili's data labeling platform. It outlines a workflow that begins with data annotation in Kili and then leverages AutoTrain to efficiently build and train a custom model for this task. The process aims to streamline the development of specialized NLP models for sentiment analysis and related applications. AI
RESEARCH · Eugene Yan English(EN) · 50mo

How to Measure and Mitigate Position Bias

Position bias, where higher-ranked items receive more engagement regardless of relevance, poses a challenge for recommender systems. This bias can stem from user trust in algorithms, presentation effects, or a tendency to stop searching after finding a satisfactory result. To address this, methods like randomizing result positions or exploiting inherent randomness in logged data can be employed to measure and mitigate the impact of position bias, ensuring that truly relevant items are not overlooked. AI
RESEARCH · OpenAI News English(EN) · 50mo

Measuring Goodhart’s law

OpenAI has published research on measuring Goodhart's law, the phenomenon in which a measure that becomes a target ceases to be a good measure. The post explores mathematical approaches to optimizing AI models for complex human preferences, which are difficult to measure directly. OpenAI uses proxy objectives, such as a reward model, and studies techniques such as best-of-n sampling to examine how far optimizing the proxy stays aligned with the true underlying objective. AI
RESEARCH · Eugene Yan English(EN) · 50mo

Counterfactual Evaluation for Recommendation Systems

Eugene Yan's article discusses the limitations of traditional offline evaluation for recommendation systems, arguing that it treats an interventional problem as an observational one. Current methods evaluate how well recommendations fit historical data rather than predicting how users would behave when shown new recommendations. The author proposes counterfactual evaluation, particularly Inverse Propensity Scoring (IPS), as a way to estimate the impact of new recommendations without live A/B testing. AI
- Eugene Yan
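
The IPS idea in the summary above can be illustrated with a minimal sketch. It assumes each logged record stores the probability (propensity) with which the logging policy showed the item; the record layout, the `ips_estimate` name, the optional weight clipping, and the toy data are all hypothetical and not taken from the article.

```python
from typing import Callable, Optional, Sequence, Tuple

# Hypothetical log layout: (context, shown_item, logging_propensity, reward)
LogRecord = Tuple[str, str, float, float]


def ips_estimate(
    logs: Sequence[LogRecord],
    target_prob: Callable[[str, str], float],
    clip: Optional[float] = None,
) -> float:
    """Estimate the average reward of a new policy from logged data.

    Each logged reward is reweighted by target_prob / logging_propensity.
    Clipping the weight is an optional variance-reduction choice (assumption).
    """
    if not logs:
        raise ValueError("logs must not be empty")
    total = 0.0
    for context, item, propensity, reward in logs:
        weight = target_prob(context, item) / propensity
        if clip is not None:
            weight = min(weight, clip)
        total += weight * reward
    return total / len(logs)


# Toy data: the logging policy showed "a" or "b" uniformly at random.
example_logs = [
    ("u1", "a", 0.5, 1.0),
    ("u2", "b", 0.5, 0.0),
    ("u3", "a", 0.5, 0.0),
    ("u4", "b", 0.5, 1.0),
]


def always_a(context: str, item: str) -> float:
    """Hypothetical new policy that always recommends item "a"."""
    return 1.0 if item == "a" else 0.0


print(ips_estimate(example_logs, always_a))
```

Records where the new policy would never have shown the logged item get weight zero, so only the interactions the new policy agrees with contribute to the estimate.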
RESEARCH · Practical AI English(EN) · 51mo · [2 sources]

"Foundation" models

EleutherAI, in collaboration with researchers from several prominent institutions, has released "The Foundation Model Development Cheatsheet." This guide aims to simplify the process of creating open AI models by providing a comprehensive overview of tools and resources covering the entire development lifecycle, from data collection to release practices. The initiative emphasizes the importance of full-pipeline transparency and responsible development, building on EleutherAI's previous work like the Pythia model suite. AI
RESEARCH · Hugging Face Blog English(EN) · 51mo

Fine-Tune a Semantic Segmentation Model with a Custom Dataset

Hugging Face has published a guide detailing how to fine-tune a semantic segmentation model using a custom dataset. The tutorial focuses on the SegFormer model, demonstrating the process of adapting it for specific segmentation tasks. This guide is intended to help users leverage pre-trained models and tailor them to their unique data requirements. AI
RESEARCH · Hugging Face Blog English(EN) · 51mo · [2 sources]

Generating Human-level Text with Contrastive Search in Transformers 🤗

Hugging Face has introduced two new text generation techniques for its Transformers library: contrastive search and constrained beam search. Contrastive search aims to produce more human-like text by balancing likelihood and distinctiveness, while constrained beam search allows users to guide the generation process with specific rules or patterns. These methods offer developers more control and improved quality for text generation tasks within the Hugging Face ecosystem. AI
RESEARCH · OpenAI News English(EN) · 52mo

A research agenda for assessing the economic impacts of code generation models

OpenAI has released a research agenda focused on understanding the economic consequences of AI models that generate code. The initiative aims to explore impacts on productivity, employment, skill development, competition, consumer prices, and inequality. OpenAI is inviting external researchers to collaborate on this project, utilizing their Codex model as a tool for study and methodology development. AI
RESEARCH · Practical AI English(EN) · 52mo

One algorithm to rule them all?

Researchers have developed an AI system capable of quickly predicting protein attachments, a significant advancement in biological research. Additionally, a new self-supervised algorithm from Meta AI demonstrates high performance across speech, vision, and text modalities. DeepMind has also announced an AI coding engine that matches the proficiency of an average human programmer. AI
RESEARCH · Hugging Face Blog English(EN) · 52mo

Fine-Tune ViT for Image Classification with 🤗 Transformers

Hugging Face has released a guide on fine-tuning the Vision Transformer (ViT) model for image classification tasks. The tutorial utilizes the 🤗 Transformers library, demonstrating how to adapt a pre-trained ViT model to a specific dataset. This process allows developers to leverage powerful pre-trained models for custom image recognition applications without training from scratch. AI
RESEARCH · Practical AI English(EN) · 52mo · [2 sources]

🌍 AI in Africa - Voice & language tools

Google Research has released WAXAL, a large-scale, open-access dataset designed to advance speech technology for 27 African languages. The resource includes approximately 1,846 hours of transcribed spontaneous speech for automatic speech recognition and over 565 hours of high-fidelity recordings for text-to-speech synthesis. This initiative aims to bridge the digital divide by empowering the African AI ecosystem to develop inclusive voice-enabled technologies that reflect the continent's linguistic diversity. AI
RESEARCH · EleutherAI Blog English(EN) · 53mo

Announcing GPT-NeoX-20B

EleutherAI has released GPT-NeoX-20B, a 20 billion parameter open-source language model trained using their GPT-NeoX framework. This model is notable for being the largest publicly accessible pretrained autoregressive language model to date. The release aims to facilitate research into the safe use of AI systems, with the model available via inference services and a public release scheduled after a seven-day delay. AI
RESEARCH · OpenAI News English(EN) · 53mo

Solving (some) formal math olympiad problems

OpenAI has developed a neural theorem prover for the Lean formal proof assistant that can solve challenging high-school olympiad math problems. The system utilizes a language model to discover proofs, iteratively improving its performance by using newly found proofs as training data. This approach achieved a new state-of-the-art on the miniF2F benchmark, outperforming previous methods. AI
RESEARCH · Practical AI English(EN) · 53mo

Democratizing ML for speech

MLCommons has released two new speech datasets aimed at making machine learning more accessible. These datasets focus on increasing data scale and diversity in terms of languages and speakers. The initiative seeks to democratize the field of speech recognition technology. AI
RESEARCH · Hugging Face Blog English(EN) · 53mo · [2 sources]

Making automatic speech recognition work on large files with Wav2Vec2 in 🤗 Transformers

Hugging Face has released updates to its Transformers library, enhancing the Wav2Vec2 model for automatic speech recognition (ASR). The library now supports processing large audio files through chunking, which breaks a long file into smaller, manageable segments. Additionally, recognition accuracy can be boosted by combining the model with an n-gram language model at decoding time. AI
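
A rough sketch of how overlapping chunking can work, under the assumption that each chunk overlaps its neighbours by a fixed stride and that predictions in the overlap are discarded. The `chunk_spans` helper is hypothetical and is not the Transformers implementation.

```python
from typing import List, Tuple


def chunk_spans(n_samples: int, chunk_len: int, stride: int) -> List[Tuple[int, int, int, int]]:
    """Return (start, end, keep_start, keep_end) for each chunk.

    [start, end) is the audio fed to the model; [keep_start, keep_end) is the
    part whose predictions are kept. The stride on each inner edge is dropped,
    so every sample is kept by exactly one chunk.
    """
    if chunk_len <= 2 * stride:
        raise ValueError("chunk_len must be larger than twice the stride")
    step = chunk_len - 2 * stride
    spans = []
    start = 0
    while True:
        end = min(start + chunk_len, n_samples)
        is_first = start == 0
        is_last = end == n_samples
        keep_start = start if is_first else start + stride
        keep_end = end if is_last else end - stride
        spans.append((start, end, keep_start, keep_end))
        if is_last:
            break
        start += step
    return spans


for span in chunk_spans(n_samples=100, chunk_len=40, stride=5):
    print(span)
```

The overlap gives the model some context on both sides of every kept region, which is the motivation for not simply cutting the audio into disjoint pieces.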
RESEARCH · Hugging Face Blog English(EN) · 54mo

Perceiver IO: a scalable, fully-attentional model that works on any modality

Perceiver IO is a new AI model architecture developed by DeepMind that utilizes a fully attentional mechanism to process information from various modalities. Unlike previous models that required modality-specific input processing, Perceiver IO can handle diverse data types like images, audio, and text directly. This approach aims to create a more scalable and unified framework for multimodal AI research and applications. AI
RESEARCH · Hugging Face Blog English(EN) · 54mo

Training CodeParrot 🦜 from Scratch

Hugging Face has released CodeParrot, a new large language model specifically trained for code generation. The model was built from scratch using a novel training approach that emphasizes efficiency and performance. CodeParrot is designed to assist developers by generating code snippets, completing code, and potentially aiding in debugging tasks. AI
RESEARCH · METR (Model Evaluation & Threat Research) English(EN) · 55mo · [5 sources]

2023 Year In Review

METR, an AI safety research organization, detailed its 2023 accomplishments, including developing methodologies for evaluating AI agents on autonomous tasks and contributing to OpenAI's GPT-4 system card. The organization also proposed "Responsible Scaling Policies" (RSPs), a framework for AI safety that gained traction among researchers and companies like Anthropic and OpenAI. Additionally, METR partnered with the UK AI Safety Institute and evaluated GPT-5.1 for catastrophic risks. AI
RESEARCH · Practical AI English(EN) · 55mo

Zero-shot multitask learning

The BigScience research workshop, a year-long initiative by Hugging Face, has released the T0 family of AI models. These models are specifically designed to explore zero-shot multitask learning in natural language processing. The T0 models demonstrate the potential for AI to generalize across various tasks without explicit training for each one. AI
RESEARCH · OpenAI News English(EN) · 56mo

Solving math word problems

OpenAI has developed a new system capable of solving grade school math word problems with nearly double the accuracy of previous GPT-3 models. This system achieves approximately 90% of the performance of real children in the 9-12 age range by training the model to recognize and correct its own errors through repeated attempts. The approach involves using verifiers to evaluate multiple candidate solutions, selecting the best one, which offers a significant performance boost and appears to scale more effectively with data than simply increasing model size. AI
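
The verifier-based selection described above amounts to sampling several candidate solutions and keeping the one the verifier scores highest. The sketch below shows only that selection step; `generate` and `verify` are hypothetical stand-ins for a sampling model and a trained verifier.

```python
from typing import Callable, List


def best_of_n(
    problem: str,
    generate: Callable[[str], str],
    verify: Callable[[str, str], float],
    n: int,
) -> str:
    """Sample n candidate solutions and return the one with the highest verifier score."""
    if n < 1:
        raise ValueError("n must be at least 1")
    candidates: List[str] = [generate(problem) for _ in range(n)]
    return max(candidates, key=lambda candidate: verify(problem, candidate))


# Toy stand-ins: a "generator" that replays fixed answers and a "verifier"
# that happens to prefer the answer "12".
_answers = iter(["7", "12", "9"])


def toy_generate(problem: str) -> str:
    return next(_answers)


def toy_verify(problem: str, candidate: str) -> float:
    return 1.0 if candidate == "12" else 0.1


print(best_of_n("What is 3 * 4?", toy_generate, toy_verify, n=3))
```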
RESEARCH · EleutherAI Blog English(EN) · 56mo

A Preliminary Exploration into Factored Cognition with Language Models

Researchers at EleutherAI have explored a concept called "factored cognition" using GPT-3 to tackle complex arithmetic tasks it would otherwise fail at. By decomposing problems into smaller, sequential steps, similar to how humans use tools for calculations, they observed significant improvements in the model's performance. This approach aims to provide preliminary evidence for the effectiveness of breaking down complex tasks for large language models. AI
RESEARCH · Hugging Face Blog English(EN) · 56mo

The Age of Machine Learning As Code Has Arrived

Hugging Face has announced a new initiative, "Machine Learning as Code," aiming to standardize how machine learning models are developed, shared, and deployed. This approach treats ML models like software code, emphasizing version control, reproducibility, and collaboration. The goal is to streamline the ML lifecycle, making it more accessible and efficient for developers and researchers. AI
RESEARCH · Hugging Face Blog English(EN) · 56mo

Fine tuning CLIP with Remote Sensing (Satellite) images and captions

Hugging Face has released a guide on fine-tuning the CLIP model using remote sensing images and their corresponding captions. This process involves adapting the pre-trained CLIP model to better understand and associate visual information from satellite imagery with textual descriptions. The guide details the steps and considerations for this specialized application of CLIP, enabling more accurate analysis and retrieval of geospatial data. AI
RESEARCH · Practical AI English(EN) · 56mo · [19 sources]

Friendly federated learning 🌼

Researchers have developed several new methods to improve federated learning, a distributed machine learning approach that trains models on decentralized data without sharing raw information. FedHarmony addresses challenges in modeling label correlations across heterogeneous client data by introducing a consensus mechanism. "Who Trains Matters" tackles selection biases in federated learning by proposing an inverse-probability-weighted aggregation scheme to ensure training representativeness. Additionally, new techniques like Subspace Optimization (SSF), FedSLoP, and GradsSharding aim to enhance efficiency by reducing communication and memory overhead, particularly for large models on serverless platforms. AI

IMPACT New federated learning algorithms promise improved efficiency and accuracy, especially for large models and heterogeneous data.
RESEARCH · EleutherAI Blog English(EN) · 56mo

Multiple Choice Normalization in LM Evaluation

EleutherAI's blog post introduces and analyzes four distinct methods for evaluating language model performance on multiple-choice tasks. These methods, including unnormalized, token-length normalized, byte-length normalized, and unconditional likelihood normalized scores, address the challenge of comparing continuations of varying lengths. The post highlights the trade-offs of each approach, particularly concerning tokenization dependence and computational requirements, with byte-length normalization emerging as a tokenization-agnostic solution. AI
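
To make the four scoring rules concrete, here is a small sketch that assumes per-token log-probabilities for each answer choice are already available, both conditioned on the question and without it. The `Choice` container, the function names, and the numbers are illustrative assumptions only.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Choice:
    text: str
    token_logprobs: List[float]         # log P(token | question, earlier tokens)
    uncond_token_logprobs: List[float]  # same tokens scored without the question


def scores(choice: Choice) -> Dict[str, float]:
    """Compute the four scores named in the summary for one answer choice."""
    log_likelihood = sum(choice.token_logprobs)
    return {
        "unnormalized": log_likelihood,
        "token_length": log_likelihood / len(choice.token_logprobs),
        "byte_length": log_likelihood / len(choice.text.encode("utf-8")),
        "unconditional": log_likelihood - sum(choice.uncond_token_logprobs),
    }


def pick(choices: List[Choice], method: str) -> int:
    """Return the index of the highest-scoring choice under the given method."""
    return max(range(len(choices)), key=lambda i: scores(choices[i])[method])


# Made-up numbers: a short answer and a longer one.
choices = [
    Choice("yes", [-2.0], [-1.0]),
    Choice("definitely not", [-1.0, -1.5], [-0.5, -0.5]),
]

for method in ("unnormalized", "token_length", "byte_length", "unconditional"):
    print(method, pick(choices, method))
```

With these numbers the unnormalized score favours the shorter answer while both length-normalized scores favour the longer one, which is the kind of disagreement the normalization choice is meant to address. Only the token-length score depends on how the tokenizer splits the text.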
RESEARCH · Lil'Log (Lilian Weng) English(EN) · 57mo · [2 sources]

How to Train Really Large Models on Many GPUs?

Training extremely large neural network models presents significant challenges due to their immense memory requirements and lengthy training times, often exceeding the capacity of individual GPUs. To address this, various parallelism techniques are employed, including data parallelism where models are replicated across multiple workers, and model parallelism where the model itself is partitioned across machines. Advanced methods like gradient accumulation and techniques to offload parameters to CPU memory are also utilized to optimize training efficiency and manage resource constraints. AI
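
Gradient accumulation, mentioned above, can be illustrated on a one-parameter toy model. The sketch assumes a mean-squared-error loss for y = w * x and shows that size-weighted micro-batch gradients add up to the full-batch gradient; the model and data are hypothetical.

```python
from typing import Sequence, Tuple

Example = Tuple[float, float]  # (x, y)


def grad_mse(w: float, batch: Sequence[Example]) -> float:
    """Gradient of mean((w * x - y)^2) with respect to w."""
    return sum(2.0 * (w * x - y) * x for x, y in batch) / len(batch)


def accumulated_grad(w: float, batch: Sequence[Example], micro_batch_size: int) -> float:
    """Process the batch in micro-batches and accumulate their gradients.

    Each micro-batch gradient is weighted by its share of the full batch,
    so the result matches the full-batch gradient even if the last
    micro-batch is smaller.
    """
    total = 0.0
    for i in range(0, len(batch), micro_batch_size):
        micro = batch[i:i + micro_batch_size]
        total += grad_mse(w, micro) * len(micro) / len(batch)
    return total


batch = [(1.0, 2.0), (2.0, 3.9), (3.0, 6.1), (4.0, 8.2), (5.0, 9.8)]
print(grad_mse(0.5, batch), accumulated_grad(0.5, batch, micro_batch_size=2))
```

Only one micro-batch needs to be held in memory at a time, which is why the technique helps when the full batch does not fit on a single device.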
RESEARCH · OpenAI News English(EN) · 57mo

TruthfulQA: Measuring how models mimic human falsehoods

OpenAI has introduced TruthfulQA, a new benchmark designed to evaluate how well language models avoid generating false information. The benchmark consists of 817 questions across 38 categories, specifically designed to elicit false answers based on common human misconceptions. Early tests showed that even the best-performing models were truthful on only 58% of questions, significantly lower than the 94% achieved by humans, and larger models tended to be less truthful, suggesting that simply scaling up models may not improve their accuracy. AI
RESEARCH · Eugene Yan English(EN) · 59mo · [3 sources]

Bootstrapping Labels via ___ Supervision & Human-In-The-Loop

A new paper from Timothy Christensen proposes a coupled-label bootstrap method to address biases in OLS estimators that arise when using AI/ML-generated labels as covariates in economic regressions. The research highlights that standard fixed-label bootstrap methods are often invalid unless specific independence conditions are met. The proposed coupled-label bootstrap jointly resamples true and imputed labels, offering a more robust solution without these stringent conditions, and includes finite-sample adjustments for improved accuracy. This work is illustrated with simulations and applied to analyze the relationship between wages and remote work status. AI

IMPACT Provides a statistical method to improve the reliability of economic analyses that incorporate AI-generated data labels.
- Timothy Christensen
- bootstrap
- arXiv
- Econometrics
- wage
- remote work
- AI/ML
RESEARCH · Hugging Face Blog English(EN) · 59mo

Deep Learning over the Internet: Training Language Models Collaboratively

Hugging Face has introduced a new framework enabling collaborative training of large language models over the internet. This approach allows multiple parties to contribute to training without sharing their raw data, addressing privacy and security concerns. The system leverages techniques to ensure that individual data remains private while still enabling the collective model to learn from diverse datasets. AI
RESEARCH · Eugene Yan English(EN) · 59mo · [2 sources]

MLOps Community - System Design for RecSys & Search

Eugene Yan recently presented on system design for recommendation systems and search at two separate meetups: the MLOps Community and SF Big Analytics. The talks, which occurred in September 2021 and July 2021 respectively, covered key aspects of building and deploying such systems. Yan's presentations are available as recorded talks and slides, with citations provided for academic use. AI
RESEARCH · Lil'Log (Lilian Weng) English(EN) · 59mo

What are Diffusion Models?

Lilian Weng's blog post provides a comprehensive overview of diffusion models, a type of generative model inspired by non-equilibrium thermodynamics. The post details the forward diffusion process, where noise is gradually added to data until it resembles a Gaussian distribution. It also explains the reverse diffusion process, which learns to reconstruct data from noise, and discusses connections to stochastic gradient Langevin dynamics. The article has been updated multiple times to include recent advancements like classifier-free guidance and latent diffusion models. AI
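
The forward process described above has a well-known closed form, x_t = sqrt(abar_t) * x_0 + sqrt(1 - abar_t) * noise, where abar_t is the running product of (1 - beta_s). The sketch below assumes a linear beta schedule from 1e-4 to 0.02 over 1000 steps, a common choice but an assumption here; the function names are illustrative.

```python
import math
import random
from typing import List


def linear_betas(num_steps: int, beta_start: float = 1e-4, beta_end: float = 0.02) -> List[float]:
    """Linearly spaced noise variances (assumed schedule; requires num_steps >= 2)."""
    return [beta_start + (beta_end - beta_start) * t / (num_steps - 1) for t in range(num_steps)]


def alpha_bars(betas: List[float]) -> List[float]:
    """Cumulative products of (1 - beta_t)."""
    out, prod = [], 1.0
    for beta in betas:
        prod *= 1.0 - beta
        out.append(prod)
    return out


def q_sample(x0: List[float], t: int, abar: List[float], rng: random.Random) -> List[float]:
    """Draw x_t directly from x_0 using the closed form of the forward process."""
    signal = math.sqrt(abar[t])
    noise = math.sqrt(1.0 - abar[t])
    return [signal * x + noise * rng.gauss(0.0, 1.0) for x in x0]


betas = linear_betas(1000)
abar = alpha_bars(betas)
rng = random.Random(0)
print(q_sample([1.0, -1.0], 10, abar, rng))   # still close to the data
print(q_sample([1.0, -1.0], 999, abar, rng))  # almost pure Gaussian noise
```

As t grows, abar_t shrinks toward zero, so the signal term vanishes and the sample approaches a standard Gaussian.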
RESEARCH · EleutherAI Blog English(EN) · 60mo · [3 sources]

EleutherAI Second Retrospective: The long version

EleutherAI has released a retrospective detailing their work over the past year and a half. Key achievements include the development of the open-source LLM GPT-NeoX-20B and contributions to text-to-image generation models like VQGAN-CLIP. The organization has also seen several members depart to found new AI research entities focused on alignment, preference learning, and biomedical applications. AI
RESEARCH · Hugging Face Blog English(EN) · 61mo · [2 sources]

SetFit: Efficient Few-Shot Learning Without Prompts

Hugging Face has introduced SetFit, a few-shot learning approach that achieves state-of-the-art performance without requiring prompt engineering. The method uses a two-stage process: first, it fine-tunes a Sentence Transformer on a small set of labeled examples using contrastive sentence pairs, and then it trains a classification head on the embeddings produced by the fine-tuned model. SetFit has demonstrated impressive results, outperforming prompt-based methods like few-shot GPT-3 on several benchmarks, and is available as an open-source library. AI
RESEARCH · EleutherAI Blog English(EN) · 61mo

Why Release a Large Language Model?

EleutherAI has detailed its reasoning for releasing large language models, emphasizing that open access is crucial for advancing AI safety research. The organization argues that significant safety studies, particularly in model interpretability, can only be effectively conducted with access to these powerful models. They believe that the potential dangers of current large language models are not world-ending and that releasing them allows for critical safety research to be performed before models become significantly more powerful and potentially uncontrollable. Furthermore, EleutherAI contends that attempts to restrict access to this technology are futile, as well-funded actors can replicate it, making open release the best strategy to empower society to study and utilize it for beneficial purposes. AI
RESEARCH · Lil'Log (Lilian Weng) English(EN) · 61mo · [3 sources]

Contrastive Representation Learning

Contrastive learning is a machine learning technique that creates an embedding space where similar data points are grouped together and dissimilar ones are separated. This method can be applied in both supervised and unsupervised settings, offering advantages over traditional cross-entropy loss functions, particularly in safety-critical applications. Research indicates that supervised contrastive learning can lead to more trustworthy and transparent neural networks by improving feature attribution explanations. AI
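
One common way to express the "pull similar points together, push dissimilar ones apart" objective is an InfoNCE-style loss. The sketch below is an assumption-laden illustration using cosine similarity and a temperature of 0.1; it is one of several contrastive losses and is not claimed to be the specific formulation in the post.

```python
import math
from typing import List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def info_nce(
    anchor: Sequence[float],
    positive: Sequence[float],
    negatives: List[Sequence[float]],
    temperature: float = 0.1,
) -> float:
    """Negative log-probability of picking the positive among all candidates."""
    logits = [cosine(anchor, positive) / temperature]
    logits += [cosine(anchor, neg) / temperature for neg in negatives]
    top = max(logits)
    # Numerically stable log-sum-exp over all candidates.
    log_denominator = top + math.log(sum(math.exp(l - top) for l in logits))
    return log_denominator - logits[0]


anchor = [1.0, 0.0]
aligned = info_nce(anchor, [0.9, 0.1], [[0.0, 1.0], [-1.0, 0.0]])
misaligned = info_nce(anchor, [0.0, 1.0], [[0.9, 0.1], [-1.0, 0.0]])
print(aligned, misaligned)
```

The loss is small when the positive is the most similar candidate to the anchor and grows when a negative is more similar instead.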
RESEARCH · Practical AI English(EN) · 61mo

Elixir meets machine learning

José Valim, the creator of Elixir, has launched Numerical Elixir (Nx), a project aimed at integrating Elixir into the machine learning landscape. This initiative includes a collaborative notebook built on Phoenix LiveView, designed to facilitate ML development. The project draws inspiration from various influences and collaborators, with the goal of bringing Elixir's capabilities to the ML domain. AI
RESEARCH · EleutherAI Blog English(EN) · 61mo

On the Sizes of OpenAI API Models

EleutherAI has estimated the parameter counts of OpenAI's API models by comparing their performance on various tasks to known benchmarks. Their analysis suggests that models like Ada, Babbage, Curie, and Davinci correspond to approximately 350 million, 1.3 billion, 6.7 billion, and 175 billion parameters, respectively. While not official figures, these estimates provide a strong indication of the scale of OpenAI's deployed models. AI
RESEARCH · EleutherAI Blog English(EN) · 61mo

Evaluating Different Fewshot Description Prompts on GPT-3

Researchers at EleutherAI investigated how different few-shot description prompts affect GPT-3's performance on the SST benchmark. Their experiments revealed that smaller GPT-2 models performed poorly and inconsistently, with performance not strictly increasing with model size. Surprisingly, the study found no correlation between different GPT models regarding which prompts yielded the best results, challenging the expectation that similar models would favor similar prompting strategies. AI
RESEARCH · EleutherAI Blog English(EN) · 61mo

Finetuning Models on Downstream Tasks

Researchers at EleutherAI explored the impact of fine-tuning the GPT-Neo 2.7B model on a diverse set of downstream tasks. They observed that while the fine-tuned model did not universally outperform the base model, it showed significant improvements on certain tasks like ANLI. However, this specialization came at the cost of performance degradation on tasks not included in the fine-tuning set, such as LAMBADA and PubMedQA, indicating a potential for catastrophic forgetting. AI
RESEARCH · EleutherAI Blog English(EN) · 61mo

Activation Function Ablation

Researchers at EleutherAI conducted an experiment to study the impact of different activation functions on GPT-like language models with approximately 100 million parameters. The models were trained for a limited duration of 10,000 iterations. While the initial goal was to demonstrate that activation functions have minimal impact, the experiment was not extensive enough to provide statistically significant conclusions, and the results are being shared publicly for potential use by others. AI
RESEARCH · Practical AI English(EN) · 61mo

Apache TVM and OctoML

Apache TVM, an open-source machine learning compiler, was developed at the University of Washington to address the challenge of deploying AI models efficiently across various hardware and software platforms. To commercialize this technology, Luis Ceze and his team founded OctoML. Their work aims to overcome the significant hurdle of getting AI applications from development to market, as a large percentage currently fail due to the complexity and cost of optimizing models for diverse environments. AI
RESEARCH · Eugene Yan English(EN) · 62mo

Search: Query Matching via Lexical, Graph, and Embedding Methods

Eugene Yan's article explores three primary methods for matching search queries to documents: lexical, graph, and embedding-based approaches. Lexical methods involve direct query string manipulation like normalization, spell checking, and expansion/relaxation. Graph-based techniques leverage knowledge graphs for deeper query understanding and expansion. Embedding-based methods utilize learned representations to achieve similar goals. The post details preprocessing steps, query expansion strategies, and how these techniques are applied in real-world systems like DoorDash's. AI
- DoorDash
- Eugene Yan
RESEARCH · EleutherAI Blog English(EN) · 62mo · [3 sources]

Downstream Evaluations of Rotary Position Embeddings

EleutherAI has released a blog post detailing Rotary Positional Embeddings (RoPE), a novel method for encoding positional information in transformer models. RoPE unifies absolute and relative positional encoding approaches and has demonstrated performance matching or surpassing existing methods across various transformer architectures. The researchers also conducted a head-to-head evaluation comparing RoPE with GPT-style learned position embeddings on 1.3B models trained on the Pile dataset, finding no strong trend but offering the results for community use. AI
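
The way RoPE ties absolute and relative position together can be seen in a small sketch: each pair of dimensions is rotated by an angle proportional to the token position, and the dot product of two rotated vectors then depends only on the difference of their positions. The code assumes the interleaved pairing of adjacent dimensions and a base of 10000; other layouts exist, and the function names are illustrative.

```python
import math
from typing import List, Sequence


def rope(vec: Sequence[float], pos: int, base: float = 10000.0) -> List[float]:
    """Rotate each adjacent pair of dimensions by pos * base**(-i / d)."""
    d = len(vec)
    if d % 2 != 0:
        raise ValueError("vector length must be even")
    out: List[float] = []
    for i in range(0, d, 2):
        theta = pos * base ** (-i / d)
        c, s = math.cos(theta), math.sin(theta)
        x, y = vec[i], vec[i + 1]
        out.extend([x * c - y * s, x * s + y * c])
    return out


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


q = [1.0, 2.0, 3.0, 4.0]
k = [0.5, -1.0, 2.0, 0.1]

# Same relative offset (2), different absolute positions.
print(dot(rope(q, 5), rope(k, 3)))
print(dot(rope(q, 12), rope(k, 10)))
```

Both printed values agree up to floating-point error, because rotating the query by position m and the key by position n leaves an attention score that depends only on m - n.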
RESEARCH · Practical AI English(EN) · 63mo

Next-gen voice assistants

PolyAI CEO Nikola Mrkšić discussed advancements in conversational AI and the development of next-generation voice assistants capable of human-level conversations. The company's ConveRT model has demonstrated superior performance compared to BERT and GPT-based models in evaluations, particularly in understanding various languages and accents. PolyAI's technology aims to enhance customer service interactions through more sophisticated voice assistant capabilities. AI