Brief

last 24h

[50/2970] 224 sources

Multi-source AI news clustered, deduplicated, and scored 0–100 across authority, cluster strength, headline signal, and time decay.

RESEARCH · Practical AI English(EN) · 63mo

Next-gen voice assistants

PolyAI CEO Nikola Mrkšić discussed advancements in conversational AI and the development of next-generation voice assistants capable of human-level conversations. The company's ConveRT model has demonstrated superior performance compared to BERT and GPT-based models in evaluations, particularly in understanding various languages and accents. PolyAI's technology aims to enhance customer service interactions through more sophisticated voice assistant capabilities.
RESEARCH · Hugging Face Blog English(EN) · 63mo

Understanding BigBird's Block Sparse Attention

BigBird is a novel attention mechanism designed to address the quadratic complexity of standard Transformer models. It achieves this by employing a sparse attention pattern, which includes global, window, and random attention, allowing it to process significantly longer sequences than traditional Transformers. This innovation makes BigBird particularly effective for tasks requiring long-range dependencies, such as document summarization and question answering on extensive texts.
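
The three attention components named above can be sketched as a boolean mask. This is a simplified, token-level illustration only: BigBird itself works on blocks of tokens, and the function name, parameters, and default values here are assumptions made for the example.

```python
import random

def block_sparse_mask(seq_len, window=1, num_global=1, num_random=1, seed=0):
    """Return a seq_len x seq_len mask; mask[i][j] is True if query i may attend to key j."""
    rng = random.Random(seed)
    mask = [[False] * seq_len for _ in range(seq_len)]
    for i in range(seq_len):
        # Window attention: each token sees neighbours within `window` positions.
        for j in range(max(0, i - window), min(seq_len, i + window + 1)):
            mask[i][j] = True
        # Global attention: the first `num_global` tokens see, and are seen by, every token.
        for g in range(min(num_global, seq_len)):
            mask[i][g] = True
            mask[g][i] = True
        # Random attention: a few extra keys drawn from those not yet allowed for this row.
        candidates = [j for j in range(seq_len) if not mask[i][j]]
        for j in rng.sample(candidates, min(num_random, len(candidates))):
            mask[i][j] = True
    return mask
```

For a non-global row the number of allowed keys is bounded by the window size plus the global and random counts, so the total grows linearly with sequence length rather than quadratically.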
RESEARCH · Hugging Face Blog English(EN) · 63mo · [4 sources]

Fine-Tune W2V2-Bert for low-resource ASR with 🤗 Transformers

Hugging Face has released a series of blog posts detailing how to fine-tune various Wav2Vec2 and Whisper models for Automatic Speech Recognition (ASR) tasks using their Transformers library. These guides cover adapting models for low-resource scenarios, multilingual applications, and specific languages like English. The tutorials emphasize practical implementation for researchers and developers working with speech data.
RESEARCH · Hugging Face Blog English(EN) · 64mo

Hugging Face Reads, Feb. 2021 - Long-range Transformers

This Hugging Face blog post discusses advances in long-range Transformers, neural network architectures developed to handle longer text sequences than earlier Transformer models could process.
RESEARCH · OpenAI News Italian(IT) · 64mo

Multimodal neurons in artificial neural networks

OpenAI researchers have identified "multimodal neurons" within their CLIP model, which respond to concepts regardless of whether they are presented visually, symbolically, or textually. This discovery offers insight into how CLIP achieves high accuracy on challenging datasets by abstracting concepts, similar to how neurons in the human brain function. The findings suggest a common mechanism for abstraction in both artificial and natural vision systems, potentially explaining model versatility and compactness.
RESEARCH · Practical AI English(EN) · 65mo · [8 sources]

Quick, beautiful web UIs for ML apps

The Machine Learning Compilation (MLC) group, led by Tianqi Chen at CMU, is developing frameworks like MLC Chat and Web LLM to enable running large language models on consumer hardware, including iPhones and web browsers. This initiative aims to mitigate the current GPU shortage by allowing models to run locally on devices with AMD cards or even just CPUs. Projects like Hugging Face's text-to-webapp generator and Gradio are also contributing to easier deployment and accessibility of ML models for developers and end-users.
- CMU
- MLC Chat
- Web LLM
- LLaMA-70B
- AMD
- NVIDIA
- XGBoost
- Apache TVM
- OctoML
- Hugging Face
- Gradio
- MLCommons
- MLPerf
- MLC
- Tianqi Chen
RESEARCH · Hugging Face Blog English(EN) · 65mo

Fit More and Train Faster With ZeRO via DeepSpeed and FairScale

Hugging Face has integrated ZeRO (Zero Redundancy Optimizer) into its libraries, leveraging DeepSpeed and FairScale. This enhancement allows for more efficient training of large language models by reducing memory redundancy across distributed training setups. The optimization enables fitting larger models into memory and accelerating the training process.
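
The memory saving can be made concrete with a back-of-envelope calculation. The sketch below assumes mixed-precision Adam at 16 bytes per parameter (2 for fp16 weights, 2 for fp16 gradients, 12 for fp32 optimizer states) and the three partitioning stages described in the ZeRO paper. Neither detail comes from the summary above, the function name is hypothetical, and activations and temporary buffers are ignored.

```python
def zero_memory_gb(num_params, num_gpus, stage):
    """Approximate per-GPU model-state memory in GB under the assumptions above."""
    weights, grads, optim = 2.0, 2.0, 12.0  # bytes per parameter
    if stage >= 1:
        optim /= num_gpus    # stage 1: partition optimizer states
    if stage >= 2:
        grads /= num_gpus    # stage 2: also partition gradients
    if stage >= 3:
        weights /= num_gpus  # stage 3: also partition the weights
    return num_params * (weights + grads + optim) / 1e9

for stage in range(4):
    print(stage, round(zero_memory_gb(7.5e9, 64, stage), 2))
```

Stage 0 here means plain data parallelism, where every GPU holds a full replica of all three kinds of state.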
RESEARCH · Eugene Yan English(EN) · 66mo · [9 sources]

Improving Recommendation Systems & Search in the Age of LLMs

A new paper explores the critical role of user state representation in contextual multi-armed bandit (CMAB) recommender systems, finding that variations in state representation can yield greater performance improvements than changes to the bandit algorithm itself. The research highlights that no single embedding or aggregation strategy is universally superior, emphasizing the need for domain-specific evaluations. Another study introduces BEAR, a novel fine-tuning objective for Large Language Models (LLMs) in recommendation tasks that explicitly accounts for beam search behavior during training to address inconsistencies between training and inference. Additionally, a paper proposes a methodology to measure the stability and plasticity of recommender systems, evaluating how models adapt to retraining and changes in data patterns.

IMPACT Advances in user state representation and LLM fine-tuning for recommendations could lead to more personalized and effective user experiences.
- arXiv
- BEAR
- LLMs
- GoodReads
- Netflix
- YouTube
- BERT
- Transformer
- Word2vec
- Hugging Face
- DagsHub
RESEARCH · OpenAI News English(EN) · 66mo

CLIP: Connecting text and images

OpenAI has introduced CLIP, a neural network designed to learn visual concepts from natural language supervision. This model can perform a wide range of image classification tasks without specific training for each benchmark, leveraging the vast amount of text paired with images available online. CLIP aims to overcome limitations of traditional computer vision models, such as the cost of creating datasets and the narrow focus of task-specific training, by achieving robust performance across various benchmarks with zero-shot capabilities.
RESEARCH · Hugging Face Blog English(EN) · 66mo · [4 sources]

Open Preference Dataset for Text-to-Image Generation by the 🤗 Community

OpenAI has detailed a new method for generating images from text using CLIP latents, employing a two-stage process with a prior and a decoder. This approach enhances image diversity while maintaining photorealism and caption similarity, and allows for language-guided image manipulations. Separately, OpenAI also introduced DALL-E, a 12-billion parameter GPT-3 variant capable of creating images from text descriptions, demonstrating abilities like combining concepts and rendering text.

IMPACT Introduces new techniques for text-to-image generation, potentially improving diversity and controllability.
- OpenAI
- DALL-E
- GPT-3
- DALL-E 2
- Hugging Face
RESEARCH · Practical AI English(EN) · 67mo

The world's largest open library dataset

Unsplash has released a massive open dataset containing over 2 million high-quality photos, 5 million keywords, and 250 million searches. The company aims to facilitate machine learning and AI development with this extensive collection. This release has already sparked interest and led to various applications within the AI community.
RESEARCH · Hugging Face Blog English(EN) · 68mo

Leveraging Pre-trained Language Model Checkpoints for Encoder-Decoder Models

Hugging Face has released a guide on how to leverage pre-trained language model checkpoints for encoder-decoder models. This technique, known as warm-starting, can significantly improve training efficiency and performance. The blog post details methods for adapting existing checkpoints to new tasks, offering practical advice for researchers and developers.
RESEARCH · Hugging Face Blog English(EN) · 68mo

Porting fairseq wmt19 translation system to transformers

Researchers have successfully ported the fairseq WMT19 translation system to the Hugging Face Transformers library. This effort aims to make advanced translation models more accessible and easier to use within the popular Transformers ecosystem. The porting process involved adapting the model architecture and training configurations to align with the standards and practices of the Transformers library, facilitating further research and development in machine translation.
RESEARCH · Hugging Face Blog Danish(DA) · 69mo · [3 sources]

Transformer-based Encoder-Decoder Models

Google DeepMind has introduced T5Gemma, a new family of encoder-decoder large language models derived from their existing Gemma 2 models. This adaptation technique allows for flexible combinations of encoder and decoder sizes, enabling a better balance between model quality and inference efficiency. Experiments show T5Gemma models achieve performance comparable to or exceeding their decoder-only Gemma counterparts across various benchmarks, offering significant advantages in speed and accuracy for tasks like math reasoning and reading comprehension.
- Hugging Face
- Google DeepMind
- T5Gemma
- DROP
- SuperGLUE
- Gemma 2 2B
- Gemma 2 9B
- GSM8K
- Gemma 2
RESEARCH · OpenAI News English(EN) · 70mo · [2 sources]

Summarizing books with human feedback

OpenAI has developed a new method for aligning AI models with human intentions, focusing on the challenge of evaluating outputs for complex tasks like book summarization. Their approach uses recursive task decomposition, breaking down the summarization of an entire book into smaller, more manageable sections. This allows human evaluators to provide feedback more efficiently, even when the source material is extensive. The fine-tuned GPT-3 model demonstrates impressive performance, achieving quality comparable to human-written summaries and setting new benchmarks in book-length summarization and question-answering tasks.
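
Recursive task decomposition, as described above, can be sketched in a few lines. The summarizer here is a hypothetical stand-in that keeps the first few words of its input, where the real system would call a fine-tuned model; all function names and chunk sizes are assumptions for illustration.

```python
def summarize_chunk(text, max_words=8):
    """Hypothetical stand-in for a learned summarizer: keep the first max_words words."""
    return " ".join(text.split()[:max_words])

def split_into_chunks(text, chunk_words):
    """Split text into consecutive pieces of at most chunk_words words."""
    words = text.split()
    return [" ".join(words[i:i + chunk_words]) for i in range(0, len(words), chunk_words)]

def summarize_recursively(text, chunk_words=20, max_words=8):
    """Summarize each chunk, join the summaries, and repeat until one chunk remains."""
    if max_words >= chunk_words:
        raise ValueError("max_words must be smaller than chunk_words so the text shrinks")
    if len(text.split()) <= chunk_words:
        return summarize_chunk(text, max_words)
    partial = [summarize_chunk(c, max_words) for c in split_into_chunks(text, chunk_words)]
    return summarize_recursively(" ".join(partial), chunk_words, max_words)
```

Because every intermediate input fits in one chunk, a human reviewer only ever has to compare a short summary against a short passage, never against the whole book.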
RESEARCH · Practical AI English(EN) · 71mo · [4 sources]

🤗 All things transformers with Hugging Face

Hugging Face has announced the integration of the Sentence Transformers library into its ecosystem, further expanding its offerings in the natural language processing space. This move follows the recent introduction of their Transformers library, which has seen significant development since its inception. The company also highlighted its extensive open-source NLP work, including over 2000 models available on its model hub, and discussed the future of AI research conferences.
RESEARCH · OpenAI News English(EN) · 74mo

Jukebox

OpenAI has introduced Jukebox, a new neural network capable of generating music in various genres and artist styles, complete with rudimentary singing, directly as raw audio. The model takes genre, artist, and lyrics as input to create original music samples. This advancement tackles the challenge of generating long audio sequences by using a hierarchical VQ-VAE to compress audio into a lower-dimensional space before generation. OpenAI is releasing the model weights, code, and a sample exploration tool.
RESEARCH · OpenAI News English(EN) · 77mo

OpenAI standardizes on PyTorch

OpenAI has announced its standardization on the PyTorch deep learning framework to enhance research productivity and streamline the development of optimized model implementations. This strategic shift aims to reduce iteration times for new research ideas, particularly in generative modeling, from weeks to mere days. As part of this transition, OpenAI is releasing a PyTorch-enabled version of its educational resource, Spinning Up in Deep RL, and plans to open-source PyTorch bindings for its optimized blocksparse kernels.
RESEARCH · Practical AI English(EN) · 81mo · [2 sources]

Robot hands solving Rubik's cubes

OpenAI has developed a system using two neural networks to enable a robot hand to solve a Rubik's Cube. The networks were trained entirely in simulation using reinforcement learning and a new technique called Automatic Domain Randomization (ADR). This approach allows the system to generalize to real-world physical tasks, even those it did not encounter during training, demonstrating the potential of reinforcement learning beyond virtual environments. While the robot can solve the cube 60% of the time, this achievement signifies a step towards more general-purpose robots capable of complex manipulation.
RESEARCH · OpenAI News English(EN) · 81mo

Fine-tuning GPT-2 from human preferences

OpenAI has fine-tuned the 774M parameter GPT-2 model using human feedback for tasks like summarization and stylistic text continuation. While the models successfully matched human preferences for stylistic tasks, achieving 88% and 86% preference rates, they learned to copy sentences wholesale for summarization, a strategy preferred by human labelers for its accuracy. This approach aims to improve safety techniques by better aligning AI behavior with human values, especially in complex language-based interactions.
RESEARCH · Practical AI English(EN) · 82mo · [2 sources]

Tool calling and agents

OpenAI researchers have demonstrated emergent tool use in a simulated hide-and-seek game where agents developed complex strategies without explicit instruction. Through multi-agent competition, the agents learned to interact with objects and navigate the environment, showcasing a self-supervised autocurriculum. The research, which used training infrastructure similar to that of earlier OpenAI projects such as OpenAI Five, suggests that multi-agent co-adaptation could lead to highly sophisticated behaviors in the future.
RESEARCH · OpenAI News English(EN) · 82mo

GPT-2: 6-month follow-up

OpenAI has released a 774 million parameter version of its GPT-2 language model, following earlier, smaller releases. This release is accompanied by a technical report detailing research into the model's societal impact, including its potential for misuse and the difficulty of detecting AI-generated text. The company is also publishing an open-source legal agreement to encourage model-sharing partnerships among organizations.
RESEARCH · Practical AI English(EN) · 86mo

TensorFlow Dev Summit 2019

The TensorFlow Dev Summit 2019 announced the alpha release of TensorFlow 2.0, integrating Keras for an improved user experience and enabling eager execution. The summit also highlighted new tools like TensorFlow Datasets, TensorFlow Addons, and TensorFlow Extended (TFX). Additionally, the inaugural O’Reilly TensorFlow World conference was announced.
RESEARCH · OpenAI News English(EN) · 86mo

MuseNet

OpenAI has developed MuseNet, a deep neural network capable of generating four-minute musical compositions across ten instruments and various styles, from classical to pop. The model learns musical patterns, harmony, rhythm, and style by predicting the next token in MIDI files, using the same kind of unsupervised technology as GPT-2. MuseNet allows for blending different musical styles and can be controlled through composer and instrumentation tokens, though it has limitations with unusual style-instrument pairings.
RESEARCH · OpenAI News English(EN) · 86mo

Generative modeling with sparse transformers

OpenAI has developed a new deep neural network called the Sparse Transformer, which significantly advances generative modeling capabilities. This model utilizes a reformulated attention mechanism to process sequences up to 30 times longer than previously possible, enabling it to capture complex, long-range dependencies in data like images, text, and sound. By employing sparse attention patterns and optimizing memory usage, the Sparse Transformer can handle sequences with tens of thousands of elements and hundreds of layers, achieving state-of-the-art performance across various domains.
RESEARCH · OpenAI News English(EN) · 88mo

Implicit generation and generalization methods for energy-based models

OpenAI has published research detailing advancements in energy-based models (EBMs), demonstrating stable and scalable training methods that improve sample quality and generalization. Their approach uses iterative refinement via Langevin dynamics, allowing for adaptive computation time and generating samples competitive with GANs while offering mode coverage guarantees. This research shows EBMs can produce high-quality images and stable robot dynamics trajectories, and can exhibit strong out-of-distribution classification performance, even outperforming models trained specifically for adversarial robustness.
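
Iterative refinement via Langevin dynamics repeatedly steps a sample down the energy gradient while injecting Gaussian noise. The following is a minimal one-dimensional sketch with a hand-written quadratic energy, an assumption chosen so the gradient is known in closed form; a real EBM would obtain the gradient from a neural network. The function name and step size are hypothetical.

```python
import math
import random

def langevin_sample(grad_energy, x0, step=0.1, n_steps=200, rng=None):
    """Run n_steps of Langevin dynamics: x <- x - (step / 2) * dE/dx + sqrt(step) * noise."""
    rng = rng or random.Random(0)
    x = x0
    for _ in range(n_steps):
        noise = rng.gauss(0.0, 1.0)
        x = x - 0.5 * step * grad_energy(x) + math.sqrt(step) * noise
    return x

# Toy energy E(x) = 0.5 * (x - 3)^2, whose gradient is x - 3 and whose minimum is at x = 3.
samples = [
    langevin_sample(lambda x: x - 3.0, x0=-5.0, rng=random.Random(seed))
    for seed in range(200)
]
mean = sum(samples) / len(samples)
```

Running more steps corresponds to the adaptive computation time mentioned above: each extra step refines the sample further at extra cost.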
RESEARCH · OpenAI News French(FR) · 88mo

Neural MMO: A Massively Multiagent Game Environment

OpenAI has released Neural MMO, a new environment designed for training reinforcement learning agents in massively multi-agent settings. This platform supports a large, variable number of agents within a persistent and open-ended task, aiming to overcome challenges in current multiagent reinforcement learning research. Neural MMO features persistence, scale, efficiency, and expansion capabilities, allowing agents to learn concurrently and adapt to changing behaviors in complex, procedurally generated game worlds.
RESEARCH · Lil'Log (Lilian Weng) English(EN) · 89mo · [2 sources]

Generalized Visual Language Models

Lilian Weng's blog post details the evolution of generalized language models, focusing on how they are extended to process visual information. Early approaches like VisualBERT fused image patches with text tokens, using self-attention to align visual and textual data for tasks such as image captioning. More recent models like SimVLM treat encoded images as prefixes for language models, leveraging large datasets for pre-training. These methods aim to create unified models capable of understanding and generating content across both visual and textual modalities.
RESEARCH · OpenAI News English(EN) · 92mo

Learning concepts with energy functions

OpenAI has developed an energy-based model capable of learning and generating concepts like spatial relationships after only five demonstrations. This model can transfer concepts learned in one environment, such as a 2D particle system, to solve tasks in a different 3D robotic environment without retraining. The approach uses energy functions, rooted in physics, to encode preferences over world states, enabling agents to build foundational understanding and reasoning capabilities.
RESEARCH · OpenAI News English(EN) · 92mo

Plan online, learn offline: Efficient learning and exploration via model-based control

OpenAI has introduced a new framework called POLO (Plan Online, Learn Offline) designed for agents that need to continuously interact with and learn from their environment. This approach integrates model-based control with value function learning and exploration strategies. POLO aims to improve learning efficiency by using local trajectory optimization to stabilize and accelerate value function learning, while also leveraging approximate value functions to enhance policy decisions. The framework has demonstrated success in complex simulated tasks such as humanoid locomotion and dexterous manipulation, achieving rapid learning with minimal experience.
RESEARCH · OpenAI News English(EN) · 93mo

Learning complex goals with iterated amplification

OpenAI has introduced a novel AI safety technique called iterated amplification, designed to train AI systems on complex goals that are beyond human scale. This method decomposes large tasks into smaller, manageable sub-tasks, bypassing the need for extensive labeled data or direct reward functions. While still in its early experimental stages, the technique holds promise for creating scalable AI safety solutions by iteratively building training signals from human input on simpler components.
RESEARCH · Practical AI English(EN) · 93mo

PyTorch 1.0 vs TensorFlow 2.0

This episode of Practical AI discusses the release of PyTorch 1.0 and TensorFlow 2.0, highlighting their respective roadmaps and integration with platforms like Google Cloud. The hosts also touch upon concerning applications of AI in social credit tracking and share resources for learning machine learning, including transfer learning and decision tree visualization.
RESEARCH · OpenAI News English(EN) · 95mo

Learning dexterity

OpenAI has developed a robot hand system named Dactyl, capable of manipulating objects with human-like dexterity. The system is trained entirely in simulation using a technique called domain randomization, which allows it to adapt to real-world physics without needing physically accurate models. Dactyl successfully transfers its learned skills to a physical Shadow Dexterous Hand, demonstrating the potential for simulation-based training to solve complex real-world robotic manipulation tasks.
RESEARCH · OpenAI News English(EN) · 95mo

Variational option discovery algorithms

OpenAI researchers have introduced VALOR, a new method for option discovery in reinforcement learning that leverages variational autoencoders. This approach connects variational inference techniques with autoencoders, allowing policies to encode contexts into trajectories and decoders to recover them. Additionally, they propose a curriculum learning strategy that increases the number of contexts an agent encounters as its performance improves, which stabilizes training and enables learning a wider range of behaviors.
RESEARCH · OpenAI News English(EN) · 97mo

Improving language understanding with unsupervised learning

OpenAI has detailed a new language understanding system that achieves state-of-the-art results across various tasks by combining unsupervised pre-training with supervised fine-tuning. The system first trains a transformer model on a massive dataset without labels, then adapts it to specific tasks using smaller, labeled datasets. This approach, which builds on prior work like ULMFiT and ELMo, demonstrates strong performance, particularly in commonsense reasoning and reading comprehension, suggesting unsupervised methods can effectively develop complex language skills.
RESEARCH · OpenAI News English(EN) · 97mo · [3 sources]

Generative language modeling for automated theorem proving

OpenAI has developed GPT-f, a generative language model applied to automated theorem proving within the Metamath formalization language. This system successfully generated novel, short proofs that were integrated into the main Metamath library, marking a significant advancement for AI in formal mathematics. Additionally, OpenAI introduced GamePad, a learning environment for exploring machine learning in the Coq proof assistant, focusing on tasks like proof synthesis and step prediction.
RESEARCH · OpenAI News English(EN) · 99mo · [2 sources]

Retro Contest: Results

OpenAI has concluded its Retro Contest, which challenged participants to develop reinforcement learning algorithms capable of generalizing from prior experience to new, unseen video game levels. The contest utilized a benchmark based on Sonic the Hedgehog levels, with top-performing solutions primarily involving fine-tuning existing algorithms like PPO and Rainbow DQN. While the winning algorithms showed significant improvement through transfer learning, they still fell short of human performance levels, indicating a substantial gap in generalization capabilities.
RESEARCH · OpenAI News English(EN) · 100mo

Ingredients for robotics research

OpenAI has released eight simulated robotics environments and an implementation of Hindsight Experience Replay (HER) to advance robotics research. These new environments, built for the MuJoCo physics simulator, feature more complex manipulation tasks than previous benchmarks and utilize sparse rewards to mimic real-world robotics applications. The HER algorithm, also released, enables reinforcement learning agents to learn from failures by treating achieved states as goals, even if they weren't the original target.
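
The idea of treating achieved states as goals can be shown with a small relabeling routine. This sketch uses the simplest variant, which relabels every transition with the state reached at the end of the episode; the tuple layout, function names, and the 0 / -1 sparse reward are assumptions for illustration rather than the released implementation.

```python
def sparse_reward(achieved, goal):
    """Sparse reward: 0 when the goal is met, -1 otherwise."""
    return 0.0 if achieved == goal else -1.0

def her_relabel(episode, reward_fn):
    """episode: list of (state, action, achieved_goal, desired_goal) tuples.

    Returns (state, action, goal, reward) transitions: each original one,
    plus a copy whose goal is replaced by the final achieved state.
    """
    final_goal = episode[-1][2]
    out = []
    for state, action, achieved, desired in episode:
        out.append((state, action, desired, reward_fn(achieved, desired)))
        out.append((state, action, final_goal, reward_fn(achieved, final_goal)))
    return out

# The agent wanted to reach 10 but only got to 3.
episode = [(0, "right", 1, 10), (1, "right", 2, 10), (2, "right", 3, 10)]
replay = her_relabel(episode, sparse_reward)
```

Every original transition carries a reward of -1, but the last relabeled transition has reward 0, so the failed episode still yields a positive learning signal.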
RESEARCH · OpenAI News English(EN) · 101mo

Interpretable machine learning through teaching

OpenAI has developed a novel machine learning technique where an AI 'teacher' agent selects the most informative examples to help a 'student' AI learn a concept. This method encourages the teacher to choose examples that are not only effective for the student but also understandable to humans, facilitating better human-AI collaboration. The approach was tested and found to be effective in teaching AI agents, and human subjects also performed better when guided by the AI-generated examples.
RESEARCH · OpenAI News English(EN) · 103mo · [2 sources]

Understanding neural networks through sparse circuits

OpenAI has published research on training more interpretable neural networks by encouraging sparsity, meaning most internal connections (weights) are zero. This approach aims to simplify the complex web of connections within AI models, making their decision-making processes easier to understand. By forcing a majority of weights to be zero, the models are constrained to use fewer connections, potentially leading to disentangled "circuits" that perform specific behaviors. This research complements existing safety efforts by providing a path towards understanding the internal mechanisms of AI systems.
RESEARCH · Lil'Log (Lilian Weng) English(EN) · 104mo · [6 sources]

Object Detection Part 4: Fast Detection Models

Two new research papers propose novel approaches to object detection. VFM4SDG aims to improve single-domain generalized object detection by using a frozen vision foundation model to maintain cross-domain stability, addressing issues with weather and illumination changes. UHR-DETR tackles the challenge of detecting small objects in ultra-high-resolution remote sensing imagery by efficiently allocating computational resources and integrating global and local scene information.
RESEARCH · Lil'Log (Lilian Weng) English(EN) · 105mo · [13 sources]

Learning with not Enough Data Part 3: Data Generation

Google Research has introduced "Nested Learning," a novel machine learning paradigm designed to address the challenge of catastrophic forgetting in continual learning. This approach views models as interconnected optimization problems, allowing them to acquire new knowledge without losing proficiency on previous tasks. A proof-of-concept architecture named "Hope" has demonstrated superior performance in language modeling and long-context memory management using this paradigm. OpenAI has also published research on meta-learning algorithms, including Reptile, which focuses on learning how to learn efficiently for new tasks, and a hierarchical reinforcement learning algorithm that enables faster task completion by breaking down complex problems into high-level actions.
- Google Research
- Hope
- OpenAI
- MLSH
- SGD
- Adam
- MuJoCo
- NeurIPS 2025
- Nested Learning
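
The Reptile update mentioned in the summary above is simple enough to sketch directly: sample a task, run a few SGD steps on it, then move the shared initialization part of the way toward the adapted weights. The toy tasks here are one-dimensional quadratics, and the function names, learning rates, and task set are assumptions for illustration.

```python
import random

def sgd_steps(w, target, lr=0.1, k=5):
    """Run k gradient steps on the task loss (w - target) ** 2."""
    for _ in range(k):
        w -= lr * 2.0 * (w - target)
    return w

def reptile(task_targets, meta_lr=0.1, iterations=500, seed=0):
    """Reptile meta-update: theta <- theta + meta_lr * (phi - theta)."""
    rng = random.Random(seed)
    theta = 0.0
    for _ in range(iterations):
        target = rng.choice(task_targets)  # sample a task
        phi = sgd_steps(theta, target)     # adapt to it with plain SGD
        theta += meta_lr * (phi - theta)   # nudge the initialization toward phi
    return theta

theta = reptile([2.0, 4.0])
```

The learned initialization ends up between the two task optima, a point from which either task can be reached in a few gradient steps.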
RESEARCH · OpenAI News English(EN) · 105mo

Generalizing from simulation

OpenAI has developed new robotics techniques that enable controllers trained entirely in simulation to perform tasks on physical robots, even with unexpected environmental changes. By randomizing aspects of the simulation like friction and sensor noise, the trained models can generalize to real-world dynamics without needing a perfect replica. This approach, which includes using LSTMs and a modified reinforcement learning algorithm called Hindsight Experience Replay, allows robots to adapt and learn from binary rewards, making them more capable of handling complex tasks.
RESEARCH · OpenAI News English(EN) · 105mo

Asymmetric actor critic for image-based robot learning

OpenAI has developed a new reinforcement learning technique for robot control that leverages simulation data more effectively. The method uses an asymmetric actor-critic algorithm where the critic observes the full state of the simulated environment, while the actor receives only partial, image-based observations. This approach allows for training more robust policies that can be transferred to real-world robots without requiring any real-world training data, demonstrating success in tasks like picking and pushing.
RESEARCH · OpenAI News English(EN) · 105mo

Sim-to-real transfer of robotic control with dynamics randomization

OpenAI researchers have developed a method to improve the transfer of robotic control policies from simulation to the real world. By randomizing the simulator's dynamics during training, the AI agents learn to adapt to variations, effectively bridging the "reality gap." This approach was demonstrated on an object-pushing task with a robotic arm, where policies trained solely in simulation achieved comparable performance on a physical robot without any real-world training.
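
Randomizing the simulator's dynamics amounts to drawing a fresh set of physical parameters at the start of every training episode. The sketch below shows only that sampling loop; the parameter names, ranges, and the `run_episode` callback are illustrative assumptions, not values from the research.

```python
import random

def sample_dynamics(rng):
    """Draw one set of simulator parameters (names and ranges are assumptions)."""
    return {
        "friction": rng.uniform(0.5, 1.5),
        "mass_scale": rng.uniform(0.8, 1.2),
        "sensor_noise_std": rng.uniform(0.0, 0.05),
    }

def train(num_episodes, run_episode, seed=0):
    """Call run_episode once per episode with freshly sampled dynamics."""
    rng = random.Random(seed)
    for _ in range(num_episodes):
        run_episode(sample_dynamics(rng))

# Usage: record the parameters instead of running a real simulator.
seen = []
train(100, seen.append)
```

Because the policy never sees the same physics twice, it cannot overfit to one simulator configuration and has to behave sensibly across the whole range.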
RESEARCH · OpenAI News English(EN) · 105mo

Domain randomization and generative models for robotic grasping

OpenAI has developed a new method for training robots to grasp objects using generative models and domain randomization. Their approach synthesizes millions of unique, procedurally generated objects to train a deep neural network, bypassing the need for extensive real-world object data. This technique allows the model to achieve over 90% success in simulation and 80% in real-world tests on unseen objects, demonstrating strong generalization capabilities.
RESEARCH · Lil'Log (Lilian Weng) English(EN) · 105mo · [9 sources]

Learning Word Embedding

Hugging Face has released a suite of tools and guides for training and fine-tuning various types of sentence embedding and reranker models. These resources leverage the Sentence Transformers library, offering methods for static embeddings, multimodal embeddings, and sparse embeddings. The guides cover training with up to 1 billion training pairs and achieving significant speedups, aiming to make advanced embedding model development more accessible.
RESEARCH · OpenAI News English(EN) · 105mo

Competitive self-play

OpenAI has demonstrated that competitive self-play can enable simulated AI agents to develop complex physical skills without explicit programming. By pitting agents against increasingly skilled versions of themselves in simple games, OpenAI observed the emergence of behaviors like tackling, faking, and diving. This method also showed that agents trained via self-play can transfer learned skills to novel situations, outperforming agents trained with traditional reinforcement learning.
RESEARCH · OpenAI News English(EN) · 105mo

Meta-learning for wrestling

OpenAI researchers have developed a meta-learning agent capable of quickly adapting its strategy in simulated robot wrestling matches. This agent, an extension of the MAML algorithm, optimizes its objective function against pairs of environments to enable rapid learning in new situations. The meta-learning approach allows the agent not only to defeat stronger opponents but also to adapt to physical malfunctions, such as losing limbs, suggesting potential applications for agents that can handle both external environmental changes and internal bodily alterations. OpenAI is releasing the MuJoCo environments and trained policies to facilitate further research in this area.
RESEARCH · OpenAI News English(EN) · 106mo

Learning to model other minds

Researchers from OpenAI and the University of Oxford have developed a new algorithm called Learning with Opponent-Learning Awareness (LOLA). This algorithm enables reinforcement learning agents to account for the fact that other agents are also learning and adapting their strategies. LOLA agents can discover self-interested yet collaborative strategies, outperforming current methods that often lead to purely selfish actions. The approach is inspired by human collaboration and the concept of 'theory of mind,' allowing agents to anticipate and influence the learning process of others to achieve mutually beneficial outcomes.