Brief

last 24h

[50/8369] 224 sources

Multi-source AI news clustered, deduplicated, and scored 0–100 across authority, cluster strength, headline signal, and time decay.

RESEARCH · Practical AI English(EN) · 71mo · [4 sources]

🤗 All things transformers with Hugging Face

Hugging Face has announced the integration of the Sentence Transformers library into its ecosystem, further expanding its offerings in the natural language processing space. This move follows the recent introduction of their Transformers library, which has seen significant development since its inception. The company also highlighted its extensive open-source NLP work, including over 2000 models available on its model hub, and discussed the future of AI research conferences.
COMMENTARY · Practical AI English(EN) · 72mo · [4 sources]

The long road to AGI

Google DeepMind and OpenAI are articulating their strategies for developing Artificial General Intelligence (AGI), emphasizing safety and responsible deployment. Both organizations acknowledge the immense potential benefits of AGI, such as revolutionizing healthcare and scientific discovery, while also recognizing significant risks including misuse, accidents, and societal disruption. Their approaches involve proactive risk assessment, collaboration with the broader AI community, and a gradual, iterative deployment of increasingly powerful AI systems to allow society to adapt.
- AGI
- Gemini
- Yoshua Bengio
- DARPA
- NeurIPS
- Google DeepMind
- OpenAI
FRONTIER RELEASE · OpenAI News English(EN) · 73mo

Language models are few-shot learners

OpenAI has introduced GPT-3, a massive language model with 175 billion parameters, demonstrating significant improvements in few-shot learning capabilities. Unlike previous models that required extensive task-specific fine-tuning, GPT-3 can perform new language tasks with minimal examples or instructions, achieving competitive results on various NLP benchmarks. While showing strong performance in areas like translation and question-answering, the model still faces challenges in certain datasets and has methodological issues related to its training data. Notably, GPT-3 can generate news articles that are difficult for humans to distinguish from human-written content, raising discussions about its broader societal impacts.
SIGNIFICANT · Practical AI English(EN) · 73mo

Exploring NVIDIA's Ampere & the A100 GPU

NVIDIA has announced its new Ampere architecture, featuring the A100 Tensor Core GPU, designed to advance high-performance computing for artificial intelligence. The company is also releasing the NVIDIA DGX A100 for data centers, the NVIDIA EGX A100 for edge computing, and the NVIDIA Jetson Xavier NX.
- EGX A100
- NVIDIA
- Ampere
- A100 GPU
- DGX A100
- Jetson Xavier NX
RESEARCH · OpenAI News English(EN) · 74mo

Jukebox

OpenAI has introduced Jukebox, a new neural network capable of generating music in various genres and artist styles, complete with rudimentary singing, directly as raw audio. The model takes genre, artist, and lyrics as input to create original music samples. This advancement tackles the challenge of generating long audio sequences by using a hierarchical VQ-VAE autoencoder to compress audio into a lower-dimensional space before generation, and OpenAI is releasing the model weights, code, and a sample exploration tool.
TOOL · Hugging Face Blog English(EN) · 76mo · [4 sources]

Introducing multi-backends (TRT-LLM, vLLM) support for Text Generation Inference

Hugging Face has enhanced its Text Generation Inference (TGI) tool by introducing support for multiple backends, including TensorRT-LLM and vLLM. This update aims to improve performance and flexibility for users deploying large language models. Additionally, Hugging Face is exploring new techniques like assisted generation to further reduce latency in text generation tasks.
TOOL · Practical AI English(EN) · 76mo

Real-time conversational insights from phone call data

Invoca, a company specializing in conversational analytics, has developed a natural language processing model architecture called Signal AI. This model processes real-time data from phone calls to extract valuable insights. The technology aims to understand conversational data, overcome associated challenges, and provide actionable information derived from these interactions.
RESEARCH · OpenAI News English(EN) · 77mo

OpenAI standardizes on PyTorch

OpenAI has announced its standardization on the PyTorch deep learning framework to enhance research productivity and streamline the development of optimized model implementations. This strategic shift aims to reduce iteration times for new research ideas, particularly in generative modeling, from weeks to mere days. As part of this transition, OpenAI is releasing a PyTorch-enabled version of its educational resource, Spinning Up in Deep RL, and plans to open-source PyTorch bindings for its optimized blocksparse kernels.
RESEARCH · Practical AI English(EN) · 81mo · [2 sources]

Robot hands solving Rubik's cubes

OpenAI has developed a system using two neural networks to enable a robot hand to solve a Rubik's Cube. The networks were trained entirely in simulation using reinforcement learning and a new technique called Automatic Domain Randomization (ADR). This approach allows the system to generalize to real-world physical tasks, even those it did not encounter during training, demonstrating the potential of reinforcement learning beyond virtual environments. The robot solves the cube 60% of the time, and the result is a step towards more general-purpose robots capable of complex manipulation.
RESEARCH · OpenAI News English(EN) · 81mo

Fine-tuning GPT-2 from human preferences

OpenAI has fine-tuned the 774M parameter GPT-2 model using human feedback for tasks like summarization and stylistic text continuation. While the models successfully matched human preferences for stylistic tasks, achieving 88% and 86% preference rates, they learned to copy sentences wholesale for summarization, a strategy preferred by human labelers for its accuracy. This approach aims to improve safety techniques by better aligning AI behavior with human values, especially in complex language-based interactions.
RESEARCH · Practical AI English(EN) · 82mo · [2 sources]

Tool calling and agents

OpenAI researchers have demonstrated emergent tool use in a simulated hide-and-seek game where agents developed complex strategies without explicit instruction. Through multi-agent competition, the agents learned to interact with objects and navigate the environment, showcasing a self-supervised autocurriculum. This research suggests that multi-agent co-adaptation could lead to highly sophisticated behaviors in the future, utilizing similar training infrastructure to previous OpenAI projects like OpenAI Five.
RESEARCH · OpenAI News English(EN) · 82mo

GPT-2: 6-month follow-up

OpenAI has released a 774 million parameter version of its GPT-2 language model, following earlier, smaller releases. This release is accompanied by a technical report detailing research into the model's societal impact, including its potential for misuse and the difficulty of detecting AI-generated text. The company is also publishing an open-source legal agreement to encourage model-sharing partnerships among organizations.
TOOL · Practical AI English(EN) · 83mo · [2 sources]

Open Source Self-Driving with Comma AI

Comma AI is making self-driving technology more accessible through its open-source software, OpenPilot. This system can be installed in many vehicles to provide advanced driver-assistance features like auto-steering and adaptive cruise control. Harald Schäfer, CTO of Comma AI, discussed how machine learning, robotics, and simulation are key to developing these autonomy features, with world models playing a significant role in large-scale training.
RESEARCH · Practical AI English(EN) · 86mo

TensorFlow Dev Summit 2019

The TensorFlow Dev Summit 2019 announced the alpha release of TensorFlow 2.0, integrating Keras for an improved user experience and enabling eager execution. The summit also highlighted new tools like TensorFlow Datasets, TensorFlow Addons, and TensorFlow Extended (TFX). Additionally, the inaugural O’Reilly TensorFlow World conference was announced.
RESEARCH · OpenAI News English(EN) · 86mo

MuseNet

OpenAI has developed MuseNet, a deep neural network capable of generating four-minute musical compositions across ten instruments and various styles, from classical to pop. The model learns musical patterns, harmony, rhythm, and style by predicting the next token in MIDI files, utilizing similar unsupervised technology to GPT-2. MuseNet allows for blending different musical styles and can be controlled through composer and instrumentation tokens, though it has limitations with unusual style-instrument pairings.
RESEARCH · OpenAI News English(EN) · 86mo

Generative modeling with sparse transformers

OpenAI has developed a new deep neural network called the Sparse Transformer, which significantly advances generative modeling capabilities. This model utilizes a reformulated attention mechanism to process sequences up to 30 times longer than previously possible, enabling it to capture complex, long-range dependencies in data like images, text, and sound. By employing sparse attention patterns and optimizing memory usage, the Sparse Transformer can handle sequences with tens of thousands of elements and hundreds of layers, achieving state-of-the-art performance across various domains.
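
To make "sparse attention patterns" concrete, here is a minimal sketch of one such pattern. The summary does not say which pattern the model uses, so this is an assumption: a causal "strided" mask in which each position sees a short local window plus every `stride`-th earlier position. The function name and parameters are hypothetical.

```python
from typing import List


def strided_causal_mask(seq_len: int, stride: int) -> List[List[bool]]:
    """Return mask[i][j] == True when position i may attend to position j.

    Assumed pattern (illustrative only): position i attends to
    - the most recent `stride` positions, itself included, and
    - every earlier position j where (i - j) is a multiple of `stride`.
    """
    return [
        [
            j <= i and ((i - j) < stride or (i - j) % stride == 0)
            for j in range(seq_len)
        ]
        for i in range(seq_len)
    ]


mask = strided_causal_mask(seq_len=16, stride=4)
# Connections used by the last position under this pattern.
attended_by_last = sum(mask[15])
```

Under this assumed pattern the number of connections per position grows with `stride + seq_len / stride` rather than with `seq_len`, which is the kind of saving that makes much longer sequences affordable.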
SIGNIFICANT · OpenAI News English(EN) · 87mo

OpenAI Five defeats Dota 2 world champions

OpenAI Five has achieved a significant milestone by defeating the world champions of Dota 2 in two consecutive games at the OpenAI Five Finals. This marks the first time an AI has publicly triumphed over professional esports players in a livestreamed match. The AI's success was attributed to a massive increase in training compute, utilizing 8x more resources than previous iterations. Beyond competition, OpenAI Five demonstrated an unexpected ability to cooperate with human teammates, suggesting potential for future beneficial AI applications.
- OpenAI Five
- OpenAI
- Dota 2
- AlphaStar
TOOL · Practical AI English(EN) · 87mo

GIPHY's celebrity detector

GIPHY has released an open-source celebrity detector, developed using the MTCNN method. The project's head of R&D, Nick Hasty, discussed its origins and the role of AI within GIPHY. A demo page and the complete list of celebrities included in the model are available.
SIGNIFICANT · Mastodon — fosstodon.org Polski(PL) · 87mo · [81 sources]

Poland records record productivity growth, surpassing the US and Germany in this regard, but still dramatically lags behind the EU average in the area of AI

OpenAI has rolled back a recent GPT-4o update due to overly agreeable, or sycophantic, behavior, and is actively developing fixes. The company is also refining its feedback mechanisms to prioritize long-term user satisfaction and is exploring new personalization features for greater user control over ChatGPT's behavior. Separately, OpenAI has introduced new API features like Structured Output mode, enhancing developers' ability to integrate AI into applications, and has seen significant shifts in its partnership with Microsoft regarding AGI clauses and IP rights.

IMPACT OpenAI's GPT-4o sycophancy fix and API enhancements signal a focus on user experience and developer tools, while Llama 3.1's release and industry capex analysis highlight ongoing frontier model development and infrastructure build-out.
- EU
- PowerPoint
- Codex
- Appshots
- Anthropic
- DeepSeek
- ChatGPT
- Cloudflare
- OpenAI
- Tencent
- Simon Willison
- Benedict Evans
- Llama 3.1
- Greg Brockman
- Microsoft
- GPT-4o
- AGI
- Structured Output
- Michelle Pokrass
RESEARCH · OpenAI News English(EN) · 88mo

Implicit generation and generalization methods for energy-based models

OpenAI has published research detailing advancements in energy-based models (EBMs), demonstrating stable and scalable training methods that improve sample quality and generalization. Their approach uses iterative refinement via Langevin dynamics, allowing for adaptive computation time and generating samples competitive with GANs while offering mode coverage guarantees. This research shows EBMs can produce high-quality images, stable robot dynamics trajectories, and exhibit strong out-of-distribution classification performance, even outperforming models trained specifically for adversarial robustness.
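
The "iterative refinement via Langevin dynamics" mentioned above can be sketched on a toy problem. This is a generic illustration, not OpenAI's implementation: the quadratic energy, the function names, and the step sizes are all assumptions chosen so the example runs with the standard library only.

```python
import math
import random


def energy_grad(x: float, center: float = 2.0) -> float:
    """Gradient of the toy energy E(x) = 0.5 * (x - center) ** 2 (an assumed stand-in)."""
    return x - center


def langevin_refine(
    x0: float, steps: int, step_size: float, noise_scale: float, rng: random.Random
) -> float:
    """Refine a sample by repeated noisy gradient steps that lower the energy."""
    x = x0
    for _ in range(steps):
        noise = rng.gauss(0.0, 1.0)
        x = x - 0.5 * step_size * energy_grad(x) + noise_scale * math.sqrt(step_size) * noise
    return x


rng = random.Random(0)
# With the noise switched off, the update reduces to plain gradient descent on the energy.
refined = langevin_refine(x0=-5.0, steps=200, step_size=0.1, noise_scale=0.0, rng=rng)
# With noise, repeated calls yield different samples concentrated near the low-energy region.
noisy = langevin_refine(x0=-5.0, steps=200, step_size=0.1, noise_scale=1.0, rng=rng)
```

The `steps` argument is where the "adaptive computation time" idea shows up in this sketch: more refinement steps cost more compute but move the sample further toward low-energy regions.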
RESEARCH · OpenAI News Français(FR) · 88mo

Neural MMO: A Massively Multiagent Game Environment

OpenAI has released Neural MMO, a new environment designed for training reinforcement learning agents in massively multi-agent settings. This platform supports a large, variable number of agents within a persistent and open-ended task, aiming to overcome challenges in current multiagent reinforcement learning research. Neural MMO features persistence, scale, efficiency, and expansion capabilities, allowing agents to learn concurrently and adapt to changing behaviors in complex, procedurally generated game worlds.
RESEARCH · Lil'Log (Lilian Weng) English(EN) · 89mo · [2 sources]

Generalized Visual Language Models

Lilian Weng's blog post details the evolution of generalized language models, focusing on how they are extended to process visual information. Early approaches like VisualBERT fused image patches with text tokens, using self-attention to align visual and textual data for tasks such as image captioning. More recent models like SimVLM treat encoded images as prefixes for language models, leveraging large datasets for pre-training. These methods aim to create unified models capable of understanding and generating content across both visual and textual modalities.
RESEARCH · OpenAI News English(EN) · 92mo

Learning concepts with energy functions

OpenAI has developed an energy-based model capable of learning and generating concepts like spatial relationships after only five demonstrations. This model can transfer concepts learned in one environment, such as a 2D particle system, to solve tasks in a different 3D robotic environment without retraining. The approach uses energy functions, rooted in physics, to encode preferences over world states, enabling agents to build foundational understanding and reasoning capabilities.
RESEARCH · OpenAI News English(EN) · 92mo

Plan online, learn offline: Efficient learning and exploration via model-based control

OpenAI has introduced a new framework called POLO (Plan Online, Learn Offline) designed for agents that need to continuously interact with and learn from their environment. This approach integrates model-based control with value function learning and exploration strategies. POLO aims to improve learning efficiency by using local trajectory optimization to stabilize and accelerate value function learning, while also leveraging approximate value functions to enhance policy decisions. The framework has demonstrated success in complex simulated tasks such as humanoid locomotion and dexterous manipulation, achieving rapid learning with minimal experience.
RESEARCH · OpenAI News English(EN) · 93mo

Learning complex goals with iterated amplification

OpenAI has introduced a novel AI safety technique called iterated amplification, designed to train AI systems on complex goals that are beyond human scale. This method decomposes large tasks into smaller, manageable sub-tasks, bypassing the need for extensive labeled data or direct reward functions. While still in its early experimental stages, the technique holds promise for creating scalable AI safety solutions by iteratively building training signals from human input on simpler components.
RESEARCH · Practical AI English(EN) · 93mo

PyTorch 1.0 vs TensorFlow 2.0

This episode of Practical AI discusses the release of PyTorch 1.0 and TensorFlow 2.0, highlighting their respective roadmaps and integration with platforms like Google Cloud. The hosts also touch upon concerning applications of AI in social credit tracking and share resources for learning machine learning, including transfer learning and decision tree visualization.
FRONTIER RELEASE · Practical AI English(EN) · 93mo · [13 sources]

Artificial intelligence at NVIDIA

NVIDIA is significantly advancing physical and agentic AI through a series of new models, infrastructure, and collaborations. The company has introduced new frontier models like NVIDIA Cosmos 3 and Isaac GR00T N1.7, alongside open models such as Gemma 4, optimized for both cloud and edge devices. NVIDIA is also enhancing its AI factory reference designs and collaborating with Google Cloud and Adobe to integrate these capabilities into production environments, focusing on efficiency, security, and scalability for applications ranging from robotics to creative content generation.
RESEARCH · OpenAI News English(EN) · 95mo

Learning dexterity

OpenAI has developed a robot hand system named Dactyl, capable of manipulating objects with human-like dexterity. The system is trained entirely in simulation using a technique called domain randomization, which allows it to adapt to real-world physics without needing physically accurate models. Dactyl successfully transfers its learned skills to a physical Shadow Dexterous Hand, demonstrating the potential for simulation-based training to solve complex real-world robotic manipulation tasks.
RESEARCH · OpenAI News English(EN) · 95mo

Variational option discovery algorithms

OpenAI researchers have introduced VALOR, a new method for option discovery in reinforcement learning that leverages variational autoencoders. This approach connects variational inference techniques with autoencoders, allowing policies to encode contexts into trajectories and decoders to recover them. Additionally, they propose a curriculum learning strategy that increases the number of contexts an agent encounters as its performance improves, which stabilizes training and enables learning a wider range of behaviors.
RESEARCH · OpenAI News English(EN) · 97mo

Improving language understanding with unsupervised learning

OpenAI has detailed a new language understanding system that achieves state-of-the-art results across various tasks by combining unsupervised pre-training with supervised fine-tuning. The system first trains a transformer model on a massive dataset without labels, then adapts it to specific tasks using smaller, labeled datasets. This approach, which builds on prior work like ULMFiT and ELMo, demonstrates strong performance, particularly in commonsense reasoning and reading comprehension, suggesting unsupervised methods can effectively develop complex language skills.
RESEARCH · OpenAI News English(EN) · 97mo · [3 sources]

Generative language modeling for automated theorem proving

OpenAI has developed GPT-f, a generative language model applied to automated theorem proving within the Metamath formalization language. This system successfully generated novel, short proofs that were integrated into the main Metamath library, marking a significant advancement for AI in formal mathematics. Additionally, OpenAI introduced GamePad, a learning environment for exploring machine learning in the Coq proof assistant, focusing on tasks like proof synthesis and step prediction.
RESEARCH · OpenAI News English(EN) · 99mo · [2 sources]

Retro Contest: Results

OpenAI has concluded its Retro Contest, which challenged participants to develop reinforcement learning algorithms capable of generalizing from prior experience to new, unseen video game levels. The contest utilized a benchmark based on Sonic the Hedgehog levels, with top-performing solutions primarily involving fine-tuning existing algorithms like PPO and Rainbow DQN. While the winning algorithms showed significant improvement through transfer learning, they still fell short of human performance levels, indicating a substantial gap in generalization capabilities.
RESEARCH · OpenAI News English(EN) · 100mo

Ingredients for robotics research

OpenAI has released eight simulated robotics environments and an implementation of Hindsight Experience Replay (HER) to advance robotics research. These new environments, built for the MuJoCo physics simulator, feature more complex manipulation tasks than previous benchmarks and utilize sparse rewards to mimic real-world robotics applications. The HER algorithm, also released, enables reinforcement learning agents to learn from failures by treating achieved states as goals, even if they weren't the original target.
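
The idea of "treating achieved states as goals" can be sketched as a relabeling step over a stored episode. This is a minimal illustration, not the released implementation: the grid states, the `Transition` type, the 0/-1 reward convention, and the choice to relabel with the final state only are all assumptions made for the example.

```python
from typing import List, NamedTuple, Tuple

State = Tuple[int, int]


class Transition(NamedTuple):
    state: State
    action: str
    next_state: State
    goal: State
    reward: float


def sparse_reward(achieved: State, goal: State) -> float:
    """Assumed sparse reward: 0.0 when the goal is reached, -1.0 otherwise."""
    return 0.0 if achieved == goal else -1.0


def relabel_with_final_state(episode: List[Transition]) -> List[Transition]:
    """Return a copy of the episode whose goal is the state actually reached at the end."""
    if not episode:
        return []
    achieved_goal = episode[-1].next_state
    return [
        t._replace(goal=achieved_goal, reward=sparse_reward(t.next_state, achieved_goal))
        for t in episode
    ]


# A failed episode: the agent was asked to reach (3, 3) but ended at (1, 1).
failed_episode = [
    Transition((0, 0), "right", (1, 0), (3, 3), -1.0),
    Transition((1, 0), "up", (1, 1), (3, 3), -1.0),
]
relabeled = relabel_with_final_state(failed_episode)
```

In this sketch both the original and the relabeled transitions would be added to the replay buffer, so an episode that never reached its intended target still contributes at least one successful example.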
RESEARCH · OpenAI News English(EN) · 101mo

Interpretable machine learning through teaching

OpenAI has developed a novel machine learning technique where an AI 'teacher' agent selects the most informative examples to help a 'student' AI learn a concept. This method encourages the teacher to choose examples that are not only effective for the student but also understandable to humans, facilitating better human-AI collaboration. The approach was tested and found to be effective in teaching AI agents, and human subjects also performed better when guided by the AI-generated examples.
SIGNIFICANT · OpenAI News English(EN) · 102mo · [2 sources]

Scaling Kubernetes to 7,500 nodes

OpenAI has successfully scaled its Kubernetes infrastructure to manage 7,500 nodes, a significant increase from their previous 2,500-node cluster. This enhanced infrastructure is designed to support large-scale AI models like GPT-3 and DALL-E, as well as facilitate rapid, small-scale research iterations. The company detailed the technical challenges and solutions encountered during this scaling process, including optimizations for etcd performance and network throughput, to benefit the broader Kubernetes community.
- Azure
- Fluentd
- Prometheus
- etcd
- Datadog
- kube-apiserver
- kubectl
- OpenAI
- Kubernetes
- GPT-3
- DALL-E
RESEARCH · OpenAI News English(EN) · 103mo · [2 sources]

Understanding neural networks through sparse circuits

OpenAI has published research on training more interpretable neural networks by encouraging sparsity, meaning most internal connections (weights) are zero. This approach aims to simplify the complex web of connections within AI models, making their decision-making processes easier to understand. By forcing a majority of weights to be zero, the models are constrained to use fewer connections, potentially leading to disentangled "circuits" that perform specific behaviors. This research complements existing safety efforts by providing a path towards understanding the internal mechanisms of AI systems.
RESEARCH · Lil'Log (Lilian Weng) English(EN) · 104mo · [6 sources]

Object Detection Part 4: Fast Detection Models

Two new research papers propose novel approaches to object detection. VFM4SDG aims to improve single-domain generalized object detection by using a frozen vision foundation model to maintain cross-domain stability, addressing issues with weather and illumination changes. UHR-DETR tackles the challenge of detecting small objects in ultra-high-resolution remote sensing imagery by efficiently allocating computational resources and integrating global and local scene information.
RESEARCH · Lil'Log (Lilian Weng) English(EN) · 105mo · [13 sources]

Learning with not Enough Data Part 3: Data Generation

Google Research has introduced "Nested Learning," a novel machine learning paradigm designed to address the challenge of catastrophic forgetting in continual learning. This approach views models as interconnected optimization problems, allowing them to acquire new knowledge without losing proficiency on previous tasks. A proof-of-concept architecture named "Hope" has demonstrated superior performance in language modeling and long-context memory management using this paradigm. OpenAI has also published research on meta-learning algorithms, including Reptile, which focuses on learning how to learn efficiently for new tasks, and a hierarchical reinforcement learning algorithm that enables faster task completion by breaking down complex problems into high-level actions.
- SGD
- Adam
- MLSH
- MuJoCo
- Hope
- Google Research
- OpenAI
- Nested Learning
- NeurIPS 2025
RESEARCH · OpenAI News English(EN) · 105mo

Generalizing from simulation

OpenAI has developed new robotics techniques that enable controllers trained entirely in simulation to perform tasks on physical robots, even with unexpected environmental changes. By randomizing aspects of the simulation like friction and sensor noise, the trained models can generalize to real-world dynamics without needing a perfect replica. This approach, which includes using LSTMs and a modified reinforcement learning algorithm called Hindsight Experience Replay, allows robots to adapt and learn from binary rewards, making them more capable of handling complex tasks.
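
Randomizing the simulation amounts to drawing a fresh set of physical parameters for each training episode. The sketch below illustrates that step only. Friction and sensor noise come from the summary; the `mass_kg` parameter, the numeric ranges, and all names are hypothetical, and no real simulator API is called.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class SimParams:
    friction: float
    mass_kg: float           # hypothetical extra parameter, not named in the summary
    sensor_noise_std: float


# Hypothetical sampling ranges, chosen only for illustration.
RANGES = {
    "friction": (0.5, 1.5),
    "mass_kg": (0.2, 0.8),
    "sensor_noise_std": (0.0, 0.05),
}


def sample_sim_params(rng: random.Random) -> SimParams:
    """Draw one set of simulator parameters uniformly from the configured ranges."""
    return SimParams(**{name: rng.uniform(lo, hi) for name, (lo, hi) in RANGES.items()})


rng = random.Random(0)
# One draw per training episode, so the policy never sees the same physics twice.
episode_params = [sample_sim_params(rng) for _ in range(5)]
```

A policy trained across many such draws has to succeed under a range of physics, which is the property the summary credits for transfer to a real robot without a perfect replica.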
RESEARCH · OpenAI News English(EN) · 105mo

Sim-to-real transfer of robotic control with dynamics randomization

OpenAI researchers have developed a method to improve the transfer of robotic control policies from simulation to the real world. By randomizing the simulator's dynamics during training, the AI agents learn to adapt to variations, effectively bridging the "reality gap." This approach was demonstrated on an object-pushing task with a robotic arm, where policies trained solely in simulation achieved comparable performance on a physical robot without any real-world training.
RESEARCH · OpenAI News English(EN) · 105mo

Asymmetric actor critic for image-based robot learning

OpenAI has developed a new reinforcement learning technique for robot control that leverages simulation data more effectively. The method uses an asymmetric actor-critic algorithm where the critic observes the full state of the simulated environment, while the actor receives only partial, image-based observations. This approach allows for training more robust policies that can be transferred to real-world robots without requiring any real-world training data, demonstrating success in tasks like picking and pushing.
RESEARCH · OpenAI News English(EN) · 105mo

Domain randomization and generative models for robotic grasping

OpenAI has developed a new method for training robots to grasp objects using generative models and domain randomization. Their approach synthesizes millions of unique, procedurally generated objects to train a deep neural network, bypassing the need for extensive real-world object data. This technique allows the model to achieve over 90% success in simulation and 80% in real-world tests on unseen objects, demonstrating strong generalization capabilities.
RESEARCH · Lil'Log (Lilian Weng) English(EN) · 105mo · [9 sources]

Learning Word Embedding

Hugging Face has released a suite of tools and guides for training and fine-tuning various types of sentence embedding and reranker models. These resources leverage the Sentence Transformers library, offering methods for static embeddings, multimodal embeddings, and sparse embeddings. The guides cover training with up to 1 billion training pairs and achieving significant speedups, aiming to make advanced embedding model development more accessible.
RESEARCH · OpenAI News English(EN) · 105mo

Meta-learning for wrestling

OpenAI researchers have developed a meta-learning agent capable of quickly adapting its strategy in simulated robot wrestling matches. This agent, an extension of the MAML algorithm, optimizes its objective function against pairs of environments to enable rapid learning in new situations. The meta-learning approach allows the agent not only to defeat stronger opponents but also to adapt to physical malfunctions, such as losing limbs, suggesting potential applications for agents that can handle both external environmental changes and internal bodily alterations. OpenAI is releasing the MuJoCo environments and trained policies to facilitate further research in this area.
RESEARCH · OpenAI News English(EN) · 105mo

Competitive self-play

OpenAI has demonstrated that competitive self-play can enable simulated AI agents to develop complex physical skills without explicit programming. By pitting agents against increasingly skilled versions of themselves in simple games, OpenAI observed the emergence of behaviors like tackling, faking, and diving. This method also showed that agents trained via self-play can transfer learned skills to novel situations, outperforming agents trained with traditional reinforcement learning.
RESEARCH · OpenAI News English(EN) · 106mo

Learning to model other minds

Researchers from OpenAI and the University of Oxford have developed a new algorithm called Learning with Opponent-Learning Awareness (LOLA). This algorithm enables reinforcement learning agents to account for the fact that other agents are also learning and adapting their strategies. LOLA agents can discover self-interested yet collaborative strategies, outperforming current methods that often lead to purely selfish actions. The approach is inspired by human collaboration and the concept of 'theory of mind,' allowing agents to anticipate and influence the learning process of others to achieve mutually beneficial outcomes.
RESEARCH · OpenAI News English(EN) · 106mo

Learning with opponent-learning awareness

OpenAI has introduced a new machine learning technique called Learning with Opponent-Learning Awareness (LOLA). This method addresses challenges in multi-agent learning environments by enabling each agent to anticipate and account for how other agents will learn and adapt. Experiments demonstrate that LOLA agents can foster cooperation, such as in the iterated prisoner's dilemma, and converge to optimal strategies in other scenarios like repeated matching pennies. The approach is designed to be efficient and scalable for complex reinforcement learning tasks.
RESEARCH · OpenAI News English(EN) · 107mo · [2 sources]

More on Dota 2

OpenAI has developed a Dota 2 bot that has achieved superhuman performance in 1v1 matches against top professional players. The bot learned to play the complex game entirely through self-play, without relying on imitation learning or tree search. This achievement demonstrates AI's capability to master intricate, real-world scenarios involving human interaction. OpenAI plans to expand this project to create a team of five bots capable of competing with human teams.
- OpenAI
- Dendi
- Dota 2
- Arteezy
- SumaiL
RESEARCH · OpenAI News English(EN) · 107mo

Gathering human feedback

OpenAI has released RL-Teacher, an open-source tool designed to train AI models using human feedback instead of predefined reward functions. This approach, developed with AI safety in mind, involves a reward predictor that learns human preferences and can be integrated into various AI agents. The system includes a web application for humans to provide feedback, which is then used to train the predictor, and is implemented in under 1,000 lines of Python code.
RESEARCH · OpenAI News English(EN) · 108mo

Teacher–student curriculum learning

OpenAI researchers have developed a new framework called Teacher-Student Curriculum Learning (TSCL) to automate the creation of training curricula for AI models. This method involves a 'Teacher' model selecting subtasks for a 'Student' model to learn, prioritizing tasks where the Student shows the most rapid improvement or where performance is declining to combat forgetting. Experiments showed TSCL matched or exceeded human-designed curricula in tasks like decimal addition and Minecraft navigation, notably enabling the solution of a complex Minecraft maze that was previously unsolvable.
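
The Teacher's selection rule described above (favor subtasks whose score is changing fastest, in either direction) can be sketched as follows. This is a simplified illustration, not the published algorithm: it measures progress as the difference between the last two scores, treats untried tasks as top priority, and uses hypothetical task names.

```python
from typing import Dict, List


def abs_progress(scores: List[float]) -> float:
    """Absolute change between the two most recent scores.

    Tasks with fewer than two scores get infinite priority so they are tried
    first (an assumption made for this sketch).
    """
    if len(scores) < 2:
        return float("inf")
    return abs(scores[-1] - scores[-2])


def pick_next_task(history: Dict[str, List[float]]) -> str:
    """Choose the subtask whose score is changing fastest, whether rising or falling."""
    return max(history, key=lambda task: abs_progress(history[task]))


# The Student is improving quickly on 2-digit addition, so the Teacher picks it.
improving = {
    "add_1_digit": [0.90, 0.91],
    "add_2_digit": [0.20, 0.50],
    "add_3_digit": [0.00, 0.00],
}
# Performance on the first task is dropping, so the Teacher revisits it.
forgetting = {
    "add_1_digit": [0.90, 0.50],
    "add_2_digit": [0.20, 0.30],
}
next_when_improving = pick_next_task(improving)
next_when_forgetting = pick_next_task(forgetting)
```

Using the absolute value is what lets one rule cover both cases in the summary: a steep rise signals a task worth practicing now, and a steep drop signals a task being forgotten.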