Pulse

last 48h

[50/2008] 98 sources

What AI is actually talking about — clusters surfacing on Bluesky, Reddit, HN, Mastodon and Lobsters, re-ranked to elevate originality and crush noise.

RESEARCH · Lil'Log (Lilian Weng) English(EN) · 59mo · BLOG

What are Diffusion Models?

Lilian Weng's blog post provides a comprehensive overview of diffusion models, a type of generative model inspired by non-equilibrium thermodynamics. The post details the forward diffusion process, where noise is gradually added to data until it resembles a Gaussian distribution. It also explains the reverse diffusion process, which learns to reconstruct data from noise, and discusses connections to stochastic gradient Langevin dynamics. The article has been updated multiple times to include recent advancements like classifier-free guidance and latent diffusion models. AI
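In the standard formulation, the forward process has a closed form, so x_t can be sampled directly from x_0: x_t = sqrt(alpha_bar_t) * x_0 + sqrt(1 - alpha_bar_t) * eps, where alpha_bar_t is the running product of (1 - beta_s) and eps is standard Gaussian noise. The pure-Python sketch below assumes a linear beta schedule from 1e-4 to 0.02 over 1,000 steps. It illustrates only the forward (noising) direction, not the learned reverse process.

```python
import math
import random


def linear_beta_schedule(num_steps: int, beta_start: float = 1e-4, beta_end: float = 0.02) -> list[float]:
    """Evenly spaced noise variances beta_1..beta_T (assumed linear schedule)."""
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]


def cumulative_alphas(betas: list[float]) -> list[float]:
    """alpha_bar_t = prod_{s<=t} (1 - beta_s): how much of x_0 survives at step t."""
    alpha_bars, running = [], 1.0
    for beta in betas:
        running *= 1.0 - beta
        alpha_bars.append(running)
    return alpha_bars


def q_sample(x0: list[float], t: int, alpha_bars: list[float], rng: random.Random) -> list[float]:
    """Jump straight to step t: x_t = sqrt(alpha_bar_t) * x_0 + sqrt(1 - alpha_bar_t) * eps."""
    a = alpha_bars[t]
    return [math.sqrt(a) * x + math.sqrt(1.0 - a) * rng.gauss(0.0, 1.0) for x in x0]


betas = linear_beta_schedule(1000)
alpha_bars = cumulative_alphas(betas)
rng = random.Random(0)
x0 = [1.0, -0.5, 0.25]
early = q_sample(x0, 10, alpha_bars, rng)   # mostly signal, a little noise
late = q_sample(x0, 999, alpha_bars, rng)   # almost pure Gaussian noise
```

Because alpha_bar_t shrinks monotonically toward zero, the sample drifts from the data toward an isotropic Gaussian as t grows.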
COMMENTARY · HN — AI infrastructure stories English(EN) · 60mo · HN

CVPR panels on the future of data and ML infra (R.Socher, HF, W&B, Google, MSFT)

Two panels are scheduled to coincide with the CVPR conference, focusing on the future of datasets and next-generation ML infrastructure. The first panel, on data-centric approaches, will feature experts from ImageNet, Hugging Face, Google, and Microsoft. The second panel will delve into ML infrastructure for computer vision, with speakers from Weights & Biases, Anyscale, OctoML, Paperspace, Gantry, and Activeloop. AI

IMPACT Discusses key trends in ML data and infrastructure, offering insights into future development directions.
RESEARCH · Lil'Log (Lilian Weng) English(EN) · 61mo · [3 sources] · BLOG

Contrastive Representation Learning

Contrastive learning is a machine learning technique that creates an embedding space where similar data points are grouped together and dissimilar ones are separated. This method can be applied in both supervised and unsupervised settings, offering advantages over traditional cross-entropy loss functions, particularly in safety-critical applications. Research indicates that supervised contrastive learning can lead to more trustworthy and transparent neural networks by improving feature attribution explanations. AI
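A widely used contrastive objective is InfoNCE. It treats the positive as the correct "class" among the positive plus a set of negatives. The summary does not say which loss the cited research uses, so the following is an illustrative sketch with cosine similarity and an assumed temperature of 0.1.

```python
import math


def cosine(u: list[float], v: list[float]) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def info_nce(anchor: list[float], positive: list[float], negatives: list[list[float]],
             temperature: float = 0.1) -> float:
    """Cross-entropy of picking the positive out of {positive} + negatives,
    using cosine similarity / temperature as logits (log-sum-exp for stability)."""
    logits = [cosine(anchor, positive) / temperature]
    logits += [cosine(anchor, neg) / temperature for neg in negatives]
    m = max(logits)
    log_denominator = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_denominator - logits[0]


anchor = [1.0, 0.0]
negatives = [[-1.0, 0.0], [0.0, 1.0]]
close_pair = info_nce(anchor, [0.9, 0.1], negatives)   # positive near anchor: small loss
far_pair = info_nce(anchor, [-0.9, 0.1], negatives)    # positive far away: large loss
```

Minimizing this loss pulls the anchor toward its positive and pushes it away from the negatives, which is what shapes the embedding space described above.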
RESEARCH · Eugene Yan English(EN) · 62mo · BLOG

Search: Query Matching via Lexical, Graph, and Embedding Methods

Eugene Yan's article explores three primary methods for matching search queries to documents: lexical, graph, and embedding-based approaches. Lexical methods manipulate the query string directly through steps such as normalization, spell checking, and expansion/relaxation. Graph-based techniques leverage knowledge graphs for deeper query understanding and expansion. Embedding-based methods apply learned vector representations to the same goals, relating queries and documents by vector similarity rather than by exact terms. The post details preprocessing steps, query expansion strategies, and how these techniques are applied in real-world systems such as DoorDash's. AI
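The lexical steps mentioned above (normalization, expansion, relaxation) can be sketched in a few lines. The synonym table and the drop-the-last-term relaxation rule are hypothetical simplifications, and spell checking is omitted. This is not DoorDash's or Yan's implementation.

```python
import re

# Hypothetical synonym table; real systems typically mine these from logs or a knowledge graph.
SYNONYMS = {"tee": ["t-shirt"], "sneakers": ["trainers", "shoes"]}


def normalize(text: str) -> str:
    """Lowercase, replace punctuation (except hyphens) with spaces, collapse whitespace."""
    text = re.sub(r"[^\w\s-]", " ", text.lower())
    return " ".join(text.split())


def expand(tokens: list[str]) -> list[list[str]]:
    """Each query slot matches the original token or any of its synonyms."""
    return [[t] + SYNONYMS.get(t, []) for t in tokens]


def match(query: str, docs: list[str]) -> list[str]:
    """AND-match every expanded slot; if nothing matches, relax by dropping the last term."""
    doc_tokens = [set(normalize(d).split()) for d in docs]
    tokens = normalize(query).split()
    while tokens:
        slots = expand(tokens)
        hits = [doc for doc, toks in zip(docs, doc_tokens)
                if all(any(alt in toks for alt in slot) for slot in slots)]
        if hits:
            return hits
        tokens = tokens[:-1]  # naive relaxation: drop the trailing term and retry
    return []


docs = ["Red T-Shirt, cotton", "Running trainers", "Blue shoes"]
match("red sneakers", docs)  # no doc has both terms, so it relaxes to "red"
```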
COMMENTARY · Eugene Yan English(EN) · 64mo · BLOG

How to Write Design Docs for Machine Learning Systems

Eugene Yan's article outlines a structured approach to creating design documents for machine learning systems, emphasizing their role in clarifying thought and facilitating feedback. The author suggests a 'Why, What, How' framework to guide the document's content, covering problem motivation, success criteria, and system requirements. Yan also details a two-step review process to ensure thorough evaluation and mitigate risks associated with late-stage design flaws. AI
COMMENTARY · Eugene Yan English(EN) · 64mo · BLOG

How to Write Better with The Why, What, How Framework

Eugene Yan's article outlines a framework for effective technical writing, particularly for data science and machine learning projects. He emphasizes the importance of detailed documentation, drawing parallels to Amazon's rigorous writing culture. Yan introduces three types of documents: one-pagers for stakeholder alignment, design documents for peer feedback, and after-action reviews for reflection and learning. The core of his approach is the "Why-What-How" framework, which structures documents by first establishing the importance and context (Why), then detailing the proposed solution (What), and finally outlining the implementation plan (How). AI
COMMENTARY · Eugene Yan English(EN) · 64mo · BLOG

Feature Stores: A Hierarchy of Needs

Eugene Yan's article explores the concept of feature stores in machine learning, drawing an analogy to Maslow's Hierarchy of Needs. The author posits that managing features is a significant bottleneck in deploying ML models. Yan categorizes feature store needs into a hierarchy, starting with fundamental requirements like data access and reusability, progressing to serving features in real-time, ensuring data integrity, and finally reaching higher-level needs such as convenience and automation. AI
RESEARCH · Lil'Log (Lilian Weng) English(EN) · 65mo · [4 sources] · BLOG · REDDIT

Large Transformer Model Inference Optimization

Large transformer models present significant inference challenges due to their substantial memory footprint and computation costs, which scale quadratically with input length. Researchers and practitioners are exploring various optimization techniques to mitigate these issues. These methods include network compression strategies like pruning, quantization, and knowledge distillation, as well as architectural improvements and efficient parallelism. The goal is to reduce memory usage, computation complexity, and inference latency for practical, large-scale deployment. AI
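Post-training weight quantization is one concrete instance of the compression techniques listed above. It can be sketched as symmetric absmax scaling to int8. This toy version uses a single scale per tensor. Production schemes typically add per-channel scales, calibration data, or outlier handling, and none of these are shown here.

```python
def quantize_int8(weights: list[float]) -> tuple[list[int], float]:
    """Symmetric per-tensor absmax quantization: map [-max|w|, max|w|] onto [-127, 127]."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale


def dequantize(q: list[int], scale: float) -> list[float]:
    return [v * scale for v in q]


weights = [0.02, -1.3, 0.75, 0.0, 2.54]
q, scale = quantize_int8(weights)           # int8 values need 1 byte each vs 4 for float32
restored = dequantize(q, scale)
max_error = max(abs(w - r) for w, r in zip(weights, restored))  # bounded by scale / 2
```

The trade-off is visible directly: memory for the weights drops roughly fourfold, at the cost of a rounding error of at most half a quantization step per weight.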
RESEARCH · Eugene Yan English(EN) · 66mo · [9 sources] · BLOG

Improving Recommendation Systems & Search in the Age of LLMs

A new paper explores the critical role of user state representation in contextual multi-armed bandit (CMAB) recommender systems, finding that variations in state representation can yield greater performance improvements than changes to the bandit algorithm itself. The research highlights that no single embedding or aggregation strategy is universally superior, emphasizing the need for domain-specific evaluations. Another study introduces BEAR, a novel fine-tuning objective for Large Language Models (LLMs) in recommendation tasks that explicitly accounts for beam search behavior during training to address inconsistencies between training and inference. Additionally, a paper proposes a methodology to measure the stability and plasticity of recommender systems, evaluating how models adapt to retraining and changes in data patterns. AI

IMPACT Advances in user state representation and LLM fine-tuning for recommendations could lead to more personalized and effective user experiences.
RESEARCH · Lil'Log (Lilian Weng) English(EN) · 66mo · BLOG

Controllable Neural Text Generation

This post explores methods for controlling the output of large language models, which are typically trained on vast amounts of unlabeled web data. It surveys decoding strategies (including guided decoding) and prompt design, which steer a pretrained model without changing its weights, as well as fine-tuning approaches. While these methods can influence attributes of the generated text such as topic and style, the author notes that model steerability remains an active research area, with each approach carrying its own pros and cons. AI
RESEARCH · Lil'Log (Lilian Weng) English(EN) · 68mo · BLOG

How to Build an Open-Domain Question Answering System?

Lilian Weng's blog post details methods for constructing open-domain question-answering (ODQA) systems, focusing on Transformer-based language models. The post distinguishes ODQA from reading comprehension by highlighting the absence of provided context for factual questions. It also discusses challenges in QA data fine-tuning, where test-set questions or answers may appear in training sets, potentially inflating performance metrics. AI
TOOL · Eugene Yan English(EN) · 68mo · BLOG

How to Install Google Scalable Nearest Neighbors (ScaNN) on Mac

Eugene Yan has published a guide to installing Google's Scalable Nearest Neighbors (ScaNN) library on macOS. The guide addresses the difficulties encountered during installation and gives step-by-step instructions for setting up the required compilers, Python version, and virtual environment. It also outlines the code modifications and build commands needed to compile and install the ScaNN package, which is designed for efficient vector similarity search and reportedly outperforms other state-of-the-art libraries on nearest-neighbor benchmarks. AI
RESEARCH · Eugene Yan English(EN) · 69mo · [3 sources] · BLOG

RecSys 2022: Recap, Favorite Papers, and Lessons

Eugene Yan's RecSys 2022 recap highlights a significant increase in industry submissions and a focus on algorithmic advancements and real-world applications. Key papers explored efficient training for sequential recommendations using recency sampling and the application of bandit algorithms to simulate industry challenges, particularly concerning concept drift. The conference also saw continued emphasis on fairness, privacy, and reproducibility, with several papers reproducing established models like BERT4Rec. AI
RESEARCH · Eugene Yan English(EN) · 70mo · [2 sources] · BLOG

How Reading Papers Helps You Be a More Effective Data Scientist

A new arXiv paper details a study comparing BERT and T5 models for Named Entity Recognition (NER), analyzing their performance with different tag schemes and hyperparameters. The research aims to provide insights into common errors and compare the architectures for practical applications. Separately, an article discusses the benefits of reading research papers for data scientists, highlighting how it can improve effectiveness by learning from existing work and staying updated on advancements. AI

IMPACT Research papers offer valuable insights and practical applications for AI professionals, helping them stay updated and avoid reinventing the wheel.
RESEARCH · 量子位 (QbitAI) Chinese(ZH) · 71mo · [234 sources] · BSKY · HN · MASTO · REDDIT · X

Secured 70 billion yuan in funding! DeepSeek Code is really coming, ACM gold medalist Cui Tianyi is in charge

New research explores the challenges and advancements in AI-native code generation, focusing on improving efficiency, reliability, and safety. Papers introduce novel architectures like MicroSkill for better context management and modular knowledge encapsulation, reducing token consumption and increasing compilation success rates. Other studies benchmark coding agents' performance on complex tasks, including their ability to handle underspecified user intent and detect potential sabotage, highlighting the need for human-centric safety mechanisms and robust evaluation frameworks. AI

IMPACT New benchmarks and architectures are pushing the boundaries of AI coding agents, addressing efficiency, safety, and complex task handling.
RESEARCH · Lil'Log (Lilian Weng) English(EN) · 71mo · BLOG

Neural Architecture Search

Neural Architecture Search (NAS) is a field focused on automating the design of high-performance neural network architectures. It typically involves three main components: a search space defining possible operations and connections, a search algorithm to sample candidate architectures, and an evaluation strategy to assess their performance. Early NAS methods, like those by Zoph & Le and Baker et al., used sequential layer-wise operations, which were computationally intensive, requiring hundreds of GPUs for extended periods. More recent approaches, inspired by successful modular designs, employ cell-based representations to improve efficiency. AI
RESEARCH · Eugene Yan English(EN) · 71mo · [2 sources] · BLOG

Building the Same App Using Various Web Frameworks

Eugene Yan details his experience building a web application using various modern frameworks, including FastHTML, Next.js, and SvelteKit. He compares their developer experiences by implementing the same data manipulation app in each. Yan also explores extending a FastAPI application with interactive elements like checkboxes and download buttons, demonstrating how to handle form submissions and file responses. AI

IMPACT Provides practical examples of web app development using Python frameworks and interactive HTML elements.
RESEARCH · Eugene Yan English(EN) · 71mo · BLOG

How to Set Up a HTML App with FastAPI, Jinja, Forms & Templates

Eugene Yan has published a guide to building HTML applications with FastAPI, Jinja, and HTML forms. The article fills a gap in existing documentation by explaining how to serve HTML content with FastAPI, a framework Yan recently adopted after switching from Flask. The tutorial includes code examples for setting up the necessary dependencies, creating a basic REST API, and integrating Jinja templating for dynamic web pages, along with a GitHub repository for reference. AI
RESEARCH · Eugene Yan English(EN) · 72mo · [2 sources] · BLOG

My Notes From Spark+AI Summit 2020 (Application-Specific Talks)

Eugene Yan's notes from the Spark+AI Summit 2020 cover both application-specific and application-agnostic talks on deep learning and data engineering. Application-specific sessions highlighted frameworks such as Airbnb's Zipline for feature engineering and Sputnik for data engineering, alongside Gojek's Feast and Netflix's approach to data quality. The application-agnostic talks focused on making deep learning more efficient through techniques such as model pruning, quantization, and distillation, with examples from IBM and Instagram. AI
COMMENTARY · Eugene Yan English(EN) · 72mo · [2 sources] · BLOG

What I Do During A Data Science Project To Deliver Success

Eugene Yan outlines best practices for executing data science projects, emphasizing the importance of a clear plan and effective communication. He suggests starting with a literature review to build upon existing research and using tools like Jupyter notebooks for rapid experimentation. Yan also highlights the value of daily stand-up meetings to maintain team alignment and identify potential blockers early in the process. AI
COMMENTARY · Eugene Yan English(EN) · 74mo · BLOG

Serendipity: Accuracy’s Unpopular Best Friend in Recommenders

Accuracy is not the sole metric for evaluating recommender systems, as serendipity—the ability to pleasantly surprise users—is also crucial for long-term engagement. While accuracy metrics like NDCG and MAP are widely available and taught, metrics for serendipity are scarce, making them harder to implement and evaluate. Incorporating serendipity can lead to better assortment health and seller diversity by promoting long-tail products, creating a virtuous cycle of user engagement and data collection. AI
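Serendipity has no single standard metric. One formulation, assumed here, counts the recommendations that are both relevant to the user and absent from an obvious baseline such as a most-popular list. A minimal sketch with hypothetical item IDs:

```python
def serendipity(recommended: list[str], baseline: list[str], relevant: set[str]) -> float:
    """Share of recommendations that are relevant AND not in an obvious baseline
    (e.g., a most-popular list): useful items the user would not have expected."""
    if not recommended:
        return 0.0
    expected = set(baseline)
    surprising_hits = [item for item in recommended if item in relevant and item not in expected]
    return len(surprising_hits) / len(recommended)


recs = ["a", "b", "c", "d"]
popular = ["a", "b"]
clicked = {"a", "c", "e"}
score = serendipity(recs, popular, clicked)  # 0.25: only "c" is both relevant and unexpected
```

In this formulation, an accurate recommender that only echoes popular items scores zero, which is the gap between accuracy and serendipity the summary describes.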
RESEARCH · Lil'Log (Lilian Weng) English(EN) · 75mo · [2 sources] · BLOG

The Transformer Family Version 2.0

Lilian Weng has updated her comprehensive blog post detailing the Transformer architecture and its numerous advancements since its initial introduction. The updated version, "The Transformer Family Version 2.0," significantly expands on the original, incorporating recent research and modifications to the foundational model. It delves into core concepts like attention, self-attention, multi-head self-attention, and the encoder-decoder structure, providing a detailed overview of how these components function and have been enhanced. AI
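The core operation these components build on is scaled dot-product attention, softmax(Q K^T / sqrt(d_k)) V. The following is a minimal single-head sketch in plain Python. It assumes identity projections, whereas real Transformer layers apply learned query, key, and value projections and run several heads in parallel.

```python
import math


def softmax(xs: list[float]) -> list[float]:
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attention(Q: list[list[float]], K: list[list[float]], V: list[list[float]]) -> list[list[float]]:
    """softmax(Q K^T / sqrt(d_k)) V, with each matrix given as a list of row vectors."""
    d_k = len(K[0])
    out = []
    for q in Q:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d_k) for k in K]
        weights = softmax(scores)  # how much this query attends to each key
        out.append([sum(w * v[j] for w, v in zip(weights, V)) for j in range(len(V[0]))])
    return out


X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
out = attention(X, X, X)  # self-attention: queries, keys and values all come from X
```

Each output row is a weighted average of the value rows, so every position can draw on every other position in a single step.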
RESEARCH · Eugene Yan English(EN) · 76mo · BLOG

Simpler Experimentation with Jupyter, Papermill, and MLflow

Eugene Yan's article details a streamlined workflow for machine learning experimentation using Jupyter, Papermill, and MLflow. This approach avoids notebook duplication and manual tracking by parameterizing notebooks with Papermill for running multiple experiments and logging results. MLflow then centralizes the metrics and artifacts, providing a unified interface for managing and referencing experiment outputs, which is particularly useful for tasks like fraud detection across different regions or stock index prediction. AI
RESEARCH · Practical AI English(EN) · 77mo · [6 sources] · BLOG · X

Stanford's AI Index Report 2024

Stanford's Institute for Human-Centered Artificial Intelligence (HAI) has released its AI Index Report, offering a comprehensive analysis of AI's progress and identifying critical gaps in governance and safety systems. The report highlights the rapid acceleration of AI capabilities, contrasting it with the slower pace of regulatory frameworks. It also notes that while AI research and development continue to advance, particularly in areas like productivity and frontier models, the systems designed to manage AI are struggling to keep up. AI
RESEARCH · Practical AI English(EN) · 77mo · [2 sources] · BLOG

Testing ML systems

Eugene Yan's article details a comprehensive approach to testing machine learning systems, differentiating between traditional software tests and ML-specific tests. ML tests are further categorized into pre-train tests for implementation correctness, post-train tests for expected learned behavior, and evaluation metrics for performance assessment. The author uses a DecisionTree implementation and the Titanic dataset to demonstrate these testing methodologies, incorporating practices like unit testing, code coverage, linting, and type checking. AI
RESEARCH · Eugene Yan English(EN) · 78mo · [2 sources] · BLOG

Beating the Baseline Recommender with Graph & NLP in Pytorch

Eugene Yan's blog posts detail methods for building recommender systems that outperform baseline matrix factorization models. The approach involves using Natural Language Processing (NLP) techniques, specifically word2vec, to generate vector representations of products based on their relationships. These product embeddings are then used to make recommendations by identifying similar items, drawing inspiration from graph-based learning methods like DeepWalk. AI
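The walk-then-embed idea can be sketched as follows. Generate DeepWalk-style random walks over a product graph, then treat each walk as a "sentence" of product IDs for a word2vec model. The graph below is hypothetical, and the word2vec training step itself is omitted (a library such as gensim would typically handle it). This is not Yan's code.

```python
import random


def random_walks(graph: dict[str, list[str]], walks_per_node: int, walk_length: int,
                 seed: int = 0) -> list[list[str]]:
    """DeepWalk-style uniform random walks; each walk is later fed to word2vec as a sentence."""
    rng = random.Random(seed)
    walks = []
    for _ in range(walks_per_node):
        nodes = list(graph)
        rng.shuffle(nodes)  # vary the start order on each pass
        for start in nodes:
            walk = [start]
            while len(walk) < walk_length:
                neighbors = graph.get(walk[-1], [])
                if not neighbors:
                    break  # dead end: stop this walk early
                walk.append(rng.choice(neighbors))
            walks.append(walk)
    return walks


# Hypothetical co-purchase graph (undirected edges listed in both directions).
graph = {
    "phone": ["case", "charger"],
    "case": ["phone"],
    "charger": ["phone", "cable"],
    "cable": ["charger"],
}
walks = random_walks(graph, walks_per_node=2, walk_length=5)
```

Products that co-occur in many walks end up with similar embeddings, and the recommender can then return nearest neighbors in that space.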
RESEARCH · Lil'Log (Lilian Weng) English(EN) · 80mo · BLOG

Self-Supervised Representation Learning

This post explores self-supervised learning, a method that leverages readily available unlabeled data by creating supervised tasks from the data itself. The core idea is to train models on these 'pretext' tasks, not for their own sake, but to learn intermediate representations that are useful for various downstream applications. This approach addresses the high cost and limited scalability of manual data labeling, enabling the exploitation of vast amounts of unlabeled text and images. The post highlights its application in language modeling and discusses image-based self-supervised learning techniques. AI
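A pretext task can be built directly from raw text. The sketch below uses a hypothetical helper and shows just one possible pretext task. It masks one random token per sentence, so the training target is the hidden word and no human labels are needed.

```python
import random

MASK = "[MASK]"


def make_masked_examples(sentences: list[str], seed: int = 0) -> list[tuple[list[str], int, str]]:
    """Turn unlabeled sentences into (masked tokens, masked position, original token) pairs."""
    rng = random.Random(seed)
    examples = []
    for sentence in sentences:
        tokens = sentence.split()
        if not tokens:
            continue  # nothing to mask in an empty line
        pos = rng.randrange(len(tokens))
        masked = tokens[:pos] + [MASK] + tokens[pos + 1:]
        examples.append((masked, pos, tokens[pos]))  # the label comes from the data itself
    return examples


corpus = ["the cat sat on the mat", "self supervision needs no labels", ""]
examples = make_masked_examples(corpus)
```

A model trained to fill in the mask is not the end goal. The intermediate representations it learns are what get reused downstream.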
TOOL · HN — AI infrastructure stories English(EN) · 82mo · HN

MLIR Primer: A Compiler Infrastructure for the End of Moore’s Law

Google researchers have published a primer on MLIR, a compiler infrastructure designed to address the challenges posed by the end of Moore's Law in AI development. MLIR aims to provide a unified framework for optimizing machine learning workloads across diverse hardware architectures. This approach is crucial for maintaining performance gains as traditional hardware scaling slows down. AI

IMPACT MLIR offers a unified approach to optimize AI workloads across diverse hardware, crucial for continued performance gains as traditional hardware scaling slows.
COMMENTARY · Eugene Yan English(EN) · 86mo · BLOG

OMSCS CS7646 (Machine Learning for Trading) Review and Tips

Eugene Yan shares his experience and insights from the OMSCS CS7646 (Machine Learning for Trading) course. He highlights the course's focus on sequential modeling and its applicability beyond financial markets, such as in healthcare. Yan details the course structure, emphasizing the eight coding assignments in Python and the importance of object-oriented programming, with grading scripts providing initial feedback. AI
RESEARCH · Lil'Log (Lilian Weng) English(EN) · 86mo · BLOG

Domain Randomization for Sim2Real Transfer

Domain Randomization (DR) is a technique used in robotics to bridge the gap between simulated training environments and the real world. This method involves training models across a wide variety of simulated scenarios with randomized physical parameters and visual appearances. The goal is for the trained model to generalize effectively to the real-world environment, which is assumed to be one of the many variations encountered during training. DR is particularly useful because it can require minimal or no real-world data, unlike domain adaptation methods. AI
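In code, domain randomization mostly amounts to drawing fresh simulator parameters for every training episode. The parameter names, ranges, and texture count below are hypothetical. A real setup would pass each SimParams to its own simulator API, which is not shown.

```python
import random
from dataclasses import dataclass


@dataclass
class SimParams:
    friction: float
    object_mass_kg: float
    light_intensity: float
    texture_id: int


# Hypothetical ranges; in practice they are chosen wide enough that the real
# world plausibly falls inside the training distribution.
RANGES = {"friction": (0.5, 1.5), "object_mass_kg": (0.1, 2.0), "light_intensity": (0.3, 1.0)}
NUM_TEXTURES = 50


def sample_params(rng: random.Random) -> SimParams:
    """Draw one randomized physical and visual configuration."""
    return SimParams(
        friction=rng.uniform(*RANGES["friction"]),
        object_mass_kg=rng.uniform(*RANGES["object_mass_kg"]),
        light_intensity=rng.uniform(*RANGES["light_intensity"]),
        texture_id=rng.randrange(NUM_TEXTURES),
    )


rng = random.Random(42)
episodes = [sample_params(rng) for _ in range(1000)]  # one fresh environment per training episode
```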
RESEARCH · Lil'Log (Lilian Weng) English(EN) · 88mo · BLOG

Are Deep Neural Networks Dramatically Overfitted?

This post delves into the question of why deep neural networks, despite their numerous parameters, can generalize well to new data. It explores classic principles like Occam's Razor and the Minimum Description Length (MDL) principle, which suggest that simpler models are more likely to be correct and that learning can be viewed as data compression. The MDL principle, in particular, formalizes the idea that a good model should not only explain the data but also be concise, thereby aiding generalization. AI
RESEARCH · Eugene Yan English(EN) · 88mo · [2 sources] · BLOG

DataScience SG x ODSC Meetup - Applying ML to Healthcare

Eugene Yan presented a case study on how uCare.ai developed a machine learning system for Parkway Pantai Group, Southeast Asia's largest healthcare provider. This system estimates patient pre-admission costs, enhancing transparency and patient experience. The implementation significantly reduced prediction errors, with mean absolute error decreasing by 55% and root mean squared error by 60%. Yan emphasized that building such data products is a team effort, with machine learning comprising only about 20% of the overall work, highlighting the importance of engineering and methodology. AI

IMPACT Demonstrates practical application of ML in healthcare for cost prediction, improving patient experience and operational efficiency.
RESEARCH · Lil'Log (Lilian Weng) English(EN) · 89mo · [2 sources] · BLOG

Generalized Visual Language Models

Lilian Weng's blog post details how generalized language models are extended to process visual information. Early approaches like VisualBERT feed image region features together with text tokens into a single Transformer, use self-attention to align the visual and textual inputs, and are pre-trained on image-caption pairs. More recent models like SimVLM treat encoded images as prefixes for a language model and leverage large datasets for pre-training. These methods aim to create unified models capable of understanding and generating content across both visual and textual modalities. AI
COMMENTARY · Eugene Yan English(EN) · 90mo · [2 sources] · BLOG

DataScience SG Meetup - RecSys, Beyond the Baseline

Eugene Yan shared insights from two DataScience SG meetups, one focusing on recommender systems and another on various roles within the data field. The recommender systems talk explored baseline approaches and novel graph and NLP techniques, detailing the end-to-end process from data acquisition to result comparison. The panel discussion on data roles highlighted essential skills like logical thinking and programming, emphasizing the importance of curiosity, persistence, and humility for career success. Both events underscored the necessity of continuous self-learning in the rapidly advancing data industry. AI
RESEARCH · OpenAI News English(EN) · 91mo · [1085 sources] · HN · LOBSTERS · MASTO · BLOG · REDDIT

Better language models and their implications

Google DeepMind has introduced the FACTS Benchmark Suite, a new set of evaluations designed to systematically measure the factuality of large language models across various use cases. This suite includes benchmarks for parametric knowledge, search-based information retrieval, and multimodal understanding, alongside an updated grounding benchmark. The initiative aims to provide a more comprehensive understanding of LLM factuality and drive industry-wide improvements in accuracy and trustworthiness. AI

IMPACT Provides new evaluation tools to drive progress in LLM factuality and reduce hallucinations.
RESEARCH · Practical AI English(EN) · 92mo · [14 sources] · BLOG

The mathematics of machine learning

Eugene Yan's series of articles explores practical aspects of applying machine learning in real-world systems. He emphasizes starting projects with heuristics before implementing ML, the importance of design patterns for efficient data processing and system maintenance, and the need for careful problem selection based on cost-benefit analysis. Yan also details common challenges encountered after deploying ML models, such as data contamination and feedback loops, and suggests strategies for effective project management and system upkeep. AI
RESEARCH · Lil'Log (Lilian Weng) English(EN) · 93mo · [2 sources] · BLOG

Flow-based Deep Generative Models

GFlowState is a new visual analytics system designed to improve the interpretability of Generative Flow Networks (GFlowNets), a probabilistic framework used for generating samples proportional to a reward function. The system offers multiple visualization tools, such as trajectory analysis and state projections, to help developers understand how these models explore the sample space and evolve their sampling probabilities during training. By making the structural dynamics of GFlowNets observable, GFlowState aims to accelerate their development and debugging across various application domains. AI
RESEARCH · Lil'Log (Lilian Weng) English(EN) · 95mo · BLOG

From Autoencoder to Beta-VAE

This article provides a detailed explanation of autoencoders, a type of neural network used for unsupervised learning to reconstruct high-dimensional data. Autoencoders consist of an encoder that compresses input into a low-dimensional latent code and a decoder that reconstructs the original data from this code. A key variant, the Denoising Autoencoder, improves robustness by training the model to recover the original input from a corrupted version, forcing it to learn underlying data relationships. AI
RESEARCH · Lil'Log (Lilian Weng) English(EN) · 97mo · BLOG

Attention? Attention!

This 2018 blog post by Lilian Weng explains the concept of attention mechanisms in deep learning, drawing parallels to human visual and linguistic attention. It details how attention allows models to weigh the importance of different input elements when generating an output, addressing limitations of traditional sequence-to-sequence models that struggled with long inputs. The post highlights that attention was initially developed to improve neural machine translation by creating direct connections between the output and the entire input sequence. AI
RESEARCH · Lil'Log (Lilian Weng) English(EN) · 102mo · [33 sources] · BLOG

The Multi-Armed Bandit Problem and Its Solutions

Several recent arXiv papers explore advancements in multi-armed bandit problems, a framework for sequential decision-making under uncertainty. Research includes handling changing action availability with "Flickering Multi-Armed Bandits" and improving regret bounds in logistic bandits without strict context diversity assumptions. Other work focuses on geometry-aware offline-to-online learning, spectral bandits for smooth functions on graphs, and privacy-preserving algorithms for generalized linear contextual bandits. AI

IMPACT Advances in bandit algorithms could lead to more efficient online learning systems and improved decision-making in recommendation, advertising, and resource allocation.
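The underlying problem can be illustrated with UCB1, a classic strategy that the original Lil'Log post also covers. At each step it picks the arm with the highest empirical mean plus an exploration bonus that shrinks as that arm is pulled more often. This minimal sketch assumes Bernoulli arms and does not implement any of the papers above.

```python
import math
import random


def ucb1(arm_probs: list[float], horizon: int, seed: int = 0) -> list[int]:
    """Play Bernoulli arms with UCB1 and return how often each arm was pulled."""
    rng = random.Random(seed)
    n_arms = len(arm_probs)
    counts = [0] * n_arms
    means = [0.0] * n_arms
    for t in range(1, horizon + 1):
        if t <= n_arms:
            arm = t - 1  # pull every arm once to initialise its estimate
        else:
            # empirical mean + exploration bonus sqrt(2 ln t / n_a)
            arm = max(range(n_arms), key=lambda a: means[a] + math.sqrt(2 * math.log(t) / counts[a]))
        reward = 1.0 if rng.random() < arm_probs[arm] else 0.0
        counts[arm] += 1
        means[arm] += (reward - means[arm]) / counts[arm]  # incremental mean update
    return counts


pulls = ucb1([0.2, 0.5, 0.8], horizon=5000)
```

Over time the pulls concentrate on the best arm, while the logarithmic bonus keeps occasionally re-checking the others.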
RESEARCH · Lil'Log (Lilian Weng) English(EN) · 104mo · [6 sources] · BLOG

Object Detection Part 4: Fast Detection Models

Two new research papers propose novel approaches to object detection. VFM4SDG aims to improve single-domain generalized object detection by using a frozen vision foundation model to maintain cross-domain stability, addressing issues with weather and illumination changes. UHR-DETR tackles the challenge of detecting small objects in ultra-high-resolution remote sensing imagery by efficiently allocating computational resources and integrating global and local scene information. AI
RESEARCH · Lil'Log (Lilian Weng) English(EN) · 105mo · [13 sources] · BLOG

Learning with not Enough Data Part 3: Data Generation

Google Research has introduced "Nested Learning," a novel machine learning paradigm designed to address the challenge of catastrophic forgetting in continual learning. This approach views models as interconnected optimization problems, allowing them to acquire new knowledge without losing proficiency on previous tasks. A proof-of-concept architecture named "Hope" has demonstrated superior performance in language modeling and long-context memory management using this paradigm. OpenAI has also published research on meta-learning algorithms, including Reptile, which focuses on learning how to learn efficiently for new tasks, and a hierarchical reinforcement learning algorithm that enables faster task completion by breaking down complex problems into high-level actions. AI
RESEARCH · Lil'Log (Lilian Weng) English(EN) · 105mo · [9 sources] · BLOG

Learning Word Embedding

Hugging Face has released a suite of tools and guides for training and fine-tuning sentence embedding and reranker models. These resources build on the Sentence Transformers library and cover static, multimodal, and sparse embeddings. The guides include training on up to 1 billion training pairs and techniques for achieving significant speedups, with the aim of making advanced embedding model development more accessible. AI
RESEARCH · Lil'Log (Lilian Weng) English(EN) · 107mo · BLOG

From GAN to WGAN

This article explains the mathematical underpinnings of Generative Adversarial Networks (GANs), a type of generative model inspired by game theory. It details the roles of the generator and discriminator models, which compete to improve each other's performance. The post also discusses challenges in training GANs, such as instability, and introduces variations like Wasserstein GAN (WGAN) designed to address these issues by modifying the loss function. AI
RESEARCH · Lil'Log (Lilian Weng) English(EN) · 107mo · BLOG

How to Explain the Prediction of a Machine Learning Model?

Lilian Weng's blog post delves into the critical need for machine learning model interpretability, especially as AI systems are increasingly deployed in sensitive sectors like finance, healthcare, and criminal justice. The post highlights how regulatory requirements and the inherent 'black-box' nature of deep learning models necessitate methods to understand their decision-making processes. Weng discusses the properties of interpretable models and explores interpretation techniques for classic models such as linear regression and Naive Bayes, while also acknowledging the ongoing development of new tools for more complex models. AI
RESEARCH · Lil'Log (Lilian Weng) English(EN) · 108mo · [2 sources] · BLOG

Predict Stock Prices Using RNN: Part 2

Lilian Weng's blog posts detail the construction of a recurrent neural network (RNN) using TensorFlow for stock price prediction. The first part focuses on building a basic RNN with LSTM cells to predict S&P 500 closing prices using historical data from Yahoo! Finance. The second part extends this model to handle multiple stocks by incorporating stock symbol embeddings as input, allowing the network to differentiate patterns across various price sequences. AI
RESEARCH · OpenAI News English(EN) · 109mo · [2 sources] · BLOG

Learning from human preferences

OpenAI and DeepMind have developed a new algorithm that learns desired behaviors from human feedback, reducing the need for hand-written goal functions. The method runs a three-step cycle: humans compare two clips of agent behavior, the AI infers a reward function that explains these comparisons, and the agent improves against that learned reward. The approach has shown promising sample efficiency, needing relatively little human input to learn complex tasks such as a backflip. It has also achieved strong results in simulated robotics and Atari games, sometimes outperforming agents trained on the standard reward function. However, agents can learn to trick their human evaluators, a problem being addressed with additional visual cues. AI
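The "infer a reward from comparisons" step can be illustrated with a Bradley-Terry style model, in which the probability that a human prefers clip A over clip B is sigmoid(r(A) - r(B)). The actual method trains a neural network reward over trajectory segments. This toy sketch instead fits one scalar reward per hypothetical behavior label.

```python
import math


def fit_rewards(comparisons: list[tuple[str, str]], items: list[str],
                lr: float = 0.1, epochs: int = 500) -> dict[str, float]:
    """Fit scalar rewards so that P(winner preferred) = sigmoid(r[winner] - r[loser]),
    by gradient ascent on the log-likelihood of the human labels."""
    r = {item: 0.0 for item in items}
    for _ in range(epochs):
        grads = {item: 0.0 for item in items}
        for winner, loser in comparisons:
            p = 1.0 / (1.0 + math.exp(-(r[winner] - r[loser])))
            grads[winner] += 1.0 - p  # d log p / d r[winner]
            grads[loser] -= 1.0 - p   # d log p / d r[loser]
        for item in items:
            r[item] += lr * grads[item]
    return r


# Hypothetical labels: each tuple is (preferred clip, rejected clip).
labels = [("backflip", "flail"), ("backflip", "stand"), ("stand", "flail"), ("backflip", "flail")]
rewards = fit_rewards(labels, ["backflip", "stand", "flail"])
```

The fitted rewards then play the role of the environment reward for an ordinary RL learner, and the loop repeats with new comparisons.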
RESEARCH · Lil'Log (Lilian Weng) English(EN) · 112mo · [2 sources] · BLOG

Evolution Strategies

OpenAI researchers have found that evolution strategies (ES), a decades-old optimization technique, can rival the performance of modern reinforcement learning (RL) methods on benchmarks like Atari and MuJoCo. ES offers advantages such as a simpler implementation that needs no backpropagation, easier scaling in distributed settings, and better handling of sparse rewards. Because it parallelizes well across many workers, ES can cut wall-clock training time sharply: in one experiment, training time for a humanoid walker dropped from about 10 hours to 10 minutes. AI
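The ES update can be sketched in a few lines. Perturb the parameters with Gaussian noise, score each perturbation, and move along the noise directions weighted by standardized scores. This is a minimal single-process sketch on an assumed toy quadratic objective. The distributed, many-worker setup behind the speedups mentioned above is not shown.

```python
import random


def evolution_strategies(fitness, theta, sigma=0.1, lr=0.001, population=50, iterations=300, seed=0):
    """Basic ES: estimate the gradient of E[F(theta + sigma * eps)] from fitness
    scores of random perturbations only (no backpropagation through F)."""
    rng = random.Random(seed)
    theta = list(theta)
    dim = len(theta)
    for _ in range(iterations):
        noises = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(population)]
        scores = [fitness([t + sigma * e for t, e in zip(theta, eps)]) for eps in noises]
        mean = sum(scores) / population
        std = (sum((s - mean) ** 2 for s in scores) / population) ** 0.5
        if std == 0.0:
            std = 1.0  # all perturbations scored the same: no signal this round
        advantages = [(s - mean) / std for s in scores]  # standardized fitness
        for j in range(dim):
            grad_j = sum(a * eps[j] for a, eps in zip(advantages, noises)) / (population * sigma)
            theta[j] += lr * grad_j
    return theta


TARGET = [0.5, 0.1, -0.3]  # assumed toy optimum


def fitness(w):
    """Higher is better; peaks at 0 when w == TARGET."""
    return -sum((wi - ti) ** 2 for wi, ti in zip(w, TARGET))


solution = evolution_strategies(fitness, [0.0, 0.0, 0.0])
```

Each fitness evaluation in the inner list is independent, which is why ES spreads so easily across many machines. Workers only need to exchange scalar scores and random seeds.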
RESEARCH · OpenAI News English(EN) · 113mo · [32 sources] · BLOG

Transfer of adversarial robustness between perturbation types

OpenAI researchers are exploring the transferability of adversarial robustness across different types of perturbations in neural networks. Their findings indicate that robustness against one perturbation type does not always guarantee robustness against others and can sometimes be detrimental. They recommend evaluating adversarial defenses using a diverse range of perturbation types and sizes to ensure comprehensive security. Additionally, OpenAI is investigating adversarial examples as a concrete AI safety problem, noting their potential to cause significant issues, such as tricking autonomous vehicles. AI

IMPACT Highlights the ongoing challenges in securing AI systems against sophisticated adversarial attacks, necessitating robust evaluation and defense strategies.
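The effect of perturbation type shows up even on a linear classifier. The sketch below assumes a logistic-regression model with hypothetical weights. It compares a fast gradient sign (L-infinity) step with an L2 step of the same nominal size. The two budgets are measured in different norms, so the point is only that an attack's damage depends on its perturbation type, which is why evaluating across several types matters.

```python
import math


def sign(v: float) -> int:
    return (v > 0) - (v < 0)


def margin(w: list[float], b: float, x: list[float], y: int) -> float:
    """y * (w.x + b) with y in {-1, +1}: positive means correctly classified."""
    return y * (sum(wi * xi for wi, xi in zip(w, x)) + b)


def linf_attack(w: list[float], x: list[float], y: int, eps: float) -> list[float]:
    """FGSM: for logistic loss, d loss / d x = -y * (1 - sigmoid(margin)) * w,
    so the gradient sign is -y * sign(w); step eps along it in every coordinate."""
    return [xi - eps * y * sign(wi) for wi, xi in zip(w, x)]


def l2_attack(w: list[float], x: list[float], y: int, eps: float) -> list[float]:
    """Same loss-increasing direction under an L2 budget: move eps along -y * w / ||w||."""
    norm = math.sqrt(sum(wi * wi for wi in w))
    return [xi - eps * y * wi / norm for wi, xi in zip(w, x)]


w, b = [2.0, -1.0, 0.5], 0.1   # hypothetical trained weights
x, y = [0.3, -0.2, 0.4], 1     # a correctly classified input (margin 1.1)
linf_margin = margin(w, b, linf_attack(w, x, y, 0.4), y)  # flips the prediction
l2_margin = margin(w, b, l2_attack(w, x, y, 0.4), y)      # still correct, but closer
```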
TOOL · Eugene Yan English(EN) · 114mo · BLOG

Image search is now live!

Eugene Yan has developed a reverse image search engine, allowing users to find similar products by uploading an image. The tool, built using neural networks to generate image features and calculate similarities, was initially available as an API but has since been discontinued due to cloud costs. Yan detailed the process, including data acquisition, feature generation with models like VGG16, and challenges in efficient similarity calculation and serving images. He noted that the system works best with product images on white backgrounds and is part of a larger series on building a product classification API. AI
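The similarity step can be sketched as a brute-force cosine search over precomputed image features. The 4-dimensional vectors and item names below are hypothetical stand-ins for CNN features such as VGG16 activations. At catalog scale, approximate nearest-neighbor libraries such as ScaNN typically replace this linear scan.

```python
import heapq
import math


def to_unit(v: list[float]) -> list[float]:
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def top_k_similar(query: list[float], catalog: dict[str, list[float]], k: int = 3) -> list[str]:
    """Return the k catalog items whose feature vectors have the highest cosine similarity."""
    q = to_unit(query)
    scored = ((sum(a * b for a, b in zip(q, to_unit(vec))), item_id) for item_id, vec in catalog.items())
    return [item_id for _, item_id in heapq.nlargest(k, scored)]


# Hypothetical precomputed features for three product images.
catalog = {
    "red_shoe": [0.9, 0.1, 0.0, 0.2],
    "blue_shoe": [0.8, 0.0, 0.3, 0.2],
    "lamp": [0.0, 0.9, 0.1, 0.0],
}
query = [1.0, 0.05, 0.1, 0.2]  # features of the uploaded image
matches = top_k_similar(query, catalog, k=2)
```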