The Berkeley Artificial Intelligence Research Blog

Scaling Up Reinforcement Learning for Traffic Smoothing: A 100-AV Highway Deployment

Mar 25, 2025. We deployed 100 reinforcement learning (RL)-controlled cars into rush-hour highway traffic to smooth congestion and reduce fuel consumption for everyone. Our goal is to tackle "stop-and-go" waves, those frustrating... Continue

Virtual Personas for Language Models via an Anthology of Backstories

Nov 12, 2024. We introduce Anthology, a method for conditioning LLMs to representative, consistent, and diverse virtual personas by generating and utilizing naturalistic backstories with rich details of individual values and experience. What does it mean for large... Continue

Linguistic Bias in ChatGPT: Language Models Reinforce Dialect Discrimination

Sep 20, 2024. Sample language model responses to different varieties of English and native speaker reactions. ChatGPT does amazingly well at communicating with people in English. But whose English? Only 15% of ChatGPT users are from the US,... Continue

How to Evaluate Jailbreak Methods: A Case Study with the StrongREJECT Benchmark

Aug 28, 2024. When we began studying jailbreak evaluations, we found a fascinating paper claiming that you could jailbreak frontier LLMs simply by translating forbidden prompts into obscure languages. Excited by this result, we attempted to reproduce it... Continue

Are We Ready for Multi-Image Reasoning? Launching VHs: The Visual Haystacks Benchmark!

Jul 20, 2024. Humans excel at processing vast arrays of visual information, a skill that is crucial for achieving artificial general intelligence (AGI). Over the decades, AI researchers have developed Visual Question Answering (VQA) systems to interpret scenes... Continue

TinyAgent: Function Calling at the Edge

May 29, 2024. The ability of LLMs to execute commands through plain language (e.g. English) has enabled agentic systems that can complete a user query by orchestrating the right set of tools (e.g. ToolFormer, Gorilla). This, along with... Continue

Modeling Extremely Large Images with xT

Mar 21, 2024. As computer vision researchers, we believe that every pixel can tell a story. However, there seems to be a writer’s block settling into the field when it comes to dealing with large images. Large images... Continue

2024 BAIR Graduate Directory

Mar 11, 2024. Every year, the Berkeley Artificial Intelligence Research (BAIR) Lab graduates some of the most talented and innovative minds in artificial intelligence and machine learning. Our Ph.D. graduates have each expanded the frontiers of AI research... Continue

The Shift from Models to Compound AI Systems

Feb 18, 2024. AI caught everyone’s attention in 2023 with Large Language Models (LLMs) that can be instructed to perform general tasks, such as translation or coding, just by prompting. This naturally led to an intense focus on... Continue

Ghostbuster: Detecting Text Ghostwritten by Large Language Models

Nov 14, 2023. The structure of Ghostbuster, our new state-of-the-art method for detecting AI-generated text. Large language models like ChatGPT write impressively well—so well, in fact, that they’ve become a problem. Students have begun using these models to... Continue

Asymmetric Certified Robustness via Feature-Convex Neural Networks

Nov 14, 2023. TLDR: We propose the asymmetric certified robustness problem, which requires certified robustness for only one class and reflects real-world adversarial scenarios. This focused setting allows us to introduce... Continue

Goal Representations for Instruction Following

Oct 17, 2023. A longstanding goal of the field of robot learning has been to create generalist agents that can perform tasks for humans. Natural language has the potential to be an easy-to-use... Continue

Rethinking the Role of PPO in RLHF

Oct 16, 2023. TL;DR: In RLHF, there’s tension between the reward learning phase, which uses human preference in the form of comparisons, and the RL fine-tuning phase, which optimizes a single,... Continue

Training Diffusion Models with Reinforcement Learning

Jul 14, 2023. Diffusion models have recently emerged as the de facto standard for generating complex, high-dimensional outputs. You may know them for their ability to produce stunning AI art and... Continue

On the Stepwise Nature of Self-Supervised Learning

Jul 10, 2023. Figure 1: stepwise behavior in self-supervised learning. When training common SSL algorithms, we find that the loss descends in a stepwise fashion (top left) and the learned embeddings iteratively increase in dimensionality (bottom left). Direct... Continue

Generating 3D Molecular Conformers via Equivariant Coarse-Graining and Aggregated Attention

Jun 29, 2023. Figure 1: CoarsenConf architecture. Molecular conformer generation is a fundamental task in computational chemistry. The objective is to predict stable low-energy 3D molecular structures, known as conformers, given the 2D molecule. Accurate molecular conformations are... Continue

GPT-4 + Stable-Diffusion = ?: Enhancing Prompt Understanding of Text-to-Image Diffusion Models with Large Language Models

May 23, 2023. TL;DR: Text Prompt -> LLM -> Intermediate Representation (such as an image layout) -> Stable Diffusion -> Image. Recent advancements in text-to-image generation with diffusion models have yielded remarkable results synthesizing highly realistic and diverse... Continue

Interactive Fleet Learning

Apr 6, 2023. Figure 1: “Interactive Fleet Learning” (IFL) refers to robot fleets in industry and academia that fall back on human teleoperators when necessary and continually learn from them over time. In the last few years we... Continue

Koala: A Dialogue Model for Academic Research

Apr 3, 2023. In this post, we introduce Koala, a chatbot trained by fine-tuning Meta’s LLaMA on dialogue data gathered from the web. We describe the dataset curation and training process of our model, and also present the... Continue

Fully Autonomous Real-World Reinforcement Learning with Applications to Mobile Manipulation

Jan 20, 2023. Reinforcement learning provides a conceptual framework for autonomous agents to learn from experience, analogously to how one might train a pet with treats. But practical applications of reinforcement learning are often far from natural: instead... Continue

Keeping Learning-Based Control Safe by Regulating Distributional Shift

Sep 19, 2022. To regulate the distribution shift experienced by learning-based controllers, we seek a mechanism for constraining the agent to regions of high data density throughout its trajectory (left). Here, we present an approach which achieves this... Continue

Reverse engineering the NTK: towards first-principles architecture design

Aug 29, 2022. Deep neural networks have enabled technological wonders ranging from voice recognition to machine translation to protein engineering, but their design and application are nonetheless notoriously unprincipled. The development of tools and methods to guide this... Continue

Why do Policy Gradient Methods work so well in Cooperative MARL? Evidence from Policy Representation

Jul 10, 2022. In cooperative multi-agent reinforcement learning (MARL), policy gradient (PG) methods are, due to their on-policy nature, typically believed to be less sample efficient than value decomposition (VD) methods, which are off-policy. However, some recent empirical... Continue

FIGS: Attaining XGBoost-level performance with the interpretability and speed of CART

Jun 30, 2022. FIGS (Fast Interpretable Greedy-tree Sums): A method for building interpretable models by simultaneously growing an ensemble of decision trees in competition with one another. Recent machine-learning advances have led to increasingly complex predictive models, often... Continue

The Berkeley Crossword Solver

May 20, 2022. We recently published the Berkeley Crossword Solver (BCS), the current state of the art for solving American-style crossword puzzles. The BCS combines neural question answering and probabilistic inference to achieve near-perfect performance on most American-style... Continue

Rethinking Human-in-the-Loop for Artificial Augmented Intelligence

May 3, 2022. Figure 1: In real-world applications, we think there exists a human-machine loop where humans and machines are mutually augmenting each other. We call it Artificial Augmented Intelligence. How do we build and evaluate an AI... Continue

Designing Societally Beneficial Reinforcement Learning Systems

Apr 29, 2022. Deep reinforcement learning (DRL) is transitioning from a research field focused on game playing to a technology with real-world applications. Notable examples include DeepMind’s work on controlling a nuclear reactor or on improving YouTube video... Continue

Should I Use Offline RL or Imitation Learning?

Apr 25, 2022. Figure 1: Summary of our recommendations for when a practitioner should use BC and various imitation learning style methods, and when they should use offline RL approaches. Offline reinforcement learning allows learning policies from previously collected... Continue

Offline RL Made Easier: No TD Learning, Advantage Reweighting, or Transformers

Apr 20, 2022. A demonstration of the RvS policy we learn with just supervised learning and a depth-two MLP. It uses no TD learning, advantage reweighting, or Transformers! Offline reinforcement learning (RL) is conventionally approached using value-based methods... Continue

Accelerating Ukraine Intelligence Analysis with Computer Vision on Synthetic Aperture Radar Imagery

Mar 21, 2022. Figure 1: Airmass measurements (clouds) over Ukraine from February 18, 2022 - March 01, 2022 from the SEVIRI instrument. Data accessed via the EUMETSAT Viewer. Satellite imagery is a critical source of information during the... Continue

All You Need is LUV: Unsupervised Collection of Labeled Images Using UV-Fluorescent Markings

Feb 23, 2022. Large-scale semantic image annotation is a significant challenge for learning-based perception systems in robotics. Supervised learning requires labeled data, and a common approach is for humans to hand-label images with segmentation masks, keypoints, and class... Continue

Unsupervised Skill Discovery with Contrastive Intrinsic Control

Feb 23, 2022. Unsupervised Reinforcement Learning (RL), where RL agents pre-train with self-supervised rewards, is an emerging paradigm for developing RL agents that are capable of generalization. Recently, we released the Unsupervised RL Benchmark (URLB) which we covered... Continue

imodels: leveraging the unreasonable effectiveness of rules

Feb 2, 2022. imodels: A python package with cutting-edge techniques for concise, transparent, and accurate predictive modeling. All sklearn-compatible and easy to use. Recent machine-learning advances have led to increasingly complex predictive models, often at the cost of... Continue

The Unsupervised Reinforcement Learning Benchmark

Dec 15, 2021. The shortcomings of supervised RL. Reinforcement Learning (RL) is a powerful paradigm for solving many problems of interest in AI, such as controlling autonomous vehicles, digital assistants, and resource allocation, to name a few. We’ve... Continue

Sequence Modeling Solutions for Reinforcement Learning Problems

Nov 19, 2021. Long-horizon predictions of (top) the Trajectory Transformer compared to those of (bottom) a single-step dynamics model. Modern machine learning success stories often have one thing in common: they... Continue

Which Mutual Information Representation Learning Objectives are Sufficient for Control?

Nov 19, 2021. Processing raw sensory inputs is crucial for applying deep RL algorithms to real-world problems. For example, autonomous vehicles must make decisions about how to drive safely given information flowing from cameras, radar, and microphones about... Continue

Bridge Data: Boosting Generalization of Robotic Skills with Cross-Domain Datasets

Nov 18, 2021. Fig. 1: The BRIDGE dataset contains 7200 demonstrations of kitchen-themed manipulation tasks across 71 tasks in 10 domains. Note that any GIF compression artifacts in this animation are not present in the dataset itself. When... Continue

How should we compare neural network representations?

Nov 8, 2021. Cross-posted from Bounded Regret. To understand neural networks, researchers often use similarity metrics to measure how similar or different two neural networks are to each other. For instance, they are used to compare vision transformers... Continue

Why Generalization in RL is Difficult: Epistemic POMDPs and Implicit Partial Observability

Nov 5, 2021. Many experimental works have observed that generalization in deep RL appears to be difficult: although RL agents can learn to perform very complex tasks, they don’t seem to generalize over diverse task distributions as well... Continue

RECON: Learning to Explore the Real World with a Ground Robot

Nov 3, 2021. An example of our method deployed on a Clearpath Jackal ground robot (left) exploring a suburban environment to find a visual target (inset). (Right) Egocentric observations of the robot. Imagine you’re in an unfamiliar neighborhood... Continue

Why Generalization in RL is Difficult: Epistemic POMDPs and Implicit Partial Observability

Nov 1, 2021. Many experimental works have observed that generalization in deep RL appears to be difficult: although RL agents can learn to perform very complex tasks, they don’t seem to generalize over diverse task distributions as well... Continue

Designs from Data: Offline Black-Box Optimization via Conservative Training

Oct 25, 2021. Figure 1: Offline Model-Based Optimization (MBO): The goal of offline MBO is to optimize an unknown objective function $f(x)$ with respect to $x$, provided access to only a static, previously-collected dataset of designs. Machine learning... Continue

A First-Principles Theory of Neural Network Generalization

Oct 25, 2021. Fig 1. Measures of generalization performance for neural networks trained on four different boolean functions (colors) with varying training set size. For both MSE (left) and learnability (right), theoretical predictions (curves) closely match true performance... Continue

Making RL Tractable by Learning More Informative Reward Functions: Example-Based Control, Meta-Learning, and Normalized Maximum Likelihood

Oct 22, 2021. Diagram of MURAL, our method for learning uncertainty-aware rewards for RL. After the user provides a few examples of desired outcomes, MURAL automatically infers a reward function that takes into account these examples and the... Continue

Updates and Lessons from AI Forecasting

Oct 14, 2021. Cross-posted from Bounded Regret. Earlier this year, my research group commissioned 6 questions for professional forecasters to predict about AI. Broadly speaking, 2 were on geopolitical aspects of AI and 4 were on future capabilities:... Continue

PICO: Pragmatic Compression for Human-in-the-Loop Decision-Making

Oct 6, 2021. Fig. 1: Given the original image $\mathbf{x}$, we would like to generate a compressed image $\hat{\mathbf{x}}$ such that the user's action $\mathbf{a}$ upon seeing the compressed image is similar to what it would have been... Continue

Unsolved ML Safety Problems

Sep 29, 2021. Along with researchers from Google Brain and OpenAI, we are releasing a paper on Unsolved Problems in ML Safety. Due to emerging safety challenges in ML, such as those introduced by recent large-scale models, we... Continue

Distilling neural networks into wavelet models using interpretations

Sep 28, 2021. Fig 1. A wavelet adapting to new data. Recent deep neural networks (DNNs) often predict extremely well, but sacrifice interpretability and computational efficiency. Interpretability is crucial in many disciplines, such as science and medicine, where... Continue

What Can I Do Here? Learning New Skills by Imagining Visual Affordances

Sep 24, 2021. How do humans become so skillful? Well, initially we are not, but from infancy, we discover and practice increasingly complex skills through self-supervised play. But this play is not random - the child development literature... Continue

Universal Weakly Supervised Segmentation by Pixel-to-Segment Contrastive Learning

Jul 22, 2021. We consider a problem: Can a machine learn from a few labeled pixels to predict every pixel in a new image? This task is extremely challenging (see Fig. 1) as a single body part could... Continue

The Surprising Effectiveness of PPO in Cooperative Multi-Agent Games

Jul 14, 2021. Recent years have demonstrated the potential of deep multi-agent reinforcement learning (MARL) to train groups of AI agents that can collaborate to solve complex tasks - for instance, AlphaStar achieved professional-level performance in the Starcraft... Continue

BASALT: A Benchmark for Learning from Human Feedback

Jul 8, 2021. TL;DR: We are launching a NeurIPS competition and benchmark called BASALT: a set of Minecraft environments and a human evaluation protocol that we hope will stimulate research and investigation into solving tasks with no pre-specified... Continue

Learning What To Do by Simulating the Past

May 3, 2021. Reinforcement learning (RL) has been used successfully for solving tasks which have a well defined reward function – think AlphaZero for Go, OpenAI Five for Dota, or AlphaStar for StarCraft. However, in many practical situations... Continue

An EPIC way to evaluate reward functions

Apr 20, 2021. Cross-posted from the DeepMind Safety blog. In many reinforcement learning problems the objective is too complex to be specified procedurally, and a reward function must instead be learned from user data. However, how can you... Continue

The Importance of Hyperparameter Optimization for Model-based Reinforcement Learning

Apr 19, 2021. Model-based reinforcement learning (MBRL) is a variant of the iterative learning framework, reinforcement learning, that includes a structured component of the system that is solely optimized to model the environment dynamics. Learning a model is... Continue

Pretrained Transformers as Universal Computation Engines

Mar 23, 2021. Transformers have been successfully applied to a wide variety of modalities: natural language, vision, protein modeling, music, robotics, and more. A common trend with using large models is to train a transformer on a large... Continue

Maximum Entropy RL (Provably) Solves Some Robust RL Problems

Mar 9, 2021. Nearly all real-world applications of reinforcement learning involve some degree of shift between the training environment and the testing environment. However, prior work has observed that even small shifts in the environment cause most RL... Continue

Self-Supervised Policy Adaptation during Deployment

Feb 25, 2021. Our method learns a task in a fixed, simulated environment and quickly adapts to new environments (e.g. the real world) solely from online interaction during deployment. The ability for humans to generalize their knowledge and... Continue

The Successor Representation, $\gamma$-Models, and Infinite-Horizon Prediction

Jan 5, 2021. Standard single-step models have a horizon of one. This post describes a method for training predictive dynamics models in continuous state spaces with an infinite, probabilistic horizon. Reinforcement... Continue

Does GPT-2 Know Your Phone Number?

Dec 20, 2020. Most likely not. Yet, OpenAI’s GPT-2 language model does know how to reach a certain Peter W--- (name redacted for privacy). When prompted with a short snippet of Internet text, the model accurately generates Peter’s... Continue

Offline Reinforcement Learning: How Conservative Algorithms Can Enable New Applications

Dec 7, 2020. Deep reinforcement learning has made significant progress in the last few years, with success stories in robotic control, game playing and science problems. While RL methods present a general paradigm where an agent learns from... Continue

Learning State Abstractions for Long-Horizon Planning

Nov 20, 2020. Many tasks that we do on a regular basis, such as navigating a city, cooking a meal, or loading a dishwasher, require planning over extended periods of time. Accomplishing these tasks may seem simple to... Continue

EvolveGraph: Dynamic Neural Relational Reasoning for Interacting Systems

Nov 18, 2020. Multi-agent interacting systems are prevalent in the world, from purely physical systems to complicated social dynamic systems. The interactions between entities / components can give rise to very complex behavior patterns at the level of... Continue

Training on Test Inputs with Amortized Conditional Normalized Maximum Likelihood

Nov 16, 2020. Current machine learning methods provide unprecedented accuracy across a range of domains, from computer vision to natural language processing. However, in many important high-stakes applications, such as medical diagnosis or autonomous driving, rare mistakes can... Continue

Goodhart’s Law, Diversity and a Series of Seemingly Unrelated Toy Problems

Nov 13, 2020. Goodhart’s Law is an adage which states the following: “When a measure becomes a target, it ceases to be a good measure.” This is particularly pertinent in machine learning, where the source of many of... Continue

Adapting on the Fly to Test Time Distribution Shift

Nov 5, 2020. Imagine that you are building the next generation machine learning model for handwriting transcription. Based on previous iterations of your product, you have identified a key challenge for this rollout: after deployment, new end users... Continue

Reinforcement learning is supervised learning on optimized data

Oct 13, 2020. The two most common perspectives on reinforcement learning (RL) are optimization and dynamic programming. Methods that compute the gradients of the non-differentiable expected reward objective, such as the REINFORCE trick, are commonly grouped into the... Continue

Plan2Explore: Active Model-Building for Self-Supervised Visual Reinforcement Learning

Oct 6, 2020. This post is cross-listed on the CMU ML blog. To operate successfully in unstructured open-world environments, autonomous intelligent agents need to solve many different tasks and learn new tasks quickly. Reinforcement learning has enabled artificial... Continue

AWAC: Accelerating Online Reinforcement Learning with Offline Datasets

Sep 10, 2020. Our method learns complex behaviors by training offline from prior datasets (expert demonstrations, data from previous experiments, or random exploration data) and then fine-tuning quickly with online interaction. Robots trained with reinforcement learning (RL) have... Continue

AI Will Change the World. Who Will Change AI? We Will.

Aug 16, 2020. Editor’s Note: The following blog is a special guest post by a recent graduate of Berkeley BAIR’s AI4ALL summer program for high school students. AI4ALL is a nonprofit dedicated to increasing diversity and inclusion in... Continue

Estimating the fatality rate is difficult but doable with better data

Aug 3, 2020. The case fatality rate quantifies how dangerous COVID-19 is, and how risk of death varies with strata like geography, age, and race. Current estimates of the COVID-19 case fatality rate (CFR) are biased for dozens... Continue

Exploring Exploration: Comparing Children with RL Agents in Unified Environments

Jul 24, 2020. Despite recent advances in artificial intelligence (AI) research, human children are still by far the best learners we know of, learning impressive skills like language and high-level reasoning from very little data. Children’s learning is... Continue

Can RL From Pixels be as Efficient as RL From State?

Jul 19, 2020. A remarkable characteristic of human intelligence is our ability to learn tasks quickly. Most humans can learn reasonably complex skills like tool-use and gameplay within just a few hours, and understand the basics after only... Continue

Decentralized Reinforcement Learning: Global Decision-Making via Local Economic Transactions

Jul 11, 2020. Many neural network architectures that underlie various artificial intelligence systems today bear an interesting similarity to the early computers a century ago. Just as early computers were specialized circuits for specific purposes like solving linear... Continue

D4RL: Building Better Benchmarks for Offline Reinforcement Learning

Jun 25, 2020. In the last decade, one of the biggest drivers for success in machine learning has arguably been the rise of high-capacity models such as neural networks along with large datasets such as ImageNet to produce... Continue

Open Compound Domain Adaptation

Jun 14, 2020. The World is Continuously Varying. Imagine we want to train a self-driving car in New York so that we can take it all the way to Seattle without tediously driving it for over 48 hours.... Continue

OmniTact: A Multi-Directional High-Resolution Touch Sensor

May 14, 2020. Human thumb next to our OmniTact sensor, and a US penny for scale. Touch has been shown to be important for dexterous manipulation in robotics. Recently, the GelSight sensor has caught significant interest for learning-based... Continue

Four Novel Approaches to Manipulating Fabric using Model-Free and Model-Based Deep Learning in Simulation

May 5, 2020. Humans manipulate 2D deformable structures such as fabric on a daily basis, from putting on clothes to making beds. Can robots learn to perform similar tasks? Successful approaches can advance applications such as dressing assistance... Continue

Unsupervised Meta-Learning: Learning to Learn without Supervision

May 1, 2020. This post is cross-listed on the CMU ML blog. The history of machine learning has largely been a story of increasing abstraction. In the dawn of ML, researchers spent considerable effort engineering features. As deep... Continue

The Ingredients of Real World Robotic Reinforcement Learning

Apr 27, 2020. Robots have been useful in environments that can be carefully controlled, such as those commonly found in industrial settings (e.g. assembly lines). However, in unstructured settings like the home, we need robotic systems that are... Continue

Making Decision Trees Accurate Again: Explaining What Explainable AI Did Not

Apr 23, 2020. The interpretability of neural networks is becoming increasingly necessary, as deep learning is being adopted in settings where accurate and justifiable predictions are required. These applications range from finance to medical imaging. However, deep neural... Continue

Robots Learning to Move like Animals

Apr 3, 2020. Quadruped robot learning locomotion skills by imitating a dog. Whether it’s a dog chasing after a ball, or a monkey swinging through the trees, animals can effortlessly perform an incredibly rich repertoire of agile locomotion... Continue

Physically Realistic Attacks on Deep Reinforcement Learning

Mar 27, 2020. Deep reinforcement learning (RL) has achieved superhuman performance in problems ranging from data center cooling to video games. RL policies may soon be widely deployed, with research underway in autonomous driving, negotiation and automated trading.... Continue

Does On-Policy Data Collection Fix Errors in Off-Policy Reinforcement Learning?

Mar 16, 2020. Reinforcement learning has seen a great deal of success in solving complex decision making problems ranging from robotics to games to supply chain management to recommender systems. Despite their success, deep reinforcement learning algorithms can... Continue

BADGR: The Berkeley Autonomous Driving Ground Robot

Mar 12, 2020. Look at the images above. If I asked you to bring me a picnic blanket in the grassy field, would you be able to? Of course. If I asked you to bring over a cart... Continue

Speeding Up Transformer Training and Inference By Increasing Model Size

Mar 5, 2020. Model Training Can Be Slow. In deep learning, using more compute (e.g., increasing model size, dataset size, or training steps) often leads to higher accuracy. This is especially true given the recent success of unsupervised... Continue

Large Scale Training at BAIR with Ray Tune

Jan 16, 2020. In this blog post, we share our experiences in developing two critical software libraries that many BAIR researchers use to execute large-scale AI experiments: Ray Tune and the Ray Cluster Launcher, both of which now... Continue

Emergent Behavior by Minimizing Chaos

Dec 18, 2019. All living organisms carve out environmental niches within which they can maintain relative predictability amidst the ever-increasing entropy around them (1), (2). Humans, for example, go to great lengths to shield themselves from surprise —... Continue

What is My Data Worth?

Dec 16, 2019. People give massive amounts of their personal data to companies every day and these data are used to generate tremendous business values. Some economists and politicians argue that people should be paid for their contributions—but... Continue

Learning to Imitate Human Demonstrations via CycleGAN

Dec 13, 2019. This work presents AVID, a method that allows a robot to learn a task, such as making coffee, directly by watching a human perform the task. One of the most important markers of intelligence is... Continue

Model-Based Reinforcement Learning: Theory and Practice

Dec 12, 2019. Reinforcement learning systems can make decisions in one of two ways. In the model-based approach, a system uses a predictive model of the world to ask questions of the form “what will happen if I... Continue

Data-Driven Deep Reinforcement Learning

Dec 5, 2019. One of the primary factors behind the success of machine learning approaches in open world settings, such as image recognition and natural language processing, has been the ability of high-capacity deep neural network function approximators... Continue

RoboNet: A Dataset for Large-Scale Multi-Robot Learning

Nov 26, 2019. This post is cross-listed at the SAIL Blog and the CMU ML blog. In the last decade, we’ve seen learning-based systems provide transformative solutions for a wide range of perception and reasoning problems, from recognizing... Continue

Prof. Anca Dragan Talks About Human-Robot Interaction for WIRED

Nov 22, 2019. Prof. Anca Dragan gave a talk as part of the WIRED25 summit, explaining some of the challenges robots face when interacting with people. First, robots that share space with people, from autonomous cars to quadrotors... Continue

Can We Learn the Language of Proteins?

Nov 4, 2019. The incredible success of BERT in Natural Language Processing (NLP) showed that large models trained on unlabeled data are able to learn powerful representations of language. These representations have been shown to encode information about... Continue

Look then Listen: Pre-Learning Environment Representations for Data-Efficient Neural Instruction Following

Oct 28, 2019. When learning to follow natural language instructions, neural networks tend to be very data hungry – they require a huge number of examples pairing language with actions in order to learn effectively. This post is... Continue

Collaborating with Humans Requires Understanding Them

Oct 21, 2019. AI agents have learned to play Dota, StarCraft, and Go, by training to beat an automated system that increases in difficulty as the agent gains skill at the game: in vanilla self-play, the AI agent... Continue

Functional RL with Keras and Tensorflow Eager

Oct 14, 2019. In this blog post, we explore a functional paradigm for implementing reinforcement learning (RL) algorithms. The paradigm will be that developers write the numerics of their algorithm as independent, pure functions, and then use a... Continue

Deep Dynamics Models for Dexterous Manipulation

Sep 30, 2019. Figure 1: Our approach (PDDM) can efficiently and effectively learn complex dexterous manipulation skills in both simulation and the real world. Here, the learned model is able to control the 24-DoF Shadow Hand to rotate... Continue

Sample Efficient Evolutionary Algorithm for Analog Circuit Design

Sep 26, 2019. In this post, we share some recent promising results regarding the applications of Deep Learning in analog IC design. While this work targets a specific application, the proposed methods can be used in other black... Continue

rlpyt: A Research Code Base for Deep Reinforcement Learning in PyTorch

Sep 24, 2019. UPDATE (15 Feb 2020): Documentation is now available for rlpyt! See it at rlpyt.readthedocs.io. It describes program flow, code organization, and implementation details, including class, method, and function references for all components. The code examples still introduce... Continue

A Deep Learning Approach to Data Compression

Sep 19, 2019. We introduce Bit-Swap, a scalable and effective lossless data compression technique based on deep learning. It extends previous work on practical compression with latent variable models, based on bits-back coding and asymmetric numeral systems. In... Continue

Evaluating and Testing Unintended Memorization in Neural Networks

Aug 13, 2019. It is important whenever designing new technologies to ask “how will this affect people’s privacy?” This topic is especially important with regard to machine learning, where machine learning models are often trained on sensitive user... Continue

Learning to Learn with Probabilistic Task Embeddings

Jun 10, 2019. To operate successfully in a complex and changing environment, learning agents must be able to acquire new skills quickly. Humans display remarkable skill in this area — we can learn to recognize a new object... Continue

1000x Faster Data Augmentation

Jun 7, 2019. Effect of Population Based Augmentation applied to images, which differs at different percentages into training. In this blog post we introduce Population Based Augmentation (PBA), an algorithm that quickly and efficiently learns a state-of-the-art approach... Continue

Autonomous Vehicles for Social Good: Learning to Solve Congestion

Jun 3, 2019. We are in the midst of an unprecedented convergence of two rapidly growing trends on our roadways: sharply increasing congestion and the deployment of autonomous vehicles. Year after year, highways get slower and slower: famously,... Continue

End-to-End Deep Reinforcement Learning without Reward Engineering

May 28, 2019. Communicating the goal of a task to another person is easy: we can use language, show them an image of the desired outcome, point them to a how-to video, or use some combination of all... Continue

Model-Based Reinforcement Learning from Pixels with Structured Latent Variable Models

May 20, 2019. Imagine a robot trying to learn how to stack blocks and push objects using visual inputs from a camera feed. In order to minimize cost and safety concerns, we want our robot to learn these... Continue

Large-Scale Long-Tailed Recognition in an Open World

May 13, 2019. Existing Computer Vision Setting vs. Real-World Scenario. One day, an ecologist came to us. He wanted to use modern computer vision techniques to perform automatic animal identification in his wildlife camera trap image datasets. We... Continue

Robots that Learn to Adapt

May 6, 2019. Figure 1: Our model-based meta reinforcement learning algorithm enables a legged robot to adapt online in the face of an unexpected system malfunction (note the broken front right leg). Humans have the ability to seamlessly... Continue

Robots that Learn to Use Improvised Tools

Apr 11, 2019. In many animals, tool-use skills emerge from a combination of observational learning and experimentation. For example, by watching one another, chimpanzees can learn how to use twigs to “fish” for insects. Similarly, capuchin monkeys demonstrate... Continue

CVPR 2019 Challenges on Domain Adaptation in Autonomous Driving

Mar 27, 2019. We all dream of a future in which autonomous cars can drive us to every corner of the world. Numerous researchers and companies are working day and night to chase this dream by overcoming scientific... Continue

Announcing the BAIR Open Research Commons

Mar 24, 2019. Last updated November 2020. The University of California Berkeley Artificial Intelligence Research (BAIR) Lab is pleased to announce the BAIR Open Research Commons, a new industrial affiliate program launched to accelerate cutting-edge AI research. AI... Continue

Manipulation By Feel

Mar 21, 2019. Guiding our fingers while typing, enabling us to nimbly strike a matchstick, and inserting a key in a keyhole all rely on our sense of touch. It has been shown that the sense of touch... Continue

Assessing Generalization in Deep Reinforcement Learning

Mar 18, 2019. TL;DR We present a benchmark for studying generalization in deep reinforcement learning (RL). Systematic empirical evaluation shows that vanilla deep RL algorithms generalize better than specialized deep RL algorithms designed specifically for generalization. In other... Continue

Controlling False Discoveries in Large-Scale Experimentation: Challenges and Solutions

Feb 15, 2019. “Scientific research has changed the world. Now it needs to change itself.” - The Economist, 2013. There has been a growing concern about the validity of scientific findings. A multitude of journals, papers and reports have... Continue

Learning Preferences by Looking at the World

Feb 11, 2019. It would be great if we could all have household robots do our chores for us. Chores are tasks that we want done to make our houses cater more to our preferences; they are a... Continue

Soft Actor Critic—Deep Reinforcement Learning with Real-World Robots

Dec 14, 2018. We are announcing the release of our state-of-the-art off-policy model-free reinforcement learning algorithm, soft actor-critic (SAC). This algorithm has been developed jointly at UC Berkeley and Google, and we have been using it internally for... Continue

Scaling Multi-Agent Reinforcement Learning

Dec 12, 2018. An earlier version of this post is on the RISELab blog. It is posted here with the permission of the authors. We just rolled out general support for multi-agent reinforcement learning in Ray RLlib 0.6.0.... Continue

Building Gene Expression Atlases with Deep Generative Models for Single-cell Transcriptomics

Dec 5, 2018. Figure: An artistic representation of single-cell RNA sequencing. The stars in the sky represent cells in a heterogeneous tissue. The projection of the stars onto the river reveals relationships among them that are not apparent... Continue

Visual Model-Based Reinforcement Learning as a Path towards Generalist Robots

Nov 30, 2018. With very little explicit supervision and feedback, humans are able to learn a wide range of motor skills by simply interacting with and observing the world through their senses. While there has been significant progress... Continue

Physics-Based Learned Design: Teaching a Microscope How to Image

Nov 26, 2018. Figure 1: (left) LED Array Microscope constructed using a standard commercial microscope and an LED array. (middle) Close up on the LED array dome mounted on the microscope. (right) LED array displaying patterns at 100Hz.... Continue

AdaSearch: A Successive Elimination Approach to Adaptive Search

Nov 14, 2018. In many tasks in machine learning, it is common to want to answer questions given fixed, pre-collected datasets. In some applications, however, we are not given data a priori; instead, we must collect the data... Continue

Drilling Down on Depth Sensing and Deep Learning

Oct 23, 2018. Top left: image of a 3D cube. Top right: example depth image, with darker points representing areas closer to the camera (source: Wikipedia). Next two rows: examples of depth and RGB image pairs for grasping... Continue

Learning Acrobatics by Watching YouTube

Oct 9, 2018. Simulated characters imitating skills from YouTube videos. Whether it’s everyday tasks like washing our hands or stunning feats of acrobatic prowess, humans are able to learn an incredible array of skills by watching other humans.... Continue

Visual Reinforcement Learning with Imagined Goals

Sep 6, 2018. We want to build agents that can accomplish arbitrary goals in unstructured complex environments, such as a personal robot that can perform household chores. A promising approach is to use deep reinforcement learning, which is... Continue

Dexterous Manipulation with Reinforcement Learning: Efficient, General, and Low-Cost

Aug 31, 2018. In this post, we demonstrate how deep reinforcement learning (deep RL) can be used to learn how to control dexterous hands for a variety of manipulation tasks. We discuss how such methods can learn to... Continue

When Recurrent Models Don't Need to be Recurrent

Aug 6, 2018. An earlier version of this post was published on Off the Convex Path. It is reposted here with the author’s permission. In the last few years, deep learning practitioners have proposed a litany of different... Continue

One-Shot Imitation from Watching Videos

Jun 28, 2018. Learning a new skill by observing another individual, the ability to imitate, is a key part of intelligence in humans and animals. Can we enable a robot to do the same, learning to manipulate a... Continue

BDD100K Blog Update

Jun 18, 2018. We are excited by the interest and excitement generated by our BDD100K dataset. Our data release and blog post were covered in an unsolicited article by the UC Berkeley newspaper, the Daily Cal, which was... Continue

BDD100K: A Large-scale Diverse Driving Video Database

May 30, 2018. Update 06/18/2018: please also check our follow-up blog post after reading this. TL;DR, we released the largest and most diverse driving video dataset with rich annotations called BDD100K. You can access the data for research... Continue

Delayed Impact of Fair Machine Learning

May 17, 2018. Machine learning systems trained to minimize prediction error may often exhibit discriminatory behavior based on sensitive characteristics such as race and gender. One reason could be historical bias in the data. In various... Continue

TDM: From Model-Free to Model-Based Deep Reinforcement Learning

Apr 26, 2018. You’ve decided that you want to bike from your house by UC Berkeley to the Golden Gate Bridge. It’s a nice 20 mile ride, but there’s a problem: you’ve never ridden a bike before! To... Continue

Shared Autonomy via Deep Reinforcement Learning

Apr 18, 2018. A blind, autonomous pilot (left), suboptimal human pilot (center), and combined human-machine team (right) play the Lunar Lander game. Imagine a drone pilot remotely flying a quadrotor, using an onboard camera to navigate and land.... Continue

Towards a Virtual Stuntman

Apr 10, 2018. Simulated humanoid performing a variety of highly dynamic and acrobatic skills. Motion control problems have become standard benchmarks for reinforcement learning, and deep RL methods have been shown to be effective for a diverse suite... Continue

Transfer Your Font Style with GANs

Mar 13, 2018. Left: Given movie poster, Right: New movie title generated by MC-GAN. Text is a prominent visual element of 2D design. Artists invest significant time into designing glyphs that are visually compatible with other elements in... Continue

Learning Robot Objectives from Physical Human Interaction

Feb 6, 2018. Humans physically interact with each other every day – from grabbing someone’s hand when they are about to spill their drink, to giving your friend a nudge to steer them in the right direction, physical... Continue

Kernel Feature Selection via Conditional Covariance Minimization

Jan 23, 2018. Feature selection is a common method for dimensionality reduction that encourages model interpretability. With large data sets becoming ever more prevalent, feature selection has seen widespread usage across a variety of real-world tasks in recent... Continue

Ray: A Distributed System for AI

Jan 9, 2018. As machine learning algorithms and techniques have advanced, more and more machine learning applications require multiple machines and must exploit parallelism. However, the infrastructure for doing machine learning on clusters remains ad-hoc. While good solutions... Continue

Physical Adversarial Examples Against Deep Neural Networks

Dec 30, 2017. This post is based on recent research by Ivan Evtimov, Kevin Eykholt, Earlence Fernandes, Tadayoshi Kohno, Bo Li, Atul Prakash, Amir Rahmati, Dawn Song, and Florian Tramèr. Deep neural networks (DNNs) have enabled great progress... Continue

Reverse Curriculum Generation for Reinforcement Learning Agents

Dec 20, 2017. Reinforcement Learning (RL) is a powerful technique capable of solving complex tasks such as locomotion, Atari games, racing games, and robotic manipulation tasks, all through training an agent to optimize behaviors over a reward function.... Continue

Towards Intelligent Industrial Co-robots

Dec 12, 2017. Democratization of Robots in Factories. In modern factories, human workers and robots are two major workforces. For safety reasons, the two are normally separated, with robots confined in metal cages, which limits the productivity as... Continue

FaSTrack: Ensuring Safe Real-Time Navigation of Dynamic Systems

Dec 5, 2017. The Problem: Fast and Safe Motion Planning. Real-time autonomous motion planning and navigation is hard, especially when we care about safety. This becomes even more difficult when we have systems with complicated dynamics, external... Continue

Model-based Reinforcement Learning with Neural Network Dynamics

Nov 30, 2017. Fig 1. A learned neural network dynamics model enables a hexapod robot to learn to run and follow desired trajectories, using just 17 minutes of real-world experience. Enabling robots to act autonomously in the real-world... Continue

The Emergence of a Fovea while Learning to Attend

Nov 9, 2017. Why We Need Attention. What we see through our eyes is only a very small part of the world around us. At any given time our eyes are sampling only a fraction of the surrounding... Continue

DART: Noise Injection for Robust Imitation Learning

Oct 26, 2017. Toyota HSR Trained with DART to Make a Bed. In Imitation Learning (IL), also known as Learning from Demonstration (LfD), a robot learns a control policy from analyzing demonstrations of the policy performed by an... Continue

Learning Long Duration Sequential Task Structure From Demonstrations with Application in Surgical Robotics

Oct 17, 2017. Deep imitation learning and deep reinforcement learning have the potential to learn robot control policies that map high-dimensional sensor inputs to controls. While these approaches have been very successful at learning short duration tasks, such as... Continue

Learning Diverse Skills via Maximum Entropy Deep Reinforcement Learning

Oct 6, 2017. Deep reinforcement learning (deep RL) has achieved success in many tasks, such as playing video games from raw pixels (Mnih et al., 2015), playing the game of Go (Silver et al., 2016), and simulated robotic... Continue

Learning to Optimize with Reinforcement Learning

Sep 12, 2017. Since we posted our paper on “Learning to Optimize” last year, the area of optimizer learning has received growing attention. In this article, we provide an introduction to this line of work and share our... Continue

Learning a Multi-View Stereo Machine

Sep 5, 2017. Consider looking at a photograph of a chair. We humans have the remarkable capacity of inferring properties about the 3D shape of the chair from this single photograph even if we might not have seen... Continue

How to Escape Saddle Points Efficiently

Aug 31, 2017. This post was initially published on Off the Convex Path. It is reposted here with the authors’ permission. A core, emerging problem in nonconvex optimization involves the escape of saddle points. While recent research has shown... Continue

High Quality 3D Object Reconstruction from a Single Color Image

Aug 23, 2017. Digitally reconstructing 3D geometry from images is a core problem in computer vision. There are various applications, such as movie productions, content generation for video games, virtual and augmented reality, 3D printing and many more.... Continue

Cooperatively Learning Human Values

Aug 17, 2017. Be careful what you reward. “Be careful what you wish for!” – we’ve all heard it! The story of King Midas is there to warn us of what might happen when we’re not. Midas, a... Continue

Captioning Novel Objects in Images

Aug 8, 2017. Given an image, humans can easily infer the salient entities in it, and describe the scene effectively, such as, where objects are located (in a forest or in a kitchen?), what attributes an object has... Continue

Minibatch Metropolis-Hastings

Aug 2, 2017. Over the last few years we have experienced an enormous data deluge, which has played a key role in the surge of interest in AI. A partial list of some large datasets: ImageNet, with over... Continue

Learning to Learn

Jul 18, 2017. A key aspect of intelligence is versatility – the capability of doing many different things. Current AI systems excel at mastering a single skill, such as Go, Jeopardy, or even helicopter aerobatics. But, when you... Continue

The Confluence of Geometry and Learning

Jul 11, 2017. Given only a single 2D image, humans are able to effortlessly infer the rich 3D structure of the underlying scene. Since inferring 3D from 2D is an ambiguous task by itself (see e.g. the left... Continue

Constrained Policy Optimization

Jul 6, 2017. (Based on joint work with David Held, Aviv Tamar, and Pieter Abbeel.) Deep reinforcement learning (RL) has enabled some remarkable achievements in hard control problems: with deep RL, agents have learned to play video games... Continue

Releasing the Dexterity Network (Dex-Net) 2.0 Dataset for Deep Grasping

Jun 27, 2017. Reliable robot grasping across many objects is challenging due to sensor noise and occlusions that lead to uncertainty about the precise shape, position, and mass of objects. The Dexterity Network (Dex-Net) 2.0 is a project... Continue

Learning to Reason with Neural Module Networks

Jun 20, 2017. (Joint work with Ronghang Hu, Marcus Rohrbach, Trevor Darrell, Dan Klein and Kate Saenko.) Suppose we’re building a household robot, and want it to be able to answer questions about its surroundings. We might ask... Continue

Introducing the BAIR Blog

Jun 20, 2017. Berkeley AI Research (BAIR) brings together researchers at UC Berkeley across the areas of computer vision, machine learning, natural language processing, planning, and robotics, and each year we publish cutting edge research across all of... Continue