Virtual Personas for Language Models via an Anthology of Backstories
We introduce Anthology, a method for conditioning LLMs to representative, consistent, and diverse virtual personas by generating and utilizing naturalistic backstories with rich details of individual values and experience. What does it mean for large...Linguistic Bias in ChatGPT: Language Models Reinforce Dialect Discrimination
Sample language model responses to different varieties of English and native speaker reactions. ChatGPT does amazingly well at communicating with people in English. But whose English? Only 15% of ChatGPT users are from the US,...How to Evaluate Jailbreak Methods: A Case Study with the StrongREJECT Benchmark
When we began studying jailbreak evaluations, we found a fascinating paper claiming that you could jailbreak frontier LLMs simply by translating forbidden prompts into obscure languages. Excited by this result, we attempted to reproduce it...Are We Ready for Multi-Image Reasoning? Launching VHs: The Visual Haystacks Benchmark!
Humans excel at processing vast arrays of visual information, a skill that is crucial for achieving artificial general intelligence (AGI). Over the decades, AI researchers have developed Visual Question Answering (VQA) systems to interpret scenes...TinyAgent: Function Calling at the Edge
The ability of LLMs to execute commands through plain language (e.g. English) has enabled agentic systems that can complete a user query by orchestrating the right set of tools (e.g. ToolFormer, Gorilla). This, along with...Modeling Extremely Large Images with xT
As computer vision researchers, we believe that every pixel can tell a story. However, there seems to be a writer’s block settling into the field when it comes to dealing with large images. Large images...2024 BAIR Graduate Directory
Every year, the Berkeley Artificial Intelligence Research (BAIR) Lab graduates some of the most talented and innovative minds in artificial intelligence and machine learning. Our Ph.D. graduates have each expanded the frontiers of AI research...The Shift from Models to Compound AI Systems
AI caught everyone’s attention in 2023 with Large Language Models (LLMs) that can be instructed to perform general tasks, such as translation or coding, just by prompting. This naturally led to an intense focus on...Ghostbuster: Detecting Text Ghostwritten by Large Language Models
The structure of Ghostbuster, our new state-of-the-art method for detecting AI-generated text. Large language models like ChatGPT write impressively well—so well, in fact, that they’ve become a problem. Students have begun using these models to...Asymmetric Certified Robustness via Feature-Convex Neural Networks
TLDR: We propose the asymmetric certified robustness problem, which requires certified robustness for only one class and reflects real-world adversarial scenarios. This focused setting allows us to introduce...Goal Representations for Instruction Following
A longstanding goal of the field of robot learning has been to create generalist agents that can perform tasks for humans. Natural language has the potential to be an easy-to-use...Rethinking the Role of PPO in RLHF
TL;DR: In RLHF, there’s tension between the reward learning phase, which uses human preference in the form of comparisons, and the RL fine-tuning phase, which optimizes a single,...
Training Diffusion Models with Reinforcement Learning
Diffusion models have recently emerged as the de facto standard for generating complex, high-dimensional outputs. You may know them for their ability to produce stunning AI art and...
On the Stepwise Nature of Self-Supervised Learning
Figure 1: stepwise behavior in self-supervised learning. When training common SSL algorithms, we find that the loss descends in a stepwise fashion (top left) and the learned embeddings iteratively increase in dimensionality (bottom left). Direct... Generating 3D Molecular Conformers via Equivariant Coarse-Graining and Aggregated Attention
Figure 1: CoarsenConf architecture. Molecular conformer generation is a fundamental task in computational chemistry. The objective is to predict stable low-energy 3D molecular structures, known as conformers, given the 2D molecule. Accurate molecular conformations are...GPT-4 + Stable-Diffusion = ?: Enhancing Prompt Understanding of Text-to-Image Diffusion Models with Large Language Models
TL;DR: Text Prompt -> LLM -> Intermediate Representation (such as an image layout) -> Stable Diffusion -> Image. Recent advancements in text-to-image generation with diffusion models have yielded remarkable results synthesizing highly realistic and diverse...Interactive Fleet Learning
Figure 1: “Interactive Fleet Learning” (IFL) refers to robot fleets in industry and academia that fall back on human teleoperators when necessary and continually learn from them over time. In the last few years we...Koala: A Dialogue Model for Academic Research
In this post, we introduce Koala, a chatbot trained by fine-tuning Meta’s LLaMA on dialogue data gathered from the web. We describe the dataset curation and training process of our model, and also present the...Fully Autonomous Real-World Reinforcement Learning with Applications to Mobile Manipulation
Reinforcement learning provides a conceptual framework for autonomous agents to learn from experience, analogously to how one might train a pet with treats. But practical applications of reinforcement learning are often far from natural: instead...Keeping Learning-Based Control Safe by Regulating Distributional Shift
To regulate the distribution shift experienced by learning-based controllers, we seek a mechanism for constraining the agent to regions of high data density throughout its trajectory (left). Here, we present an approach which achieves this...Reverse engineering the NTK: towards first-principles architecture design
Deep neural networks have enabled technological wonders ranging from voice recognition to machine translation to protein engineering, but their design and application is nonetheless notoriously unprincipled. The development of tools and methods to guide this...Why do Policy Gradient Methods work so well in Cooperative MARL? Evidence from Policy Representation
In cooperative multi-agent reinforcement learning (MARL), due to their on-policy nature, policy gradient (PG) methods are typically believed to be less sample efficient than value decomposition (VD) methods, which are off-policy. However, some recent empirical...FIGS: Attaining XGBoost-level performance with the interpretability and speed of CART
FIGS (Fast Interpretable Greedy-tree Sums): A method for building interpretable models by simultaneously growing an ensemble of decision trees in competition with one another. Recent machine-learning advances have led to increasingly complex predictive models, often...The Berkeley Crossword Solver
We recently published the Berkeley Crossword Solver (BCS), the current state of the art for solving American-style crossword puzzles. The BCS combines neural question answering and probabilistic inference to achieve near-perfect performance on most American-style...Rethinking Human-in-the-Loop for Artificial Augmented Intelligence
Figure 1: In real-world applications, we think there exists a human-machine loop where humans and machines are mutually augmenting each other. We call it Artificial Augmented Intelligence. How do we build and evaluate an AI...Designing Societally Beneficial Reinforcement Learning Systems
Deep reinforcement learning (DRL) is transitioning from a research field focused on game playing to a technology with real-world applications. Notable examples include DeepMind’s work on controlling a nuclear reactor or on improving YouTube video...Should I Use Offline RL or Imitation Learning?
Figure 1: Summary of our recommendations for when a practitioner should use BC and various imitation learning style methods, and when they should use offline RL approaches. Offline reinforcement learning allows learning policies from previously collected...Offline RL Made Easier: No TD Learning, Advantage Reweighting, or Transformers
A demonstration of the RvS policy we learn with just supervised learning and a depth-two MLP. It uses no TD learning, advantage reweighting, or Transformers! Offline reinforcement learning (RL) is conventionally approached using value-based methods...Accelerating Ukraine Intelligence Analysis with Computer Vision on Synthetic Aperture Radar Imagery
Figure 1: Airmass measurements (clouds) over Ukraine from February 18, 2022 - March 01, 2022 from the SEVIRI instrument. Data accessed via the EUMETSAT Viewer. Satellite imagery is a critical source of information during the...All You Need is LUV: Unsupervised Collection of Labeled Images Using UV-Fluorescent Markings
Large-scale semantic image annotation is a significant challenge for learning-based perception systems in robotics. Supervised learning requires labeled data, and a common approach is for humans to hand-label images with segmentation masks, keypoints, and class...Unsupervised Skill Discovery with Contrastive Intrinsic Control
Unsupervised Reinforcement Learning (RL), where RL agents pre-train with self-supervised rewards, is an emerging paradigm for developing RL agents that are capable of generalization. Recently, we released the Unsupervised RL Benchmark (URLB) which we covered...imodels: leveraging the unreasonable effectiveness of rules
imodels: A python package with cutting-edge techniques for concise, transparent, and accurate predictive modeling. All sklearn-compatible and easy to use. Recent machine-learning advances have led to increasingly complex predictive models, often at the cost of...The Unsupervised Reinforcement Learning Benchmark
The shortcomings of supervised RL. Reinforcement Learning (RL) is a powerful paradigm for solving many problems of interest in AI, such as controlling autonomous vehicles, digital assistants, and resource allocation to name a few. We’ve...
Sequence Modeling Solutions for Reinforcement Learning Problems
Long-horizon predictions of (top) the Trajectory Transformer compared to those of (bottom) a single-step dynamics model. Modern machine learning success stories often have one thing in common: they... Which Mutual Information Representation Learning Objectives are Sufficient for Control?
Processing raw sensory inputs is crucial for applying deep RL algorithms to real-world problems. For example, autonomous vehicles must make decisions about how to drive safely given information flowing from cameras, radar, and microphones about...Bridge Data: Boosting Generalization of Robotic Skills with Cross-Domain Datasets
Fig. 1: The BRIDGE dataset contains 7200 demonstrations of kitchen-themed manipulation tasks across 71 tasks in 10 domains. Note that any GIF compression artifacts in this animation are not present in the dataset itself. When...How should we compare neural network representations?
Cross-posted from Bounded Regret. To understand neural networks, researchers often use similarity metrics to measure how similar or different two neural networks are to each other. For instance, they are used to compare vision transformers...Why Generalization in RL is Difficult: Epistemic POMDPs and Implicit Partial Observability
Many experimental works have observed that generalization in deep RL appears to be difficult: although RL agents can learn to perform very complex tasks, they don’t seem to generalize over diverse task distributions as well...RECON: Learning to Explore the Real World with a Ground Robot
An example of our method deployed on a Clearpath Jackal ground robot (left) exploring a suburban environment to find a visual target (inset). (Right) Egocentric observations of the robot. Imagine you’re in an unfamiliar neighborhood...Designs from Data: Offline Black-Box Optimization via Conservative Training
Figure 1: Offline Model-Based Optimization (MBO): The goal of offline MBO is to optimize an unknown objective function $f(x)$ with respect to $x$, provided access to only a static, previously-collected dataset of designs. Machine learning...
A First-Principles Theory of Neural Network Generalization
Fig 1. Measures of generalization performance for neural networks trained on four different boolean functions (colors) with varying training set size. For both MSE (left) and learnability (right), theoretical predictions (curves) closely match true performance... Making RL Tractable by Learning More Informative Reward Functions: Example-Based Control, Meta-Learning, and Normalized Maximum Likelihood
Diagram of MURAL, our method for learning uncertainty-aware rewards for RL. After the user provides a few examples of desired outcomes, MURAL automatically infers a reward function that takes into account these examples and the...Updates and Lessons from AI Forecasting
Cross-posted from Bounded Regret. Earlier this year, my research group commissioned 6 questions for professional forecasters to predict about AI. Broadly speaking, 2 were on geopolitical aspects of AI and 4 were on future capabilities:...PICO: Pragmatic Compression for Human-in-the-Loop Decision-Making
Fig. 1: Given the original image $\mathbf{x}$, we would like to generate a compressed image $\hat{\mathbf{x}}$ such that the user's action $\mathbf{a}$ upon seeing the compressed image is similar to what it would have been...Unsolved ML Safety Problems
Along with researchers from Google Brain and OpenAI, we are releasing a paper on Unsolved Problems in ML Safety. Due to emerging safety challenges in ML, such as those introduced by recent large-scale models, we...Distilling neural networks into wavelet models using interpretations
Fig 1. A wavelet adapting to new data. Recent deep neural networks (DNNs) often predict extremely well, but sacrifice interpretability and computational efficiency. Interpretability is crucial in many disciplines, such as science and medicine, where...What Can I Do Here? Learning New Skills by Imagining Visual Affordances
How do humans become so skillful? Well, initially we are not, but from infancy, we discover and practice increasingly complex skills through self-supervised play. But this play is not random - the child development literature...Universal Weakly Supervised Segmentation by Pixel-to-Segment Contrastive Learning
We consider a problem: Can a machine learn from a few labeled pixels to predict every pixel in a new image? This task is extremely challenging (see Fig. 1) as a single body part could...The Surprising Effectiveness of PPO in Cooperative Multi-Agent Games
Recent years have demonstrated the potential of deep multi-agent reinforcement learning (MARL) to train groups of AI agents that can collaborate to solve complex tasks - for instance, AlphaStar achieved professional-level performance in the StarCraft...
BASALT: A Benchmark for Learning from Human Feedback
TL;DR: We are launching a NeurIPS competition and benchmark called BASALT: a set of Minecraft environments and a human evaluation protocol that we hope will stimulate research and investigation into solving tasks with no pre-specified... Learning What To Do by Simulating the Past
Reinforcement learning (RL) has been used successfully for solving tasks which have a well defined reward function – think AlphaZero for Go, OpenAI Five for Dota, or AlphaStar for StarCraft. However, in many practical situations...An EPIC way to evaluate reward functions
Cross-posted from the DeepMind Safety blog. In many reinforcement learning problems the objective is too complex to be specified procedurally, and a reward function must instead be learned from user data. However, how can you...The Importance of Hyperparameter Optimization for Model-based Reinforcement Learning
Model-based reinforcement learning (MBRL) is a variant of the iterative learning framework, reinforcement learning, that includes a structured component of the system that is solely optimized to model the environment dynamics. Learning a model is...Pretrained Transformers as Universal Computation Engines
Transformers have been successfully applied to a wide variety of modalities: natural language, vision, protein modeling, music, robotics, and more. A common trend with using large models is to train a transformer on a large...Maximum Entropy RL (Provably) Solves Some Robust RL Problems
Nearly all real-world applications of reinforcement learning involve some degree of shift between the training environment and the testing environment. However, prior work has observed that even small shifts in the environment cause most RL...Self-Supervised Policy Adaptation during Deployment
Our method learns a task in a fixed, simulated environment and quickly adapts to new environments (e.g. the real world) solely from online interaction during deployment. The ability for humans to generalize their knowledge and...
The Successor Representation, $\gamma$-Models, and Infinite-Horizon Prediction
Standard single-step models have a horizon of one. This post describes a method for training predictive dynamics models in continuous state spaces with an infinite, probabilistic horizon. Reinforcement... Does GPT-2 Know Your Phone Number?
Most likely not. Yet, OpenAI’s GPT-2 language model does know how to reach a certain Peter W--- (name redacted for privacy). When prompted with a short snippet of Internet text, the model accurately generates Peter’s...Offline Reinforcement Learning: How Conservative Algorithms Can Enable New Applications
Deep reinforcement learning has made significant progress in the last few years, with success stories in robotic control, game playing and science problems. While RL methods present a general paradigm where an agent learns from...Learning State Abstractions for Long-Horizon Planning
Many tasks that we do on a regular basis, such as navigating a city, cooking a meal, or loading a dishwasher, require planning over extended periods of time. Accomplishing these tasks may seem simple to...EvolveGraph: Dynamic Neural Relational Reasoning for Interacting Systems
Multi-agent interacting systems are prevalent in the world, from purely physical systems to complicated social dynamic systems. The interactions between entities / components can give rise to very complex behavior patterns at the level of...Training on Test Inputs with Amortized Conditional Normalized Maximum Likelihood
Current machine learning methods provide unprecedented accuracy across a range of domains, from computer vision to natural language processing. However, in many important high-stakes applications, such as medical diagnosis or autonomous driving, rare mistakes can...Goodhart’s Law, Diversity and a Series of Seemingly Unrelated Toy Problems
Goodhart’s Law is an adage which states the following: “When a measure becomes a target, it ceases to be a good measure.” This is particularly pertinent in machine learning, where the source of many of...Adapting on the Fly to Test Time Distribution Shift
Imagine that you are building the next generation machine learning model for handwriting transcription. Based on previous iterations of your product, you have identified a key challenge for this rollout: after deployment, new end users...Reinforcement learning is supervised learning on optimized data
The two most common perspectives on Reinforcement learning (RL) are optimization and dynamic programming. Methods that compute the gradients of the non-differentiable expected reward objective, such as the REINFORCE trick, are commonly grouped into the...Plan2Explore: Active Model-Building for Self-Supervised Visual Reinforcement Learning
This post is cross-listed on the CMU ML blog. To operate successfully in unstructured open-world environments, autonomous intelligent agents need to solve many different tasks and learn new tasks quickly. Reinforcement learning has enabled artificial...AWAC: Accelerating Online Reinforcement Learning with Offline Datasets
Our method learns complex behaviors by training offline from prior datasets (expert demonstrations, data from previous experiments, or random exploration data) and then fine-tuning quickly with online interaction. Robots trained with reinforcement learning (RL) have...
AI Will Change the World. Who Will Change AI? We Will.
Editor’s Note: The following blog is a special guest post by a recent graduate of Berkeley BAIR’s AI4ALL summer program for high school students. AI4ALL is a nonprofit dedicated to increasing diversity and inclusion in... Estimating the fatality rate is difficult but doable with better data
The case fatality rate quantifies how dangerous COVID-19 is, and how risk of death varies with strata like geography, age, and race. Current estimates of the COVID-19 case fatality rate (CFR) are biased for dozens...Exploring Exploration: Comparing Children with RL Agents in Unified Environments
Despite recent advances in artificial intelligence (AI) research, human children are still by far the best learners we know of, learning impressive skills like language and high-level reasoning from very little data. Children’s learning is...Can RL From Pixels be as Efficient as RL From State?
A remarkable characteristic of human intelligence is our ability to learn tasks quickly. Most humans can learn reasonably complex skills like tool-use and gameplay within just a few hours, and understand the basics after only...
Decentralized Reinforcement Learning: Global Decision-Making via Local Economic Transactions
Many neural network architectures that underlie various artificial intelligence systems today bear an interesting similarity to the early computers a century ago. Just as early computers were specialized circuits for specific purposes like solving linear... D4RL: Building Better Benchmarks for Offline Reinforcement Learning
In the last decade, one of the biggest drivers for success in machine learning has arguably been the rise of high-capacity models such as neural networks along with large datasets such as ImageNet to produce...Open Compound Domain Adaptation
The World is Continuously Varying. Imagine we want to train a self-driving car in New York so that we can take it all the way to Seattle without tediously driving it for over 48 hours....OmniTact: A Multi-Directional High-Resolution Touch Sensor
Human thumb next to our OmniTact sensor, and a US penny for scale. Touch has been shown to be important for dexterous manipulation in robotics. Recently, the GelSight sensor has caught significant interest for learning-based...Four Novel Approaches to Manipulating Fabric using Model-Free and Model-Based Deep Learning in Simulation
Humans manipulate 2D deformable structures such as fabric on a daily basis, from putting on clothes to making beds. Can robots learn to perform similar tasks? Successful approaches can advance applications such as dressing assistance...Unsupervised Meta-Learning: Learning to Learn without Supervision
This post is cross-listed on the CMU ML blog. The history of machine learning has largely been a story of increasing abstraction. In the dawn of ML, researchers spent considerable effort engineering features. As deep...The Ingredients of Real World Robotic Reinforcement Learning
Robots have been useful in environments that can be carefully controlled, such as those commonly found in industrial settings (e.g. assembly lines). However, in unstructured settings like the home, we need robotic systems that are...Making Decision Trees Accurate Again: Explaining What Explainable AI Did Not
The interpretability of neural networks is becoming increasingly necessary, as deep learning is being adopted in settings where accurate and justifiable predictions are required. These applications range from finance to medical imaging. However, deep neural...Robots Learning to Move like Animals
Quadruped robot learning locomotion skills by imitating a dog. Whether it’s a dog chasing after a ball, or a monkey swinging through the trees, animals can effortlessly perform an incredibly rich repertoire of agile locomotion...Physically Realistic Attacks on Deep Reinforcement Learning
Deep reinforcement learning (RL) has achieved superhuman performance in problems ranging from data center cooling to video games. RL policies may soon be widely deployed, with research underway in autonomous driving, negotiation and automated trading....Does On-Policy Data Collection Fix Errors in Off-Policy Reinforcement Learning?
Reinforcement learning has seen a great deal of success in solving complex decision making problems ranging from robotics to games to supply chain management to recommender systems. Despite their success, deep reinforcement learning algorithms can...
BADGR: The Berkeley Autonomous Driving Ground Robot
Look at the images above. If I asked you to bring me a picnic blanket in the grassy field, would you be able to? Of course. If I asked you to bring over a cart... Speeding Up Transformer Training and Inference By Increasing Model Size
Model Training Can Be Slow. In deep learning, using more compute (e.g., increasing model size, dataset size, or training steps) often leads to higher accuracy. This is especially true given the recent success of unsupervised...Large Scale Training at BAIR with Ray Tune
In this blog post, we share our experiences in developing two critical software libraries that many BAIR researchers use to execute large-scale AI experiments: Ray Tune and the Ray Cluster Launcher, both of which now...Emergent Behavior by Minimizing Chaos
All living organisms carve out environmental niches within which they can maintain relative predictability amidst the ever-increasing entropy around them (1), (2). Humans, for example, go to great lengths to shield themselves from surprise —...What is My Data Worth?
People give massive amounts of their personal data to companies every day and these data are used to generate tremendous business values. Some economists and politicians argue that people should be paid for their contributions—but...Learning to Imitate Human Demonstrations via CycleGAN
This work presents AVID, a method that allows a robot to learn a task, such as making coffee, directly by watching a human perform the task. One of the most important markers of intelligence is...
Model-Based Reinforcement Learning: Theory and Practice
Reinforcement learning systems can make decisions in one of two ways. In the model-based approach, a system uses a predictive model of the world to ask questions of the form “what will happen if I... Data-Driven Deep Reinforcement Learning
One of the primary factors behind the success of machine learning approaches in open world settings, such as image recognition and natural language processing, has been the ability of high-capacity deep neural network function approximators...RoboNet: A Dataset for Large-Scale Multi-Robot Learning
This post is cross-listed at the SAIL Blog and the CMU ML blog. In the last decade, we’ve seen learning-based systems provide transformative solutions for a wide range of perception and reasoning problems, from recognizing...Prof. Anca Dragan Talks About Human-Robot Interaction for WIRED
Prof. Anca Dragan gave a talk as part of the WIRED25 summit, explaining some of the challenges robots face when interacting with people. First, robots that share space with people, from autonomous cars to quadrotors...Can We Learn the Language of Proteins?
The incredible success of BERT in Natural Language Processing (NLP) showed that large models trained on unlabeled data are able to learn powerful representations of language. These representations have been shown to encode information about...Look then Listen: Pre-Learning Environment Representations for Data-Efficient Neural Instruction Following
When learning to follow natural language instructions, neural networks tend to be very data hungry – they require a huge number of examples pairing language with actions in order to learn effectively. This post is...Collaborating with Humans Requires Understanding Them
AI agents have learned to play Dota, StarCraft, and Go, by training to beat an automated system that increases in difficulty as the agent gains skill at the game: in vanilla self-play, the AI agent...Functional RL with Keras and Tensorflow Eager
In this blog post, we explore a functional paradigm for implementing reinforcement learning (RL) algorithms. The paradigm will be that developers write the numerics of their algorithm as independent, pure functions, and then use a...Deep Dynamics Models for Dexterous Manipulation
Figure 1: Our approach (PDDM) can efficiently and effectively learn complex dexterous manipulation skills in both simulation and the real world. Here, the learned model is able to control the 24-DoF Shadow Hand to rotate...Sample Efficient Evolutionary Algorithm for Analog Circuit Design
In this post, we share some recent promising results regarding the applications of Deep Learning in analog IC design. While this work targets a specific application, the proposed methods can be used in other black...rlpyt: A Research Code Base for Deep Reinforcement Learning in PyTorch
UPDATE (15 Feb 2020): Documentation is now available for rlpyt! See it at rlpyt.readthedocs.io. It describes program flow, code organization, and implementation details, including class, method, and function references for all components. The code examples still introduce...A Deep Learning Approach to Data Compression
We introduce Bit-Swap, a scalable and effective lossless data compression technique based on deep learning. It extends previous work on practical compression with latent variable models, based on bits-back coding and asymmetric numeral systems. In...Evaluating and Testing Unintended Memorization in Neural Networks
It is important whenever designing new technologies to ask “how will this affect people’s privacy?” This topic is especially important with regard to machine learning, where machine learning models are often trained on sensitive user...Learning to Learn with Probabilistic Task Embeddings
To operate successfully in a complex and changing environment, learning agents must be able to acquire new skills quickly. Humans display remarkable skill in this area — we can learn to recognize a new object...1000x Faster Data Augmentation
Effect of Population Based Augmentation applied to images, which differs at different percentages into training. In this blog post we introduce Population Based Augmentation (PBA), an algorithm that quickly and efficiently learns a state-of-the-art approach...Autonomous Vehicles for Social Good: Learning to Solve Congestion
We are in the midst of an unprecedented convergence of two rapidly growing trends on our roadways: sharply increasing congestion and the deployment of autonomous vehicles. Year after year, highways get slower and slower: famously,...
End-to-End Deep Reinforcement Learning without Reward Engineering
Communicating the goal of a task to another person is easy: we can use language, show them an image of the desired outcome, point them to a how-to video, or use some combination of all...
Model-Based Reinforcement Learning from Pixels with Structured Latent Variable Models
Imagine a robot trying to learn how to stack blocks and push objects using visual inputs from a camera feed. In order to minimize cost and safety concerns, we want our robot to learn these...
Large-Scale Long-Tailed Recognition in an Open World
Existing Computer Vision Setting vs. Real-World Scenario. One day, an ecologist came to us. He wanted to use modern computer vision techniques to perform automatic animal identification in his wildlife camera trap image datasets. We...
Robots that Learn to Adapt
Figure 1: Our model-based meta reinforcement learning algorithm enables a legged robot to adapt online in the face of an unexpected system malfunction (note the broken front right leg). Humans have the ability to seamlessly...
Robots that Learn to Use Improvised Tools
In many animals, tool-use skills emerge from a combination of observational learning and experimentation. For example, by watching one another, chimpanzees can learn how to use twigs to “fish” for insects. Similarly, capuchin monkeys demonstrate...
CVPR 2019 Challenges on Domain Adaptation in Autonomous Driving
We all dream of a future in which autonomous cars can drive us to every corner of the world. Numerous researchers and companies are working day and night to chase this dream by overcoming scientific...
Announcing the BAIR Open Research Commons
Last updated November 2020. The University of California Berkeley Artificial Intelligence Research (BAIR) Lab is pleased to announce the BAIR Open Research Commons, a new industrial affiliate program launched to accelerate cutting-edge AI research. AI...
Manipulation By Feel
Guiding our fingers while typing, enabling us to nimbly strike a matchstick, and inserting a key in a keyhole all rely on our sense of touch. It has been shown that the sense of touch...
Assessing Generalization in Deep Reinforcement Learning
TL;DR We present a benchmark for studying generalization in deep reinforcement learning (RL). Systematic empirical evaluation shows that vanilla deep RL algorithms generalize better than specialized deep RL algorithms designed specifically for generalization. In other...
Controlling False Discoveries in Large-Scale Experimentation: Challenges and Solutions
“Scientific research has changed the world. Now it needs to change itself.” - The Economist, 2013. There has been a growing concern about the validity of scientific findings. A multitude of journals, papers and reports have...
Learning Preferences by Looking at the World
It would be great if we could all have household robots do our chores for us. Chores are tasks that we want done to make our houses cater more to our preferences; they are a...
Soft Actor Critic—Deep Reinforcement Learning with Real-World Robots
We are announcing the release of our state-of-the-art off-policy model-free reinforcement learning algorithm, soft actor-critic (SAC). This algorithm has been developed jointly at UC Berkeley and Google, and we have been using it internally for...
Scaling Multi-Agent Reinforcement Learning
An earlier version of this post is on the RISELab blog. It is posted here with the permission of the authors. We just rolled out general support for multi-agent reinforcement learning in Ray RLlib 0.6.0....
Building Gene Expression Atlases with Deep Generative Models for Single-cell Transcriptomics
Figure: An artistic representation of single-cell RNA sequencing. The stars in the sky represent cells in a heterogeneous tissue. The projection of the stars onto the river reveals relationships among them that are not apparent...
Visual Model-Based Reinforcement Learning as a Path towards Generalist Robots
With very little explicit supervision and feedback, humans are able to learn a wide range of motor skills by simply interacting with and observing the world through their senses. While there has been significant progress...
Physics-Based Learned Design: Teaching a Microscope How to Image
Figure 1: (left) LED Array Microscope constructed using a standard commercial microscope and an LED array. (middle) Close up on the LED array dome mounted on the microscope. (right) LED array displaying patterns at 100Hz....
AdaSearch: A Successive Elimination Approach to Adaptive Search
In many tasks in machine learning, it is common to want to answer questions given fixed, pre-collected datasets. In some applications, however, we are not given data a priori; instead, we must collect the data...
Drilling Down on Depth Sensing and Deep Learning
Top left: image of a 3D cube. Top right: example depth image, with darker points representing areas closer to the camera (source: Wikipedia). Next two rows: examples of depth and RGB image pairs for grasping...
Learning Acrobatics by Watching YouTube
Simulated characters imitating skills from YouTube videos. Whether it’s everyday tasks like washing our hands or stunning feats of acrobatic prowess, humans are able to learn an incredible array of skills by watching other humans....
Visual Reinforcement Learning with Imagined Goals
We want to build agents that can accomplish arbitrary goals in unstructured complex environments, such as a personal robot that can perform household chores. A promising approach is to use deep reinforcement learning, which is...
Dexterous Manipulation with Reinforcement Learning: Efficient, General, and Low-Cost
In this post, we demonstrate how deep reinforcement learning (deep RL) can be used to learn how to control dexterous hands for a variety of manipulation tasks. We discuss how such methods can learn to...
When Recurrent Models Don't Need to be Recurrent
An earlier version of this post was published on Off the Convex Path. It is reposted here with the author’s permission. In the last few years, deep learning practitioners have proposed a litany of different...
One-Shot Imitation from Watching Videos
Learning a new skill by observing another individual, the ability to imitate, is a key part of intelligence in humans and animals. Can we enable a robot to do the same, learning to manipulate a...
BDD100K Blog Update
We are excited by the interest and excitement generated by our BDD100K dataset. Our data release and blog post were covered in an unsolicited article by the UC Berkeley newspaper, the Daily Cal, which was...
BDD100K: A Large-scale Diverse Driving Video Database
Update 06/18/2018: please also check our follow-up blog post after reading this. TL;DR, we released the largest and most diverse driving video dataset with rich annotations called BDD100K. You can access the data for research...
Delayed Impact of Fair Machine Learning
Machine learning systems trained to minimize prediction error may often exhibit discriminatory behavior based on sensitive characteristics such as race and gender. One reason could be due to historical bias in the data. In various...
TDM: From Model-Free to Model-Based Deep Reinforcement Learning
You’ve decided that you want to bike from your house by UC Berkeley to the Golden Gate Bridge. It’s a nice 20 mile ride, but there’s a problem: you’ve never ridden a bike before! To...
Shared Autonomy via Deep Reinforcement Learning
A blind, autonomous pilot (left), suboptimal human pilot (center), and combined human-machine team (right) play the Lunar Lander game. Imagine a drone pilot remotely flying a quadrotor, using an onboard camera to navigate and land....
Towards a Virtual Stuntman
Simulated humanoid performing a variety of highly dynamic and acrobatic skills. Motion control problems have become standard benchmarks for reinforcement learning, and deep RL methods have been shown to be effective for a diverse suite...
Transfer Your Font Style with GANs
Left: Given movie poster, Right: New movie title generated by MC-GAN. Text is a prominent visual element of 2D design. Artists invest significant time into designing glyphs that are visually compatible with other elements in...
Learning Robot Objectives from Physical Human Interaction
Humans physically interact with each other every day – from grabbing someone’s hand when they are about to spill their drink, to giving your friend a nudge to steer them in the right direction, physical...
Kernel Feature Selection via Conditional Covariance Minimization
Feature selection is a common method for dimensionality reduction that encourages model interpretability. With large data sets becoming ever more prevalent, feature selection has seen widespread usage across a variety of real-world tasks in recent...
Ray: A Distributed System for AI
As machine learning algorithms and techniques have advanced, more and more machine learning applications require multiple machines and must exploit parallelism. However, the infrastructure for doing machine learning on clusters remains ad-hoc. While good solutions...
Physical Adversarial Examples Against Deep Neural Networks
This post is based on recent research by Ivan Evtimov, Kevin Eykholt, Earlence Fernandes, Tadayoshi Kohno, Bo Li, Atul Prakash, Amir Rahmati, Dawn Song, and Florian Tramèr. Deep neural networks (DNNs) have enabled great progress...
Reverse Curriculum Generation for Reinforcement Learning Agents
Reinforcement Learning (RL) is a powerful technique capable of solving complex tasks such as locomotion, Atari games, racing games, and robotic manipulation tasks, all through training an agent to optimize behaviors over a reward function....
Towards Intelligent Industrial Co-robots
Democratization of Robots in Factories. In modern factories, human workers and robots are two major workforces. For safety concerns, the two are normally separated with robots confined in metal cages, which limits the productivity as...
FaSTrack: Ensuring Safe Real-Time Navigation of Dynamic Systems
The Problem: Fast and Safe Motion Planning. Real time autonomous motion planning and navigation is hard, especially when we care about safety. This becomes even more difficult when we have systems with complicated dynamics, external...
Model-based Reinforcement Learning with Neural Network Dynamics
Fig 1. A learned neural network dynamics model enables a hexapod robot to learn to run and follow desired trajectories, using just 17 minutes of real-world experience. Enabling robots to act autonomously in the real-world...
The Emergence of a Fovea while Learning to Attend
Why we need Attention. What we see through our eyes is only a very small part of the world around us. At any given time our eyes are sampling only a fraction of the surrounding...
DART: Noise Injection for Robust Imitation Learning
Toyota HSR Trained with DART to Make a Bed. In Imitation Learning (IL), also known as Learning from Demonstration (LfD), a robot learns a control policy from analyzing demonstrations of the policy performed by an...
Learning Long Duration Sequential Task Structure From Demonstrations with Application in Surgical Robotics
Deep imitation learning and deep reinforcement learning have the potential to learn robot control policies that map high-dimensional sensor inputs to controls. While these approaches have been very successful at learning short duration tasks, such as...
Learning Diverse Skills via Maximum Entropy Deep Reinforcement Learning
Deep reinforcement learning (deep RL) has achieved success in many tasks, such as playing video games from raw pixels (Mnih et al., 2015), playing the game of Go (Silver et al., 2016), and simulated robotic...
Learning to Optimize with Reinforcement Learning
Since we posted our paper on “Learning to Optimize” last year, the area of optimizer learning has received growing attention. In this article, we provide an introduction to this line of work and share our...
Learning a Multi-View Stereo Machine
Consider looking at a photograph of a chair. We humans have the remarkable capacity of inferring properties about the 3D shape of the chair from this single photograph even if we might not have seen...
How to Escape Saddle Points Efficiently
This post was initially published on Off the Convex Path. It is reposted here with the authors’ permission. A core, emerging problem in nonconvex optimization involves the escape of saddle points. While recent research has shown...
High Quality 3D Object Reconstruction from a Single Color Image
Digitally reconstructing 3D geometry from images is a core problem in computer vision. There are various applications, such as movie productions, content generation for video games, virtual and augmented reality, 3D printing and many more....
Cooperatively Learning Human Values
Be careful what you reward. “Be careful what you wish for!” – we’ve all heard it! The story of King Midas is there to warn us of what might happen when we’re not. Midas, a...
Captioning Novel Objects in Images
Given an image, humans can easily infer the salient entities in it, and describe the scene effectively, such as, where objects are located (in a forest or in a kitchen?), what attributes an object has...
Minibatch Metropolis-Hastings
Over the last few years we have experienced an enormous data deluge, which has played a key role in the surge of interest in AI. A partial list of some large datasets: ImageNet, with over...
Learning to Learn
A key aspect of intelligence is versatility – the capability of doing many different things. Current AI systems excel at mastering a single skill, such as Go, Jeopardy, or even helicopter aerobatics. But, when you...
The Confluence of Geometry and Learning
Given only a single 2D image, humans are able to effortlessly infer the rich 3D structure of the underlying scene. Since inferring 3D from 2D is an ambiguous task by itself (see e.g. the left...
Constrained Policy Optimization
(Based on joint work with David Held, Aviv Tamar, and Pieter Abbeel.) Deep reinforcement learning (RL) has enabled some remarkable achievements in hard control problems: with deep RL, agents have learned to play video games...
Releasing the Dexterity Network (Dex-Net) 2.0 Dataset for Deep Grasping
Reliable robot grasping across many objects is challenging due to sensor noise and occlusions that lead to uncertainty about the precise shape, position, and mass of objects. The Dexterity Network (Dex-Net) 2.0 is a project...
Learning to Reason with Neural Module Networks
(Joint work with Ronghang Hu, Marcus Rohrbach, Trevor Darrell, Dan Klein and Kate Saenko.) Suppose we’re building a household robot, and want it to be able to answer questions about its surroundings. We might ask...
Introducing the BAIR Blog
Berkeley AI Research (BAIR) brings together researchers at UC Berkeley across the areas of computer vision, machine learning, natural language processing, planning, and robotics, and each year we publish cutting edge research across all of...