Announcing the BAIR Open Research Commons


The University of California Berkeley Artificial Intelligence Research (BAIR) Lab is pleased to announce the BAIR Open Research Commons, a new industrial affiliate program launched to accelerate cutting-edge AI research. AI research is advancing rapidly in both university and corporate research settings, with existing collaborations already underway driven by individual researcher-to-researcher collaborations. The BAIR Commons is designed to enhance and streamline such collaborative cutting-edge research by students, faculty, and corporate research scholars.

The Commons agreement has been framed with the goal of promoting open research in AI: all on-campus effort, data, and results in the Commons program will be non-exclusive with open publication and open-source code release expected. Fostering an environment for excellence for graduate student research is the primary motivation of the new program: Berkeley students will lead the design of projects in the Commons, and the program of research must be approved by their home departments before a project commences. Students are expected to benefit from collaboration with leading researchers in industrial research labs, as well as the availability of partner resources useful to investigate certain open questions in state-of-the-art AI research. The University will benefit from membership fees paid by partners to participate in the program. The Commons agreement provides for collaborative joint projects between the partners and Berkeley, with intellectual property shared jointly and equally by the parties.

The agreement also provides for joint research “lablets”, which will be embedded collaborative open research spaces inside BAIR’s 27,000 sq. ft. research facility opening this summer in the Berkeley Way West facility on the Berkeley campus. More than a dozen faculty and 120 students will be assigned space in the new lab, with an equal number of visiting positions allocated for researchers from other BAIR labs and for visiting industrial partners.

Initial alliance participants include Amazon, Facebook, Google, Samsung, and Wave Computing. Funding for over twenty joint projects has been committed in the initial launch of the program, which will support both BAIR facilities and research efforts. Over 30 faculty and 200 graduate students and postdocs at Berkeley are affiliated with BAIR. For more information about BAIR or the Commons program please contact

BAIR will occupy the top floor of Berkeley Way West.


Manipulation By Feel


Guiding our fingers while typing, enabling us to nimbly strike a matchstick, and inserting a key in a keyhole all rely on our sense of touch. It has been shown that the sense of touch is very important for dexterous manipulation in humans. Similarly, for many robotic manipulation tasks, vision alone may not be sufficient – often, it may be difficult to resolve subtle details such as the exact position of an edge, shear forces or surface textures at points of contact, and robotic arms and fingers can block the line of sight between a camera and its quarry. Augmenting robots with this crucial sense, however, remains a challenging task.

Our goal is to provide a framework for learning how to perform tactile servoing, which means precisely relocating an object based on tactile information. To provide our robot with tactile feedback, we utilize a custom-built tactile sensor, based on similar principles as the GelSight sensor developed at MIT. The sensor is composed of a deformable, elastomer-based gel, backlit by three colored LEDs, and provides high-resolution RGB images of contact at the gel surface. Compared to other sensors, this tactile sensor sensor naturally provides geometric information in the form of rich visual information from which attributes such as force can be inferred. Previous work using similar sensors has leveraged the this kind of tactile sensor on tasks such as learning how to grasp, improving success rates when grasping a variety of objects.


Assessing Generalization in Deep Reinforcement Learning



We present a benchmark for studying generalization in deep reinforcement learning (RL). Systematic empirical evaluation shows that vanilla deep RL algorithms generalize better than specialized deep RL algorithms designed specifically for generalization. In other words, simply training on varied environments is so far the most effective strategy for generalization. The code can be found at and the full paper is at


Controlling False Discoveries in Large-Scale Experimentation: Challenges and Solutions


“Scientific research has changed the world. Now it needs to change itself.”

- The Economist, 2013

There has been a growing concern about the validity of scientific findings. A multitude of journals, papers and reports have recognized the ever smaller number of replicable scientific studies. In 2016, one of the giants of scientific publishing, Nature, surveyed about 1,500 researchers across many different disciplines, asking for their stand on the status of reproducibility in their area of research. One of the many takeaways to the worrisome results of this survey is the following: 90% of the respondents agreed that there is a reproducibility crisis, and the overall top answer to boosting reproducibility was “better understanding of statistics”. Indeed, many factors contributing to the explosion of irreproducible research stem from the neglect of the fact that statistics is no longer as static as it was in the first half of the 20th century, when statistical hypothesis testing came into prominence as a theoretically rigorous proposal for making valid discoveries with high confidence.


Learning Preferences by Looking at the World


It would be great if we could all have household robots do our chores for us. Chores are tasks that we want done to make our houses cater more to our preferences; they are a way in which we want our house to be different from the way it currently is. However, most “different” states are not very desirable:

Surely our robot wouldn’t be so dumb as to go around breaking stuff when we ask it to clean our house? Unfortunately, AI systems trained with reinforcement learning only optimize features specified in the reward function and are indifferent to anything we might’ve inadvertently left out. Generally, it is easy to get the reward wrong by forgetting to include preferences for things that should stay the same, since we are so used to having these preferences satisfied, and there are so many of them. Consider the room below, and imagine that we want a robot waiter that serves people at the dining table efficiently. We might implement this using a reward function that provides 1 reward whenever the robot serves a dish, and use discounting so that the robot is incentivized to be efficient. What could go wrong with such a reward function? How would we need to modify the reward function to take this into account? Take a minute to think about it.


Soft Actor Critic—Deep Reinforcement Learning with Real-World Robots


We are announcing the release of our state-of-the-art off-policy model-free reinforcement learning algorithm, soft actor-critic (SAC). This algorithm has been developed jointly at UC Berkeley and Google, and we have been using it internally for our robotics experiment. Soft actor-critic is, to our knowledge, one of the most efficient model-free algorithms available today, making it especially well-suited for real-world robotic learning. In this post, we will benchmark SAC against state-of-the-art model-free RL algorithms and showcase a spectrum of real-world robot examples, ranging from manipulation to locomotion. We also release our implementation of SAC, which is particularly designed for real-world robotic systems.


Scaling Multi-Agent Reinforcement Learning


An earlier version of this post is on the RISELab blog. It is posted here with the permission of the authors.

We just rolled out general support for multi-agent reinforcement learning in Ray RLlib 0.6.0. This blog post is a brief tutorial on multi-agent RL and how we designed for it in RLlib. Our goal is to enable multi-agent RL across a range of use cases, from leveraging existing single-agent algorithms to training with custom algorithms at large scale.


Building Gene Expression Atlases with Deep Generative Models for Single-cell Transcriptomics


Figure: An artistic representation of single-cell RNA sequencing. The stars in the sky represent cells in a heterogeneous tissue. The projection of the stars onto the river reveals relationships among them that are not apparent by looking directly at the sky. Like the river, our Bayesian model, called scVI, reveals relationships among cells.

The diversity of gene regulatory states in our body is one of the main reasons why such an amazing array of biological functions can be encoded in a single genome. Recent advances in microfluidics and sequencing technologies (such as inDrops) enabled measurement of gene expression at the single-cell level and has provided tremendous opportunities to unravel the underlying mechanisms of relationships between individual genes and specific biological phenomena. These experiments yield approximate measurements for mRNA counts of the entire transcriptome (i.e around $d = 20,000$ protein-coding genes) and a large number of cells $n$, which can vary from tens of thousands to a million cells. The early computational methods to interpret this data relied on linear model and empirical Bayes shrinkage approaches due to initially extremely low sample-size. While current research focuses on providing more accurate models for this gene expression data, most of the subsequent algorithms either exhibit prohibitive scalability issues or remain limited to a unique downstream analysis task. Consequently, common practices in the field still rely on ad-hoc preprocessing pipelines and specific algorithmic procedures, which limits the capabilities of capturing the underlying data generating process.

In this post, we propose to build up on the increased sample-size and recent developments in Bayesian approximate inference to improve modeling complexity as well as algorithmic scalability. Notably, we present our recent work on deep generative models for single-cell transcriptomics, which addresses all the mentioned limitations by formalizing biological questions into statistical queries over a unique graphical model, tailored to single-cell RNA sequencing (scRNA-seq) datasets. The resulting algorithmic inference procedure, which we named Single-cell Variational Inference (scVI), is open-source and scales to over a million cells.


Visual Model-Based Reinforcement Learning as a Path towards Generalist Robots


With very little explicit supervision and feedback, humans are able to learn a wide range of motor skills by simply interacting with and observing the world through their senses. While there has been significant progress towards building machines that can learn complex skills and learn based on raw sensory information such as image pixels, acquiring large and diverse repertoires of general skills remains an open challenge. Our goal is to build a generalist: a robot that can perform many different tasks, like arranging objects, picking up toys, and folding towels, and can do so with many different objects in the real world without re-learning for each object or task. While these basic motor skills are much simpler and less impressive than mastering Chess or even using a spatula, we think that being able to achieve such generality with a single model is a fundamental aspect of intelligence.

The key to acquiring generality is diversity. If you deploy a learning algorithm in a narrow, closed-world environment, the agent will recover skills that are successful only in a narrow range of settings. That’s why an algorithm trained to play Breakout will struggle when anything about the images or the game changes. Indeed, the success of image classifiers relies on large, diverse datasets like ImageNet. However, having a robot autonomously learn from large and diverse datasets is quite challenging. While collecting diverse sensory data is relatively straightforward, it is simply not practical for a person to annotate all of the robot’s experiences. It is more scalable to collect completely unlabeled experiences. Then, given only sensory data, akin to what humans have, what can you learn? With raw sensory data there is no notion of progress, reward, or success. Unlike games like Breakout, the real world doesn’t give us a score or extra lives.

We have developed an algorithm that can learn a general-purpose predictive model using unlabeled sensory experiences, and then use this single model to perform a wide range of tasks.

With a single model, our approach can perform a wide range of tasks, including lifting objects, folding shorts, placing an apple onto a plate, rearranging objects, and covering a fork with a towel.

In this post, we will describe how this works. We will discuss how we can learn based on only raw sensory interaction data (i.e. image pixels, without requiring object detectors or hand-engineered perception components). We will show how we can use what was learned to accomplish many different user-specified tasks. And, we will demonstrate how this approach can control a real robot from raw pixels, performing tasks and interacting with objects that the robot has never seen before.


Physics-Based Learned Design: Teaching a Microscope How to Image


Figure 1: (left) LED Array Microscope constructed using a standard commercial microscope and an LED array. (middle) Close up on the LED array dome mounted on the microscope. (right) LED array displaying patterns at 100Hz.

Computational imaging systems marry the design of hardware and image reconstruction. For example, in optical microscopy, tomographic, super-resolution, and phase imaging systems can be constructed from simple hardware modifications to a commercial microscope (Fig. 1) and computational reconstruction. Traditionally, we require a large number of measurements to recover the above quantities; however, for live cell imaging applications, we are limited in the number of measurements we can acquire due to motion. Naturally, we want to know what are the best measurements to acquire. In this post, we highlight our latest work that learns the experimental design to maximize the performance of a non-linear computational imaging system.