Large Scale Training at BAIR with Ray Tune


In this blog post, we share our experiences in developing two critical software libraries that many BAIR researchers use to execute large-scale AI experiments: Ray Tune and the Ray Cluster Launcher, both of which now back many popular open-source AI libraries.

As AI research becomes more compute intensive, many AI researchers have become squeezed for time and resources. Many researchers now rely on cloud providers like Amazon Web Services or Google Compute Platform to access the huge amounts of computational resources necessary for training large models.


Emergent Behavior by Minimizing Chaos


All living organisms carve out environmental niches within which they can maintain relative predictability amidst the ever-increasing entropy around them (1), (2). Humans, for example, go to great lengths to shield themselves from surprise — we band together in millions to build cities with homes, supplying water, food, gas, and electricity to control the deterioration of our bodies and living spaces amidst heat and cold, wind and storm. The need to discover and maintain such surprise-free equilibria has driven great resourcefulness and skill in organisms across very diverse natural habitats. Motivated by this, we ask: could the motive of preserving order amidst chaos guide the automatic acquisition of useful behaviors in artificial agents?


What is My Data Worth?


People give massive amounts of their personal data to companies every day and these data are used to generate tremendous business values. Some economists and politicians argue that people should be paid for their contributions—but the million-dollar question is: by how much?

This article discusses methods proposed in our recent AISTATS and VLDB papers that attempt to answer this question in the machine learning context. This is joint work with David Dao, Boxin Wang, Frances Ann Hubis, Nezihe Merve Gurel, Nick Hynes, Bo Li, Ce Zhang, Costas J. Spanos, and Dawn Song, as well as a collaborative effort between UC Berkeley, ETH Zurich, and UIUC. More information about the work in our group can be found here.


Learning to Imitate Human Demonstrations via CycleGAN


This work presents AVID, a method that allows a robot to learn a task, such as making coffee, directly by watching a human perform the task.

One of the most important markers of intelligence is the ability to learn by watching others. Humans are particularly good at this, often being able to learn tasks by observing other humans. This is possible because we are not simply copying the actions that other humans take. Rather, we first imagine ourselves performing the task, and this provides a starting point for further practicing the task in the real world.

Robots are not yet adept at learning by watching humans or other robots. Prior methods for imitation learning, where robots learn from demonstrations of the task, typically assume that the demonstrations can be given directly through the robot, using techniques such as kinesthetic teaching or teleoperation. This assumption limits the applicability of robots in the real world, where robots may be frequently asked to learn new tasks quickly and without programmers, trained roboticists, or specialized hardware setups. Can we instead have robots learn directly from a video of a human demonstration?


Model-Based Reinforcement Learning:
Theory and Practice


Reinforcement learning systems can make decisions in one of two ways. In the model-based approach, a system uses a predictive model of the world to ask questions of the form “what will happen if I do x?” to choose the best x1. In the alternative model-free approach, the modeling step is bypassed altogether in favor of learning a control policy directly. Although in practice the line between these two techniques can become blurred, as a coarse guide it is useful for dividing up the space of algorithmic possibilities.

Predictive models can be used to ask “what if?” questions to guide future decisions.

The natural question to ask after making this distinction is whether to use such a predictive model. The field has grappled with this question for quite a while, and is unlikely to reach a consensus any time soon. However, we have learned enough about designing model-based algorithms that it is possible to draw some general conclusions about best practices and common pitfalls. In this post, we will survey various realizations of model-based reinforcement learning methods. We will then describe some of the tradeoffs that come into play when using a learned predictive model for training a policy and how these considerations motivate a simple but effective strategy for model-based reinforcement learning. The latter half of this post is based on our recent paper on model-based policy optimization, for which code is available here.


Data-Driven Deep Reinforcement Learning


One of the primary factors behind the success of machine learning approaches in open world settings, such as image recognition and natural language processing, has been the ability of high-capacity deep neural network function approximators to learn generalizable models from large amounts of data. Deep reinforcement learning methods, however, require active online data collection, where the model actively interacts with its environment. This makes such methods hard to scale to complex real-world problems, where active data collection means that large datasets of experience must be collected for every experiment – this can be expensive and, for systems such as autonomous vehicles or robots, potentially unsafe. In a number of domains of practical interest, such as autonomous driving, robotics, and games, there exist plentiful amounts of previously collected interaction data which, consists of informative behaviours that are a rich source of prior information. Deep RL algorithms that can utilize such prior datasets will not only scale to real-world problems, but will also lead to solutions that generalize substantially better. A data-driven paradigm for reinforcement learning will enable us to pre-train and deploy agents capable of sample-efficient learning in the real-world.

In this work, we ask the following question: Can deep RL algorithms effectively leverage prior collected offline data and learn without interaction with the environment? We refer to this problem statement as fully off-policy RL, previously also called batch RL in literature. A class of deep RL algorithms, known as off-policy RL algorithms can, in principle, learn from previously collected data. Recent off-policy RL algorithms such as Soft Actor-Critic (SAC), QT-Opt, and Rainbow, have demonstrated sample-efficient performance in a number of challenging domains such as robotic manipulation and atari games. However, all of these methods still require online data collection, and their ability to learn from fully off-policy data is limited in practice. In this work, we show why existing deep RL algorithms can fail in the fully off-policy setting. We then propose effective solutions to mitigate these issues.


RoboNet: A Dataset for Large-Scale Multi-Robot Learning


This post is cross-listed at the SAIL Blog and the CMU ML blog.

In the last decade, we’ve seen learning-based systems provide transformative solutions for a wide range of perception and reasoning problems, from recognizing objects in images to recognizing and translating human speech. Recent progress in deep reinforcement learning (i.e. integrating deep neural networks into reinforcement learning systems) suggests that the same kind of success could be realized in automated decision making domains. If fruitful, this line of work could allow learning-based systems to tackle active control tasks, such as robotics and autonomous driving, alongside the passive perception tasks to which they have already been successfully applied.

While deep reinforcement learning methods - like Soft Actor Critic - can learn impressive motor skills, they are challenging to train on large and broad data that is not from the target environment. In contrast, the success of deep networks in fields like computer vision was arguably predicated just as much on large datasets, such as ImageNet, as it was on large neural network architectures. This suggests that applying data-driven methods to robotics will require not just the development of strong reinforcement learning methods, but also access to large and diverse datasets for robotics. Not only can large datasets enable models that generalize effectively, but they can also be used to pre-train models that can then be adapted to more specialized tasks using much more modest datasets. Indeed, “ImageNet pre-training” has become a default approach for tackling diverse tasks with small or medium datasets - like 3D building reconstruction. Can the same kind of approach be adopted to enable broad generalization and transfer in active control domains, such as robotics?


Prof. Anca Dragan Talks About Human-Robot Interaction for WIRED


Prof. Anca Dragan gave a talk as part of the WIRED25 summit, explaining some of the challenges robots face when interacting with people. First, robots that share space with people, from autonomous cars to quadrotors to indoor mobile robots, need to anticipate what people plan on doing and make sure they can stay out of the way. This is already hard, because robots are not mind readers, and yet they need access to a rough simulator of us, humans, that they can use to help them decide how to act. The bar gets raised when it’s crowded, because then robots have to also understand how they can influence the actions that people take, like getting another driver to slow down and make space for a merging autonomous car. And what if the person decides to accelerate instead? Find out about the ways in which robots can negotiate these situations in the video below.


Can We Learn the Language of Proteins?


The incredible success of BERT in Natural Language Processing (NLP) showed that large models trained on unlabeled data are able to learn powerful representations of language. These representations have been shown to encode information about syntax and semantics. In this blog post we ask the question: Can similar methods be applied to biological sequences, specifically proteins? If so, to what degree do they improve performance on protein prediction problems that are relevant to biologists?

We discuss our recent work on TAPE: Tasks Assessing Protein Embeddings (preprint) (github), a benchmarking suite for protein representations learned by various neural architectures and self-supervised losses. We also discuss the challenges that proteins present to the ML community, previously described by xkcd:


Look then Listen: Pre-Learning Environment Representations for Data-Efficient Neural Instruction Following


When learning to follow natural language instructions, neural networks tend to be very data hungry – they require a huge number of examples pairing language with actions in order to learn effectively. This post is about reducing those heavy data requirements by first watching actions in the environment before moving on to learning from language data. Inspired by the idea that it is easier to map language to meanings that have already been formed, we introduce a semi-supervised approach that aims to separate the formation of abstractions from the learning of language. Empirically, we find that pre-learning of patterns in the environment can help us learn grounded language with much less data.