Category: Reinforced Learning


Stunning AI Breakthrough Takes Us One Step Closer to the Singularity

By Hugo Angel,

As a new Nature paper points out, “There are an astonishing 10 to the
power of 170 possible board configurations in Go—more than the number of
atoms in the known universe.” (Image: DeepMind)
Remember AlphaGo, the first artificial intelligence to defeat a grandmaster at Go?
Well, the program just got a major upgrade, and it can now teach itself how to dominate the game without any human intervention. But get this: In a tournament that pitted AI against AI, this juiced-up version, called AlphaGo Zero, defeated the regular AlphaGo by a whopping 100 games to 0, signifying a major advance in the field. Hear that? It’s the technological singularity inching ever closer.

A new paper published in Nature today describes how the artificially intelligent system that defeated Go grandmaster Lee Sedol in 2016 got its digital ass kicked by a new-and-improved version of itself. And it didn’t just lose by a little; it couldn’t even muster a single win after playing a hundred games. Incredibly, it took AlphaGo Zero (AGZ) just three days to train itself from scratch and acquire literally thousands of years of human Go knowledge simply by playing itself. The only input it had was the positions of the black and white stones on the board.

In addition to devising completely new strategies, the new system is also considerably leaner and meaner than the original AlphaGo.
Lee Sedol getting crushed by AlphaGo in 2016. (Image: AP)

Now, every once in a while the field of AI experiences a “holy shit” moment, and this would appear to be one of those moments. Looking back, other “holy shit” moments include:

This latest achievement qualifies as a “holy shit” moment for a number of reasons.

First of all, the original AlphaGo had the benefit of learning from literally thousands of previously played Go games, including those played by human amateurs and professionals. AGZ, on the other hand, received no help from its human handlers, and had access to absolutely nothing aside from the rules of the game. Using “reinforcement learning,” AGZ played itself over and over again, “starting from random play, and without any supervision or use of human data,” according to the Google-owned DeepMind researchers in their study. This allowed the system to improve and refine its digital brain, known as a neural network, as it continually learned from experience. This basically means that AlphaGo Zero was its own teacher.

“This technique is more powerful than previous versions of AlphaGo because it is no longer constrained by the limits of human knowledge,” notes the DeepMind team in a release. “Instead, it is able to learn tabula rasa [from a clean slate] from the strongest player in the world: AlphaGo itself.”
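AlphaGo Zero’s actual method pairs Monte Carlo tree search with a deep neural network, which is far beyond the scale of a short example. As a hedged, toy-scale illustration of the core idea only (know nothing but the rules, start from random play, and improve purely by playing yourself), the sketch below uses tabular Monte Carlo control on a simple Nim variant chosen for illustration: players alternately remove 1 to 3 stones, and whoever takes the last stone wins. The game, the function names, and the hyperparameters (`episodes`, `epsilon`, `alpha`) are all hypothetical choices, not anything taken from the DeepMind paper.

```python
import random
from collections import defaultdict

MAX_TAKE = 3  # hypothetical rule: a player may remove 1, 2, or 3 stones


def legal_actions(pile):
    """The only domain knowledge the learner gets: the rules of the game."""
    return list(range(1, min(MAX_TAKE, pile) + 1))


def choose_action(q, pile, rng, epsilon):
    """Epsilon-greedy choice; ties between equally valued moves are broken randomly."""
    actions = legal_actions(pile)
    if rng.random() < epsilon:
        return rng.choice(actions)
    best = max(q[(pile, a)] for a in actions)
    return rng.choice([a for a in actions if q[(pile, a)] == best])


def self_play_episode(q, rng, max_pile, epsilon, alpha):
    """Play one game against itself, then nudge each visited (pile, move) value
    toward the final outcome from the mover's point of view (1 = win, 0 = loss)."""
    pile = rng.randint(1, max_pile)  # random starting position, for exploration
    history = []  # (player, pile, action) for every move in the game
    player = 0
    while pile > 0:
        if not history:
            action = rng.choice(legal_actions(pile))  # random first move
        else:
            action = choose_action(q, pile, rng, epsilon)
        history.append((player, pile, action))
        pile -= action
        player = 1 - player
    winner = 1 - player  # whoever moved last took the last stone
    for mover, state, action in history:
        outcome = 1.0 if mover == winner else 0.0
        q[(state, action)] += alpha * (outcome - q[(state, action)])


def train(episodes=20000, max_pile=10, epsilon=0.1, alpha=0.05, seed=0):
    rng = random.Random(seed)
    q = defaultdict(float)  # every move starts with the same value: no human hints
    for _ in range(episodes):
        self_play_episode(q, rng, max_pile, epsilon, alpha)
    return q


def greedy_move(q, pile):
    return max(legal_actions(pile), key=lambda a: q[(pile, a)])


q = train()
print({pile: greedy_move(q, pile) for pile in range(1, 11)})
```

Both "players" share one value table, which is what makes this self-play rather than training against a fixed opponent. For piles that are not a multiple of 4, the learned move should leave the opponent a multiple of 4, the known winning strategy for this variant; piles that are multiples of 4 are lost against perfect play, so their move values end up nearly tied.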

When playing Go, the system considers the most probable next moves (a “policy network”), and then estimates the probability of winning based on those moves (its “value network”). During self-play, AGZ spends about 0.4 seconds of thinking time on each move. The original AlphaGo was equipped with a pair of separate neural networks to make these evaluations, but for AGZ, the DeepMind developers merged the policy and value networks into one, allowing the system to learn more efficiently. What’s more, the new system is powered by four tensor processing units (TPUs), specialized chips for neural network computation. Old AlphaGo needed 48 TPUs.
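The merged design can be pictured as one shared trunk feeding two output heads. The sketch below is a hypothetical, untrained toy in plain Python (one hidden layer, random weights, a made-up 3×3 board encoding) that only shows the shape of the computation: a softmax policy over moves and a single value squashed into [-1, 1] by tanh. AGZ’s real network was a deep residual convolutional network trained on self-play data; none of the sizes or names here come from the paper.

```python
import math
import random


class TwoHeadedNet:
    """Toy policy/value network: one shared hidden layer, two output heads."""

    def __init__(self, n_inputs, n_hidden, n_moves, seed=0):
        rng = random.Random(seed)

        def init(rows, cols):
            return [[rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]

        self.w_shared = init(n_hidden, n_inputs)
        self.w_policy = init(n_moves, n_hidden)
        self.w_value = init(1, n_hidden)

    @staticmethod
    def _matvec(w, x):
        return [sum(wi * xi for wi, xi in zip(row, x)) for row in w]

    def forward(self, board):
        # Shared trunk: both heads read the same hidden features.
        hidden = [max(0.0, v) for v in self._matvec(self.w_shared, board)]
        # Policy head: softmax over moves (subtracting the max keeps exp() stable).
        logits = self._matvec(self.w_policy, hidden)
        top = max(logits)
        exps = [math.exp(z - top) for z in logits]
        total = sum(exps)
        policy = [e / total for e in exps]
        # Value head: one scalar, squashed into [-1, 1].
        value = math.tanh(self._matvec(self.w_value, hidden)[0])
        return policy, value


# Hypothetical 3x3 board: +1 black, -1 white, 0 empty; 9 points plus "pass" = 10 moves.
net = TwoHeadedNet(n_inputs=9, n_hidden=16, n_moves=10)
board = [0, 1, 0, -1, 0, 0, 0, 0, 1]
policy, value = net.forward(board)
print([round(p, 3) for p in policy], round(value, 3))
```

Because both heads share the trunk, one training signal about move choice and one about game outcome shape the same features, which is one plausible reading of why merging the networks helps efficiency.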

After just three days of self-play training and a total of 4.9 million games played against itself, AGZ acquired the expertise needed to trounce AlphaGo (by comparison, the original AlphaGo had 30 million games for inspiration). After 40 days of self-training, AGZ defeated another, more sophisticated version of AlphaGo called AlphaGo “Master” that defeated the world’s best Go players and the world’s top ranked Go player, Ke Jie. Earlier this year, both the original AlphaGo and AlphaGo Master won a combined 60 games against top professionals. The rise of AGZ, it would now appear, has made these previous versions obsolete.

This is a major achievement for AI, and the subfield of reinforcement learning in particular. By teaching itself, the system matched and exceeded human knowledge by an order of magnitude in just a few days, while also developing unconventional strategies and creative new moves.

For Go players, the breakthrough is as sobering as it is exciting; they’re learning things from AI that they could never have learned on their own, or would have needed an inordinate amount of time to figure out.

“[AlphaGo Zero’s] games against AlphaGo Master will surely contain gems, especially because its victories seem effortless,” wrote Andy Okun and Andrew Jackson, members of the American Go Association, in a Nature News and Views article. “At each stage of the game, it seems to gain a bit here and lose a bit there, but somehow it ends up slightly ahead, as if by magic… The time when humans can have a meaningful conversation with an AI has always seemed far off and the stuff of science fiction. But for Go players, that day is here.”

No doubt, AGZ represents a disruptive advance in the world of Go, but what about its potential impact on the rest of the world? According to Nick Hynes, a grad student at MIT’s Computer Science and Artificial Intelligence Laboratory (CSAIL), it’ll be a while before a specialized tool like this has an impact on our daily lives.

“So far, the algorithm described only works for problems where there are a countable number of actions you can take, so it would need modification before it could be used for continuous control problems like locomotion [for instance],” Hynes told Gizmodo. “Also, it requires that you have a really good model of the environment. In this case, it literally knows all of the rules. That would be as if you had a robot for which you could exactly predict the outcomes of actions, which is impossible for real, imperfect physical systems.”

The nice part, he says, is that there are several other lines of AI research that address both of these issues (e.g. machine learning, evolutionary algorithms, etc.), so it’s really just a matter of integration. “The real key here is the technique,” says Hynes.

“As expected, and desired, we’re moving farther away from the classic pattern of getting a bunch of human-labeled data and training a model to imitate it,” he said. “What we’re seeing here is a model free from human bias and presuppositions: It can learn whatever it determines is optimal, which may indeed be more nuanced than our own conceptions of the same. It’s like an alien civilization inventing its own mathematics which allows it to do things like time travel,” to which he added: “Although we’re still far from ‘The Singularity,’ we’re definitely heading in that direction.”


Noam Brown, a Carnegie Mellon University computer scientist who helped to develop the first AI to defeat top humans in no-limit poker, says the DeepMind researchers have achieved an impressive result, and that it could lead to bigger, better things in AI.

“While the original AlphaGo managed to defeat top humans, it did so partly by relying on expert human knowledge of the game and human training data,” Brown told Gizmodo. “That led to questions of whether the techniques could extend beyond Go. AlphaGo Zero achieves even better performance without using any expert human knowledge. It seems likely that the same approach could extend to all perfect-information games [such as chess and checkers]. This is a major step toward developing general-purpose AIs.”

As both Hynes and Brown admit, this latest breakthrough doesn’t mean the technological singularity, that hypothesized time in the future when greater-than-human machine intelligence achieves explosive growth, is imminent. But it should give us pause for thought. Once we teach a system the rules of a game or the constraints of a real-world problem, the power of reinforcement learning makes it possible to simply press the start button and let the system do the rest. It will then figure out the best ways to succeed at the task, devising solutions and strategies that are beyond human capacities, and possibly even human comprehension.

As noted, AGZ and the game of Go represent an oversimplified, constrained, and highly predictable picture of the world, but in the future, AI will be tasked with more complex challenges. Eventually, self-teaching systems will be used to solve more pressing problems, such as modeling protein folding to conjure up new medicines and biotechnologies, reducing energy consumption, and designing new materials. A highly generalized self-learning system could also be tasked with improving itself, leading to artificial general intelligence (i.e., a very human-like intelligence) and even artificial superintelligence.

As the DeepMind researchers conclude in their study, “Our results comprehensively demonstrate that a pure reinforcement learning approach is fully feasible, even in the most challenging of domains: it is possible to train to superhuman level, without human examples or guidance, given no knowledge of the domain beyond basic rules.”

And indeed, now that human players are no longer dominant in games like chess and Go, it can be said that we’ve already entered into the era of superintelligence. This latest breakthrough is the tiniest hint of what’s still to come.

[Nature]

ORIGINAL: Gizmodo

By George Dvorsky
2017/10/18

The Rise of Artificial Intelligence and the End of Code

By Hugo Angel,

EDWARD C. MONAGHAN
Soon We Won’t Program Computers. We’ll Train Them Like Dogs
Before the invention of the computer, most experimental psychologists thought the brain was an unknowable black box. You could analyze a subject’s behavior—ring bell, dog salivates—but thoughts, memories, emotions? That stuff was obscure and inscrutable, beyond the reach of science. So these behaviorists, as they called themselves, confined their work to the study of stimulus and response, feedback and reinforcement, bells and saliva. They gave up trying to understand the inner workings of the mind. They ruled their field for four decades.
Then, in the mid-1950s, a group of rebellious psychologists, linguists, information theorists, and early artificial-intelligence researchers came up with a different conception of the mind. People, they argued, were not just collections of conditioned responses. They absorbed information, processed it, and then acted upon it. They had systems for writing, storing, and recalling memories. They operated via a logical, formal syntax. The brain wasn’t a black box at all. It was more like a computer.
The so-called cognitive revolution started small, but as computers became standard equipment in psychology labs across the country, it gained broader acceptance. By the late 1970s, cognitive psychology had overthrown behaviorism, and with the new regime came a whole new language for talking about mental life. Psychologists began describing thoughts as programs, ordinary people talked about storing facts away in their memory banks, and business gurus fretted about the limits of mental bandwidth and processing power in the modern workplace. 
This story has repeated itself again and again. As the digital revolution wormed its way into every part of our lives, it also seeped into our language and our deep, basic theories about how things work. Technology always does this. During the Enlightenment, Newton and Descartes inspired people to think of the universe as an elaborate clock. In the industrial age, it was a machine with pistons. (Freud’s idea of psychodynamics borrowed from the thermodynamics of steam engines.) Now it’s a computer. Which is, when you think about it, a fundamentally empowering idea. Because if the world is a computer, then the world can be coded. 
Code is logical. Code is hackable. Code is destiny. These are the central tenets (and self-fulfilling prophecies) of life in the digital age. As software has eaten the world, to paraphrase venture capitalist Marc Andreessen, we have surrounded ourselves with machines that convert our actions, thoughts, and emotions into data: raw material for armies of code-wielding engineers to manipulate. We have come to see life itself as something ruled by a series of instructions that can be discovered, exploited, optimized, maybe even rewritten. Companies use code to understand our most intimate ties; Facebook’s Mark Zuckerberg has gone so far as to suggest there might be a “fundamental mathematical law underlying human relationships that governs the balance of who and what we all care about.” In 2013, Craig Venter announced that, a decade after the decoding of the human genome, he had begun to write code that would allow him to create synthetic organisms. “It is becoming clear,” he said, “that all living cells that we know of on this planet are DNA-software-driven biological machines.” Even self-help literature insists that you can hack your own source code, reprogramming your love life, your sleep routine, and your spending habits.
In this world, the ability to write code has become not just a desirable skill but a language that grants insider status to those who speak it. They have access to what in a more mechanical age would have been called the levers of power. “If you control the code, you control the world,” wrote futurist Marc Goodman. (In Bloomberg Businessweek, Paul Ford was slightly more circumspect: “If coders don’t run the world, they run the things that run the world.” Tomato, tomahto.)
But whether you like this state of affairs or hate it—whether you’re a member of the coding elite or someone who barely feels competent to futz with the settings on your phone—don’t get used to it. Our machines are starting to speak a different language now, one that even the best coders can’t fully understand. 
Over the past several years, the biggest tech companies in Silicon Valley have aggressively pursued an approach to computing called machine learning. In traditional programming, an engineer writes explicit, step-by-step instructions for the computer to follow. With machine learning, programmers don’t encode computers with instructions. They train them. If you want to teach a neural network to recognize a cat, for instance, you don’t tell it to look for whiskers, ears, fur, and eyes. You simply show it thousands and thousands of photos of cats, and eventually it works things out. If it keeps misclassifying foxes as cats, you don’t rewrite the code. You just keep coaching it.
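To make the contrast concrete, here is a minimal sketch. The two features (ear pointiness and whisker length, scaled from 0 to 1), the hand-written thresholds, and the six labeled examples are invented for illustration, and a single perceptron is a far simpler learner than the deep networks discussed here. The point is only that in the second approach nobody writes the decision rule: its parameters are adjusted from labeled examples, and each mistake triggers a small correction, much like the "coaching" described above.

```python
# Traditional programming: an engineer writes the rule explicitly.
def is_cat_rule(ear_pointiness, whisker_length):
    # Hypothetical hand-tuned thresholds.
    return ear_pointiness > 0.5 and whisker_length > 0.5


# Machine learning: the rule's parameters are learned from labeled examples.
def predict(w, b, x1, x2):
    return 1 if w[0] * x1 + w[1] * x2 + b > 0 else 0


def train_perceptron(examples, epochs=100, lr=1.0):
    w, b = [0.0, 0.0], 0.0
    for _ in range(epochs):
        mistakes = 0
        for (x1, x2), label in examples:  # label: 1 = cat, 0 = not a cat
            error = label - predict(w, b, x1, x2)
            if error != 0:
                # "Coaching": nudge the weights toward the correct answer.
                w[0] += lr * error * x1
                w[1] += lr * error * x2
                b += lr * error
                mistakes += 1
        if mistakes == 0:  # every example classified correctly
            break
    return w, b


examples = [((0.9, 0.8), 1), ((0.8, 0.9), 1), ((1.0, 0.7), 1),
            ((0.2, 0.1), 0), ((0.1, 0.3), 0), ((0.3, 0.2), 0)]
w, b = train_perceptron(examples)
print(w, b)
```

Because these toy examples are linearly separable, the perceptron is guaranteed to stop making mistakes after finitely many corrections; real images are nowhere near that simple, which is where deep networks come in.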
This approach is not new (it’s been around for decades), but it has recently become immensely more powerful, thanks in part to the rise of deep neural networks, massively distributed computational systems that mimic the multilayered connections of neurons in the brain. And already, whether you realize it or not, machine learning powers large swaths of our online activity. Facebook uses it to determine which stories show up in your News Feed, and Google Photos uses it to identify faces. Machine learning runs Microsoft’s Skype Translator, which converts speech to different languages in real time. Self-driving cars use machine learning to avoid accidents. Even Google’s search engine, for so many years a towering edifice of human-written rules, has begun to rely on these deep neural networks. In February the company replaced its longtime head of search with machine-learning expert John Giannandrea, and it has initiated a major program to retrain its engineers in these new techniques. “By building learning systems,” Giannandrea told reporters this fall, “we don’t have to write these rules anymore.”
But here’s the thing: With machine learning, the engineer never knows precisely how the computer accomplishes its tasks. The neural network’s operations are largely opaque and inscrutable. It is, in other words, a black box. And as these black boxes assume responsibility for more and more of our daily digital tasks, they are not only going to change our relationship to technology—they are going to change how we think about ourselves, our world, and our place within it.
If in the old view programmers were like gods, authoring the laws that govern computer systems, now they’re like parents or dog trainers. And as any parent or dog owner can tell you, that is a much more mysterious relationship to find yourself in.
Andy Rubin is an inveterate tinkerer and coder. The cocreator of the Android operating system, Rubin is notorious in Silicon Valley for filling his workplaces and home with robots. He programs them himself. “I got into computer science when I was very young, and I loved it because I could disappear in the world of the computer. It was a clean slate, a blank canvas, and I could create something from scratch,” he says. “It gave me full control of a world that I played in for many, many years.”
Now, he says, that world is coming to an end. Rubin is excited about the rise of machine learning—his new company, Playground Global, invests in machine-learning startups and is positioning itself to lead the spread of intelligent devices—but it saddens him a little too. Because machine learning changes what it means to be an engineer.
“People don’t linearly write the programs,” Rubin says. “After a neural network learns how to do speech recognition, a programmer can’t go in and look at it and see how that happened. It’s just like your brain. You can’t cut your head off and see what you’re thinking.”
When engineers do peer into a deep neural network, what they see is an ocean of math: a massive, multilayer set of calculus problems that, by constantly deriving the relationship between billions of data points, generate guesses about the world.
Artificial intelligence wasn’t supposed to work this way. Until a few years ago, mainstream AI researchers assumed that to create intelligence, we just had to imbue a machine with the right logic. Write enough rules and eventually we’d create a system sophisticated enough to understand the world. They largely ignored, even vilified, early proponents of machine learning, who argued in favor of plying machines with data until they reached their own conclusions. For years computers weren’t powerful enough to really prove the merits of either approach, so the argument became a philosophical one. “Most of these debates were based on fixed beliefs about how the world had to be organized and how the brain worked,” says Sebastian Thrun, the former Stanford AI professor who created Google’s self-driving car. “Neural nets had no symbols or rules, just numbers. That alienated a lot of people.”
The implications of an unparsable machine language aren’t just philosophical. For the past two decades, learning to code has been one of the surest routes to reliable employment—a fact not lost on all those parents enrolling their kids in after-school code academies. But a world run by neurally networked deep-learning machines requires a different workforce. Analysts have already started worrying about the impact of AI on the job market, as machines render old skills irrelevant. Programmers might soon get a taste of what that feels like themselves.
“I was just having a conversation about that this morning,” says tech guru Tim O’Reilly when I ask him about this shift. “I was pointing out how different programming jobs would be by the time all these STEM-educated kids grow up.” Traditional coding won’t disappear completely (indeed, O’Reilly predicts that we’ll still need coders for a long time yet), but there will likely be less of it, and it will become a meta skill, a way of creating what Oren Etzioni, CEO of the Allen Institute for Artificial Intelligence, calls the “scaffolding” within which machine learning can operate. Just as Newtonian physics wasn’t obviated by the discovery of quantum mechanics, code will remain a powerful, if incomplete, tool set to explore the world. But when it comes to powering specific functions, machine learning will do the bulk of the work for us.
Of course, humans still have to train these systems. But for now, at least, that’s a rarefied skill. The job requires both a high-level grasp of mathematics and an intuition for pedagogical give-and-take. “It’s almost like an art form to get the best out of these systems,” says Demis Hassabis, who leads Google’s DeepMind AI team. “There’s only a few hundred people in the world that can do that really well.” But even that tiny number has been enough to transform the tech industry in just a couple of years.
Whatever the professional implications of this shift, the cultural consequences will be even bigger. If the rise of human-written software led to the cult of the engineer, and to the notion that human experience can ultimately be reduced to a series of comprehensible instructions, machine learning kicks the pendulum in the opposite direction. The code that runs the universe may defy human analysis. Right now Google, for example, is facing an antitrust investigation in Europe that accuses the company of exerting undue influence over its search results. Such a charge will be difficult to prove when even the company’s own engineers can’t say exactly how its search algorithms work in the first place.
This explosion of indeterminacy has been a long time coming. It’s not news that even simple algorithms can create unpredictable emergent behavior—an insight that goes back to chaos theory and random number generators. Over the past few years, as networks have grown more intertwined and their functions more complex, code has come to seem more like an alien force, the ghosts in the machine ever more elusive and ungovernable. Planes grounded for no reason. Seemingly unpreventable flash crashes in the stock market. Rolling blackouts.
These forces have led technologist Danny Hillis to declare the end of the age of Enlightenment, our centuries-long faith in logic, determinism, and control over nature. Hillis says we’re shifting to what he calls the age of Entanglement. “As our technological and institutional creations have become more complex, our relationship to them has changed,” he wrote in the Journal of Design and Science. “Instead of being masters of our creations, we have learned to bargain with them, cajoling and guiding them in the general direction of our goals. We have built our own jungle, and it has a life of its own.” The rise of machine learning is the latest, and perhaps the last, step in this journey.
This can all be pretty frightening. After all, coding was at least the kind of thing that a regular person could imagine picking up at a boot camp. Coders were at least human. Now the technological elite is even smaller, and their command over their creations has waned and become indirect. Already the companies that build this stuff find it behaving in ways that are hard to govern. Last summer, Google rushed to apologize when its photo recognition engine started tagging images of black people as gorillas. The company’s blunt first fix was to keep the system from labeling anything as a gorilla.

To nerds of a certain bent, this all suggests a coming era in which we forfeit authority over our machines. “One can imagine such technology 

  • outsmarting financial markets, 
  • out-inventing human researchers, 
  • out-manipulating human leaders, and 
  • developing weapons we cannot even understand,” 

wrote Stephen Hawking—sentiments echoed by Elon Musk and Bill Gates, among others. “Whereas the short-term impact of AI depends on who controls it, the long-term impact depends on whether it can be controlled at all.” 

 
But don’t be too scared; this isn’t the dawn of Skynet. We’re just learning the rules of engagement with a new technology. Already, engineers are working out ways to visualize what’s going on under the hood of a deep-learning system. But even if we never fully understand how these new machines think, that doesn’t mean we’ll be powerless before them. In the future, we won’t concern ourselves as much with the underlying sources of their behavior; we’ll learn to focus on the behavior itself. The code will become less important than the data we use to train it.
If all this seems a little familiar, that’s because it looks a lot like good old 20th-century behaviorism. In fact, the process of training a machine-learning algorithm is often compared to the great behaviorist experiments of the early 1900s. Pavlov triggered his dog’s salivation not through a deep understanding of hunger but simply by repeating a sequence of events over and over. He provided data, again and again, until the code rewrote itself. And say what you will about the behaviorists, they did know how to control their subjects.
In the long run, Thrun says, machine learning will have a democratizing influence. In the same way that you don’t need to know HTML to build a website these days, you eventually won’t need a PhD to tap into the insane power of deep learning. Programming won’t be the sole domain of trained coders who have learned a series of arcane languages. It’ll be accessible to anyone who has ever taught a dog to roll over. “For me, it’s the coolest thing ever in programming,” Thrun says, “because now anyone can program.”
For much of computing history, we have taken an inside-out view of how machines work. First we write the code, then the machine expresses it. This worldview implied plasticity, but it also suggested a kind of rules-based determinism, a sense that things are the product of their underlying instructions. Machine learning suggests the opposite, an outside-in view in which code doesn’t just determine behavior, behavior also determines code. Machines are products of the world.
Ultimately we will come to appreciate both the power of handwritten linear code and the power of machine-learning algorithms to adjust it: the give-and-take of design and emergence. It’s possible that biologists have already started figuring this out. Gene-editing techniques like Crispr give them the kind of code-manipulating power that traditional software programmers have wielded. But discoveries in the field of epigenetics suggest that genetic material is not in fact an immutable set of instructions but rather a dynamic set of switches that adjusts depending on the environment and experiences of its host. Our code does not exist separate from the physical world; it is deeply influenced and transmogrified by it. Venter may believe cells are DNA-software-driven machines, but epigeneticist Steve Cole suggests a different formulation: “A cell is a machine for turning experience into biology.”
And now, 80 years after Alan Turing first sketched his designs for a problem-solving machine, computers are becoming devices for turning experience into technology. For decades we have sought the secret code that could explain and, with some adjustments, optimize our experience of the world. But our machines won’t work that way for much longer—and our world never really did. We’re about to have a more complicated but ultimately more rewarding relationship with technology. We will go from commanding our devices to parenting them.

Editor at large Jason Tanz (@jasontanz) wrote about Andy Rubin’s new company, Playground, in issue 24.03.
This article appears in the June issue.
ORIGINAL: Wired

OpenAI Gym Beta

By Hugo Angel,

We’re releasing the public beta of OpenAI Gym, a toolkit for developing and comparing reinforcement learning (RL) algorithms. It consists of a growing suite of environments (from simulated robots to Atari games), and a site for comparing and reproducing results. OpenAI Gym is compatible with algorithms written in any framework, such as TensorFlow and Theano. The environments are written in Python, but we’ll soon make them easy to use from any language.

We originally built OpenAI Gym as a tool to accelerate our own RL research. We hope it will be just as useful for the broader community.
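What makes the toolkit framework-agnostic is that every environment exposes the same small interface. The sketch below assumes the interface of the early Gym releases, where `reset()` returns an observation and `step(action)` returns an `(observation, reward, done, info)` tuple (later Gym versions changed these return values). The `CorridorEnv` task and both policies are hypothetical and self-contained, so the example runs without Gym installed.

```python
import random


class CorridorEnv:
    """Hypothetical toy task with an early-Gym-style interface.

    The agent starts at position 0 and must reach position `length`.
    Actions: 0 = step left, 1 = step right. Reward is 1.0 on reaching the goal.
    """

    def __init__(self, length=5, max_steps=50):
        self.length = length
        self.max_steps = max_steps
        self.position = 0
        self.steps = 0

    def reset(self):
        self.position = 0
        self.steps = 0
        return self.position  # the observation is just the agent's position

    def step(self, action):
        if action not in (0, 1):
            raise ValueError("action must be 0 (left) or 1 (right)")
        move = 1 if action == 1 else -1
        self.position = max(0, self.position + move)  # a wall stops the agent at 0
        self.steps += 1
        reached_goal = self.position == self.length
        reward = 1.0 if reached_goal else 0.0
        done = reached_goal or self.steps >= self.max_steps
        return self.position, reward, done, {"steps": self.steps}


def run_episode(env, policy):
    """Generic agent-environment loop: works with any object exposing reset/step."""
    observation = env.reset()
    total_reward, done, info = 0.0, False, {}
    while not done:
        action = policy(observation)
        observation, reward, done, info = env.step(action)
        total_reward += reward
    return total_reward, info["steps"]


rng = random.Random(0)


def random_policy(observation):
    return rng.choice([0, 1])


def always_right(observation):
    return 1


print(run_episode(CorridorEnv(), always_right))
print(run_episode(CorridorEnv(), random_policy))
```

`run_episode` never looks inside the environment or the policy, so a learning algorithm written in any framework can be dropped in as the `policy` function.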
Getting started
If you’d like to dive in right away, you can work through our tutorial. You can also help out while learning by reproducing a result.
Why RL?
Reinforcement learning (RL) is the subfield of machine learning concerned with decision making and motor control. It studies how an agent can learn how to achieve goals in a complex, uncertain environment. It’s exciting for two reasons:
  1. RL is very general, encompassing all problems that involve making a sequence of decisions: for example, controlling a robot’s motors so that it’s able to run and jump, making business decisions like pricing and inventory management, or playing video games and board games. RL can even be applied to supervised learning problems with sequential or structured outputs.
  2. RL algorithms have started to achieve good results in many difficult environments. RL has a long history, but until recent advances in deep learning, it required lots of problem-specific engineering. DeepMind’s Atari results, BRETT from Pieter Abbeel’s group, and AlphaGo all used deep RL algorithms which did not make too many assumptions about their environment, and thus can be applied in other settings.
However, RL research is also slowed down by two factors:
  1. The need for better benchmarks. In supervised learning, progress has been driven by large labeled datasets like ImageNet. In RL, the closest equivalent would be a large and diverse collection of environments. However, the existing open-source collections of RL environments don’t have enough variety, and they are often difficult to even set up and use.
  2. Lack of standardization of environments used in publications. Subtle differences in the problem definition, such as the reward function or the set of actions, can drastically alter a task’s difficulty. This issue makes it difficult to reproduce published research and compare results from different papers.
OpenAI Gym is an attempt to fix both problems.
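A small calculation shows how much the action set alone can matter. The sketch below computes, exactly and without sampling, the probability that an agent choosing actions uniformly at random reaches the goal of a hypothetical one-dimensional corridor within a fixed number of steps. The corridor, its length, and the horizon are illustrative assumptions. Adding a single do-nothing action makes random exploration slower to stumble on the reward, so a task that looks nearly identical on paper becomes harder to learn.

```python
def reach_probability(length, horizon, moves):
    """Exact probability that a uniformly random agent starting at position 0
    reaches position `length` within `horizon` steps.

    `moves` lists the position change for each action, e.g. [-1, 1] or
    [-1, 0, 1]; a wall keeps the agent from going below position 0.
    """
    dist = {0: 1.0}  # probability of being at each not-yet-finished position
    reached = 0.0
    for _ in range(horizon):
        next_dist = {}
        for pos, prob in dist.items():
            share = prob / len(moves)  # every action is equally likely
            for move in moves:
                nxt = max(0, pos + move)
                if nxt == length:
                    reached += share  # reaching the goal ends the episode
                else:
                    next_dist[nxt] = next_dist.get(nxt, 0.0) + share
        dist = next_dist
    return reached


two_actions = reach_probability(length=5, horizon=10, moves=[-1, 1])
three_actions = reach_probability(length=5, horizon=10, moves=[-1, 0, 1])
print(f"left/right: {two_actions:.3f}  left/stay/right: {three_actions:.3f}")
```

Results reported on one variant of such a task say little about the other, which is why a shared, fixed definition of each environment matters for comparing papers.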
The Environments
OpenAI Gym provides a diverse suite of environments that range from easy to difficult and involve many different kinds of data. We’re starting out with the following collections:
  • Classic control and toy text: complete small-scale tasks, mostly from the RL literature. They’re here to get you started.
  • Algorithmic: perform computations such as adding multi-digit numbers and reversing sequences. One might object that these tasks are easy for a computer. The challenge is to learn these algorithms purely from examples. These tasks have the nice property that it’s easy to vary the difficulty by varying the sequence length.
  • Atari: play classic Atari games. We’ve integrated the Arcade Learning Environment (which has had a big impact on reinforcement learning research) in an easy-to-install form.
  • Board games: play Go on 9×9 and 19×19 boards. Two-player games are fundamentally different than the other settings we’ve included, because there is an adversary playing against you. In our initial release, there is a fixed opponent provided by Pachi, and we may add other opponents later (patches welcome!). We’ll also likely expand OpenAI Gym to have first-class support for multi-player games.
  • 2D and 3D robots: control a robot in simulation. These tasks use the MuJoCo physics engine, which was designed for fast and accurate robot simulation. Included are some environments from a recent benchmark by UC Berkeley researchers (who incidentally will be joining us this summer). MuJoCo is proprietary software, but offers free trial licenses.
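As a hedged illustration of why sequence length is a convenient difficulty knob for the algorithmic tasks, the sketch below generates input/target pairs for a reversal task and computes the chance that a blind guess gets an entire output right. The alphabet and function names are hypothetical, and the real Gym algorithmic environments present the input on a tape that the agent reads step by step, which this sketch does not model.

```python
import random

ALPHABET = "ABCDE"  # hypothetical symbol set


def make_reversal_examples(length, count, seed=0):
    """Return `count` (input, target) pairs where the target is the reversed input."""
    rng = random.Random(seed)
    examples = []
    for _ in range(count):
        sequence = "".join(rng.choice(ALPHABET) for _ in range(length))
        examples.append((sequence, sequence[::-1]))
    return examples


def blind_guess_success(length, alphabet_size=len(ALPHABET)):
    """Probability that a uniformly random output matches the target exactly."""
    return alphabet_size ** -length


for length in (2, 5, 10):
    print(length, make_reversal_examples(length, 1)[0], f"{blind_guess_success(length):.2e}")
```

The chance of an untrained agent producing a fully correct answer shrinks exponentially with length, so the same task can be made gradually harder by changing a single parameter.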
Over time, we plan to greatly expand this collection of environments. Contributions from the community are more than welcome.
Each environment has a version number (such as Hopper-v0). If we need to change an environment, we’ll bump the version number, defining an entirely new task. This ensures that results on a particular environment are always comparable.
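The versioning rule can be sketched as a registry that refuses to redefine an existing ID, so any change to a task forces a new version. This is a hypothetical minimal registry loosely modeled on the convention described here, not Gym’s actual registration code; the `max_steps` settings are invented.

```python
import re

ENV_ID = re.compile(r"^(?P<name>[A-Za-z0-9_]+)-v(?P<version>\d+)$")


def parse_env_id(env_id):
    """Split an ID such as 'Hopper-v0' into ('Hopper', 0)."""
    match = ENV_ID.match(env_id)
    if match is None:
        raise ValueError(f"malformed environment id: {env_id!r}")
    return match.group("name"), int(match.group("version"))


class Registry:
    def __init__(self):
        self._factories = {}

    def register(self, env_id, factory):
        parse_env_id(env_id)  # reject IDs without a version suffix
        if env_id in self._factories:
            # Changing a task in place would make old results incomparable.
            raise ValueError(f"{env_id} is already registered; bump the version instead")
        self._factories[env_id] = factory

    def make(self, env_id):
        return self._factories[env_id]()


registry = Registry()
registry.register("Hopper-v0", lambda: {"max_steps": 1000})
# A changed task gets a new ID; "Hopper-v0" keeps its original definition.
registry.register("Hopper-v1", lambda: {"max_steps": 2000})
print(registry.make("Hopper-v0"), registry.make("Hopper-v1"))
```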
Evaluations
We’ve made it easy to upload results to OpenAI Gym. However, we’ve opted not to create traditional leaderboards. What matters for research isn’t your score (it’s possible to overfit or hand-craft solutions to particular tasks), but instead the generality of your technique.
We’re starting out by maintaining a curated list of contributions that say something interesting about algorithmic capabilities. Long-term, we want this curation to be a community effort rather than something owned by us. We’ll necessarily have to figure out the details over time, and we’d love your help in doing so.
We want OpenAI Gym to be a community effort from the beginning. We’ve starting working with partners to put together resources around OpenAI Gym:
During the public beta, we’re looking for feedback on how to make this into an even better tool for research. If you’d like to help, you can try your hand at improving the state-of-the-art on each environment, reproducing other people’s results, or even implementing your own environments. Also please join us in the community chat!
ORIGINAL: OpenAI
by Greg Brockman and John Schulman
April 27, 2016