Category: DeepMind


Google DeepMind has built an AI machine that could learn as quickly as humans before long

By Hugo Angel,

Neural Episodic Control. Architecture of episodic memory module for a single action

Emerging Technology from the arXiv

Intelligent machines have humans in their sights.

Deep-learning machines already have superhuman skills when it comes to tasks such as

  • face recognition,
  • video-game playing, and
  • even the ancient Chinese game of Go.

So it’s easy to think that humans are already outgunned.

But not so fast. Intelligent machines still lag behind humans in one crucial area of performance: the speed at which they learn. When it comes to mastering classic video games, for example, the best deep-learning machines take some 200 hours of play to reach the same skill levels that humans achieve in just two hours.

So computer scientists would dearly love to have some way to speed up the rate at which machines learn.

Today, Alexander Pritzel and pals at Google’s DeepMind subsidiary in London claim to have done just that. These guys have built a deep-learning machine that is capable of rapidly assimilating new experiences and then acting on them. The result is a machine that learns significantly faster than others and has the potential to match humans in the not too distant future.

First, some background.

Deep learning uses layers of neural networks to look for patterns in data. When a single layer spots a pattern it recognizes, it sends this information to the next layer, which looks for patterns in this signal, and so on.

So in face recognition,

  • one layer might look for edges in an image,
  • the next layer for circular patterns of edges (the kind that eyes and mouths make), and
  • the next for triangular patterns such as those made by two eyes and a mouth.
  • When all this happens, the final output is an indication that a face has been spotted.

Of course, the devil is in the details. There are various systems of feedback to allow the system to learn by adjusting various internal parameters such as the strength of connections between layers. These parameters must change slowly, since a big change in one layer can catastrophically affect learning in the subsequent layers. That’s why deep neural networks need so much training and why it takes so long.
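As a toy illustration of those slow parameter adjustments (a generic sketch, not DeepMind's code), here is a tiny two-layer network trained by gradient descent with a deliberately small learning rate, so each step only nudges the weights:

```python
# Minimal sketch of slow, gradient-based parameter updates: a tiny two-layer
# network fitted to toy data. The small learning rate means many small
# adjustments rather than a few big ones that could disrupt later layers.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(256, 4))                  # toy inputs
y = (X[:, 0] - 2 * X[:, 1] > 0).astype(float)  # toy binary labels

W1, b1 = rng.normal(scale=0.1, size=(4, 8)), np.zeros(8)
W2, b2 = rng.normal(scale=0.1, size=(8, 1)), np.zeros(1)
lr = 0.05  # small steps: a big change in one layer would disturb the layers above it

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

for step in range(2000):
    # forward pass: each layer transforms the previous layer's output
    h = np.tanh(X @ W1 + b1)
    p = sigmoid(h @ W2 + b2).ravel()
    # backward pass: propagate the prediction error down through the layers
    grad_logits = (p - y)[:, None] / len(X)
    dW2 = h.T @ grad_logits
    db2 = grad_logits.sum(axis=0)
    dh = grad_logits @ W2.T * (1 - h ** 2)
    dW1 = X.T @ dh
    db1 = dh.sum(axis=0)
    # gradient descent: nudge every parameter a little in the downhill direction
    W2 -= lr * dW2; b2 -= lr * db2
    W1 -= lr * dW1; b1 -= lr * db1

print("final training loss:",
      float(np.mean(-(y * np.log(p + 1e-9) + (1 - y) * np.log(1 - p + 1e-9)))))
```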

Pritzel and co have tackled this problem with a technique they call Neural Episodic Control. “Neural episodic control demonstrates dramatic improvements on the speed of learning for a wide range of environments,” they say. “Critically, our agent is able to rapidly latch onto highly successful strategies as soon as they are experienced, instead of waiting for many steps of optimisation.”

The basic idea behind DeepMind’s approach is to copy the way humans and animals learn quickly. The general consensus is that humans can tackle situations in two different ways.

  • If the situation is familiar, our brains have already formed a model of it, which they use to work out how best to behave. This uses a part of the brain called the prefrontal cortex.
  • But when the situation is not familiar, our brains have to fall back on another strategy. This is thought to involve a much simpler test-and-remember approach involving the hippocampus. So we try something and remember the outcome of this episode. If it is successful, we try it again, and so on. But if it is not a successful episode, we try to avoid it in future.

This episodic approach suffices in the short term while our prefrontal brain learns. But it is soon outperformed by the prefrontal cortex and its model-based approach.

Pritzel and co have used this approach as their inspiration. Their new system combines two approaches.

  • The first is a conventional deep-learning system that mimics the behaviour of the prefrontal cortex.
  • The second is more like the hippocampus. When the system tries something new, it remembers the outcome.

But crucially, it doesn’t try to learn what to remember. Instead, it remembers everything. “Our architecture does not try to learn when to write to memory, as this can be slow to learn and take a significant amount of time,” say Pritzel and co. “Instead, we elect to write all experiences to the memory, and allow it to grow very large compared to existing memory architectures.”

They then use a set of strategies to read from this large memory quickly. The result is that the system can latch onto successful strategies much more quickly than conventional deep-learning systems.
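The paper's memory module (it calls the structure a differentiable neural dictionary) is more elaborate than this, but the write-everything, read-by-nearest-neighbour idea can be sketched in a few lines. The embedding size, kernel, and stored values below are illustrative choices, not the paper's exact design:

```python
# Rough sketch of an episodic memory: every experience is appended to a
# per-action store of (state embedding, value) pairs, and the value of a new
# state is read back as a kernel-weighted average over the k nearest stored
# keys, so a single good outcome is usable immediately.
import numpy as np

class EpisodicMemory:
    def __init__(self, k=5):
        self.keys, self.values, self.k = [], [], k

    def write(self, embedding, value):
        # write everything; there is no learned gate deciding what to store
        self.keys.append(np.asarray(embedding, dtype=float))
        self.values.append(float(value))

    def read(self, embedding):
        # value estimate = similarity-weighted average of the k nearest memories
        if not self.keys:
            return 0.0
        keys = np.stack(self.keys)
        dists = np.linalg.norm(keys - embedding, axis=1)
        nearest = np.argsort(dists)[: self.k]
        weights = 1.0 / (dists[nearest] + 1e-3)   # inverse-distance kernel
        weights /= weights.sum()
        return float(weights @ np.asarray(self.values)[nearest])

# one memory per action; the agent favours the action whose memory predicts the best outcome
memories = {action: EpisodicMemory() for action in ["left", "right", "fire"]}
state = np.array([0.2, -1.0, 0.5])
memories["fire"].write(state, value=10.0)   # a single successful episode...
print(memories["fire"].read(state + 0.01))  # ...already informs the next decision
```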

They go on to demonstrate how well all this works by training their machine to play classic Atari video games, such as Breakout, Pong, and Space Invaders. (This is a playground that DeepMind has used to train many deep-learning machines.)

The team, which includes DeepMind cofounder Demis Hassabis, shows that neural episodic control vastly outperforms other deep-learning approaches in the speed at which it learns. “Our experiments show that neural episodic control requires an order of magnitude fewer interactions with the environment,” they say.

That’s impressive work with significant potential. The researchers say that an obvious extension of this work is to test their new approach on more complex 3-D environments.

It’ll be interesting to see what environments the team chooses and the impact this will have on the real world. We’ll look forward to seeing how that works out.

Ref: Neural Episodic Control: arxiv.org/abs/1703.01988

ORIGINAL: MIT Technology Review

Google’s AI can now learn from its own memory independently

By Hugo Angel,

An artist’s impression of the DNC. Credit: DeepMind
The DeepMind artificial intelligence (AI) being developed by Google‘s parent company, Alphabet, can now intelligently build on what’s already inside its memory, the system’s programmers have announced.
Their new hybrid system – called a differentiable neural computer (DNC) – pairs a neural network with the vast data storage of conventional computers, and the AI is smart enough to navigate and learn from this external data bank.
What the DNC is doing is effectively combining external memory (like the external hard drive where all your photos get stored) with the neural network approach of AI, where a massive number of interconnected nodes work dynamically to simulate a brain.
“These models… can learn from examples like neural networks, but they can also store complex data like computers,” write DeepMind researchers Alexander Graves and Greg Wayne in a blog post.
At the heart of the DNC is a controller that constantly optimises its responses, comparing its results with the desired and correct ones. Over time, it’s able to get more and more accurate, figuring out how to use its memory data banks at the same time.
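The article doesn't go into the mechanics, but one way the DNC addresses its memory is by content. Below is a heavily simplified, illustrative sketch of content-based reading only; the real system also learns what and when to write, tracks how much each slot is used, and links writes in time:

```python
# Illustrative sketch of content-based reading from an external memory matrix.
# The controller emits a query key; the memory returns a blend of its rows,
# weighted by how similar each row is to the key.
import numpy as np

def cosine_similarity(memory, key):
    mem_norm = memory / (np.linalg.norm(memory, axis=1, keepdims=True) + 1e-8)
    key_norm = key / (np.linalg.norm(key) + 1e-8)
    return mem_norm @ key_norm

def read(memory, key, strength=10.0):
    # softmax over similarities: a sharper 'strength' focuses on a single slot
    sims = strength * cosine_similarity(memory, key)
    weights = np.exp(sims - sims.max())
    weights /= weights.sum()
    return weights @ memory, weights

memory = np.zeros((4, 6))             # 4 slots, 6-dimensional contents
memory[0] = [1, 0, 0, 0.3, 0.7, 0.0]  # e.g. an encoded fact written earlier
memory[1] = [0, 1, 0, 0.9, 0.1, 0.0]

query = np.array([1, 0, 0, 0, 0, 0], dtype=float)  # "look up the first fact"
content, weights = read(memory, query)
print(np.round(weights, 3), np.round(content, 3))
```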
Take a family tree: after being told about certain relationships, the DNC was able to figure out other family connections on its own – writing, rewriting, and optimising its memory along the way to pull out the correct information at the right time.
Another example the researchers give is a public transit system, like the London Underground. Once it’s learned the basics, the DNC can figure out more complex relationships and routes without any extra help, relying on what it’s already got in its memory banks.
In other words, it’s functioning like a human brain, taking data from memory (like tube station positions) and figuring out new information (like how many stops to stay on for).
Of course, any smartphone mapping app can tell you the quickest way from one tube station to another, but the difference is that the DNC isn’t pulling this information out of a pre-programmed timetable – it’s working out the information on its own, and juggling a lot of data in its memory all at once.
The approach means a DNC system could take what it learned about the London Underground and apply parts of its knowledge to another transport network, like the New York subway.
The system points to a future where artificial intelligence could answer questions on new topics, by deducing responses from prior experiences, without needing to have learned every possible answer beforehand.
Credit: DeepMind

Of course, that’s how DeepMind was able to beat human champions at Go – by studying millions of Go moves. But by adding external memory, DNCs are able to take on much more complex tasks and work out better overall strategies, its creators say.

“Like a conventional computer, [a DNC] can use its memory to represent and manipulate complex data structures, but, like a neural network, it can learn to do so from data,” the researchers explain in Nature.
In another test, the DNC was given two bits of information: “John is in the playground,” and “John picked up the football.” With those known facts, when asked “Where is the football?“, it was able to answer correctly by combining memory with deep learning. (The football is in the playground, if you’re stuck.)
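To make the inference chain explicit (this is only an illustration of the reasoning, not of how the DNC actually represents facts), the answer comes from composing the two stored statements:

```python
# Answering "Where is the football?" requires chaining two facts together.
facts = {
    "location": {"John": "playground"},  # "John is in the playground"
    "holding": {"football": "John"},     # "John picked up the football"
}

def where_is(obj):
    holder = facts["holding"].get(obj)   # the football is with John...
    return facts["location"].get(holder) # ...and John is in the playground

print(where_is("football"))  # -> playground
```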
Making those connections might seem like a simple task for our powerful human brains, but until now, it’s been a lot harder for virtual assistants, such as Siri, to figure out.
With the advances DeepMind is making, the researchers say we’re another step forward to producing a computer that can reason independently.
And then we can all start enjoying our robot-driven utopia – or technological dystopia – depending on your point of view.
ORIGINAL: ScienceAlert
By DAVID NIELD

14 OCT 2016

Google’s Deep Mind Gives AI a Memory Boost That Lets It Navigate London’s Underground

By Hugo Angel,

Photo: iStockphoto

Google’s DeepMind artificial intelligence lab does more than just develop computer programs capable of beating the world’s best human players in the ancient game of Go. The DeepMind unit has also been working on the next generation of deep learning software that combines the ability to recognize data patterns with the memory required to decipher more complex relationships within the data.

Deep learning is the latest buzz word for artificial intelligence algorithms called neural networks that can learn over time by filtering huge amounts of relevant data through many “deep” layers. The brain-inspired neural network layers consist of nodes (also known as neurons). Tech giants such as Google, Facebook, Amazon, and Microsoft have been training neural networks to learn how to better handle tasks such as recognizing images of dogs or making better Chinese-to-English translations. These AI capabilities have already benefited millions of people using Google Translate and other online services.
But neural networks face huge challenges when they try to rely solely on pattern recognition without having the external memory to store and retrieve information. To improve deep learning’s capabilities, Google DeepMind created a “differentiable neural computer” (DNC) that gives neural networks an external memory for storing information for later use.
“Neural networks are like the human brain; we humans cannot assimilate massive amounts of data and we must rely on external read-write memory all the time,” says Jay McClelland, director of the Center for Mind, Brain and Computation at Stanford University. “We once relied on our physical address books and Rolodexes; now of course we rely on the read-write storage capabilities of regular computers.”
McClelland is a cognitive scientist who served as one of several independent peer reviewers for the Google DeepMind paper that describes development of this improved deep learning system. The full paper is presented in the 12 Oct 2016 issue of the journal Nature.
The DeepMind team found that the DNC system’s combination of the neural network and external memory did much better than a neural network alone in tackling the complex relationships between data points in so-called “graph tasks.” For example, they asked their system to either simply take any path between points A and B or to find the shortest travel routes based on a symbolic map of the London Underground subway.
An unaided neural network could not even finish the first level of training, based on traveling between two subway stations without trying to find the shortest route. It achieved an average accuracy of just 37 percent after going through almost two million training examples. By comparison, the neural network with access to external memory in the DNC system successfully completed the entire training curriculum and reached an average of 98.8 percent accuracy on the final lesson.
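For comparison, here is what a conventional, hand-written solution to the traversal task looks like: a breadth-first search over a symbolic map (the station list below is a made-up, heavily abridged fragment, not the real network). The point of the experiment is that the DNC learns to produce this kind of answer from examples rather than being programmed with the algorithm:

```python
# Breadth-first search over a toy, abridged Underground-style map.
from collections import deque

edges = {
    "Oxford Circus": ["Bond Street", "Tottenham Court Road", "Green Park"],
    "Bond Street": ["Oxford Circus", "Baker Street"],
    "Tottenham Court Road": ["Oxford Circus", "Holborn"],
    "Green Park": ["Oxford Circus", "Victoria"],
    "Baker Street": ["Bond Street"],
    "Holborn": ["Tottenham Court Road"],
    "Victoria": ["Green Park"],
}

def shortest_route(start, goal):
    # standard BFS: explore stations in order of distance from the start
    queue, seen = deque([[start]]), {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        for nxt in edges.get(path[-1], []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None

print(shortest_route("Baker Street", "Victoria"))
# ['Baker Street', 'Bond Street', 'Oxford Circus', 'Green Park', 'Victoria']
```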
The external memory of the DNC system also proved critical to success in performing logical planning tasks such as solving simple block puzzle challenges. Again, a neural network by itself could not even finish the first lesson of the training curriculum for the block puzzle challenge. The DNC system was able to use its memory to store information about the challenge’s goals and to effectively plan ahead by writing its decisions to memory before acting upon them.
In 2014, DeepMind’s researchers developed another system, called the neural Turing machine, that also combined neural networks with external memory. But the neural Turing machine was limited in the way it could access “memories” (information) because such memories were effectively stored and retrieved in fixed blocks or arrays. The latest DNC system can access memories in any arbitrary location, McClelland explains.
The DNC system’s memory architecture even bears a certain resemblance to how the hippocampus region of the brain supports new brain cell growth and new connections in order to store new memories. Just as the DNC system uses the equivalent of time stamps to organize the storage and retrieval of memories, human “free recall” experiments have shown that people are more likely to recall certain items in the same order as first presented.
Despite these similarities, the DNC’s design was driven by computational considerations rather than taking direct inspiration from biological brains, DeepMind’s researchers write in their paper. But McClelland says that he prefers not to think of the similarities as being purely coincidental.
“The design decisions that motivated the architects of the DNC were the same as those that structured the human memory system, although the latter (in my opinion) was designed by a gradual evolutionary process, rather than by a group of brilliant AI researchers,” McClelland says.
Human brains still have significant advantages over any brain-inspired deep learning software. For example, human memory seems much better at storing information so that it is accessible by both context or content, McClelland says. He expressed hope that future deep learning and AI research could better capture the memory advantages of biological brains.
 
DeepMind’s DNC system and similar neural learning systems may represent crucial steps for the ongoing development of AI. But the DNC system still falls well short of what McClelland considers the most important parts of human intelligence.
“The DNC is a sophisticated form of external memory, but ultimately it is like the papyrus on which Euclid wrote the Elements,” McClelland says. “The insights of mathematicians that Euclid codified relied (in my view) on a gradual learning process that structured the neural circuits in their brains so that they came to be able to see relationships that others had not seen, and that structured the neural circuits in Euclid’s brain so that he could formulate what to write. We have a long way to go before we understand fully the algorithms the human brain uses to support these processes.”
It’s unclear when or how Google might take advantage of the capabilities offered by the DNC system to boost its commercial products and services. The DeepMind team was “heads down in research” or too busy with travel to entertain media questions at this time, according to a Google spokesperson.
But Herbert Jaeger, professor for computational science at Jacobs University Bremen in Germany, sees the DeepMind team’s work as a “passing snapshot in a fast evolution sequence of novel neural learning architectures.” In fact, he’s confident that the DeepMind team already has something better than the DNC system described in the Nature paper. (Keep in mind that the paper was submitted back in January 2016.)
DeepMind’s work is also part of a bigger trend in deep learning, Jaeger says. The leading deep learning teams at Google and other companies are racing to build new AI architectures with many different functional modules—among them, attentional control or working memory; they then train the systems through deep learning.
“The DNC is just one among dozens of novel, highly potent, and cleverly-thought-out neural learning systems that are popping up all over the place,” Jaeger says.
ORIGINAL: IEEE Spectrum
12 Oct 2016

Partnership on Artificial Intelligence to Benefit People and Society

By Hugo Angel,

Established to study and formulate best practices on AI technologies, to advance the public’s understanding of AI, and to serve as an open platform for discussion and engagement about AI and its influences on people and society.

THE LATEST
INDUSTRY LEADERS ESTABLISH PARTNERSHIP ON AI BEST PRACTICES

Press Releases, September 28, 2016. NEW YORK — IBM, DeepMind/Google, Microsoft, Amazon, and Facebook today announced that they will create a non-profit organization that will work to advance public understanding of artificial intelligence technologies (AI) and formulate best practices on the challenges and opportunities within the field. Academics, non-profits, and specialists in policy and ethics will be invited to join the Board of the organization, named the Partnership on Artificial Intelligence to Benefit People and Society (Partnership on AI).

The objective of the Partnership on AI is to address opportunities and challenges with AI technologies to benefit people and society. Together, the organization’s members will conduct research, recommend best practices, and publish research under an open license in areas such as ethics, fairness, and inclusivity; transparency, privacy, and interoperability; collaboration between people and AI systems; and the trustworthiness, reliability, and robustness of the technology. It does not intend to lobby government or other policymaking bodies.

The organization’s founding members will each contribute financial and research resources to the partnership and will share leadership with independent third parties, including academics, user group advocates, and industry domain experts. There will be equal representation of corporate and non-corporate members on the board of this new organization. The Partnership is in discussions with professional and scientific organizations, such as the Association for the Advancement of Artificial Intelligence (AAAI), as well as non-profit research groups including the Allen Institute for Artificial Intelligence (AI2), and anticipates announcements regarding additional participants in the near future.

AI technologies hold tremendous potential to improve many aspects of life, ranging from healthcare, education, and manufacturing to home automation and transportation. Through rigorous research, the development of best practices, and an open and transparent dialogue, the founding members of the Partnership on AI hope to maximize this potential and ensure it benefits as many people as possible.

… Continue reading

WaveNet: A Generative Model for Raw Audio by Google DeepMind

By Hugo Angel,

WaveNet: A Generative Model for Raw Audio
This post presents WaveNet, a deep generative model of raw audio waveforms. We show that WaveNets are able to generate speech which mimics any human voice and which sounds more natural than the best existing Text-to-Speech systems, reducing the gap with human performance by over 50%.
We also demonstrate that the same network can be used to synthesize other audio signals such as music, and present some striking samples of automatically generated piano pieces.
Talking Machines
Allowing people to converse with machines is a long-standing dream of human-computer interaction. The ability of computers to understand natural speech has been revolutionised in the last few years by the application of deep neural networks (e.g., Google Voice Search). However, generating speech with computers — a process usually referred to as speech synthesis or text-to-speech (TTS) — is still largely based on so-called concatenative TTS, where a very large database of short speech fragments is recorded from a single speaker and then recombined to form complete utterances. This makes it difficult to modify the voice (for example switching to a different speaker, or altering the emphasis or emotion of their speech) without recording a whole new database.
This has led to a great demand for parametric TTS, where all the information required to generate the data is stored in the parameters of the model, and the contents and characteristics of the speech can be controlled via the inputs to the model. So far, however, parametric TTS has tended to sound less natural than concatenative, at least for syllabic languages such as English. Existing parametric models typically generate audio signals by passing their outputs through signal processing algorithms known as vocoders.
WaveNet changes this paradigm by directly modelling the raw waveform of the audio signal, one sample at a time. As well as yielding more natural-sounding speech, using raw waveforms means that WaveNet can model any kind of audio, including music.
WaveNets

Wave animation
Researchers usually avoid modelling raw audio because it ticks so quickly: typically 16,000 samples per second or more, with important structure at many time-scales. Building a completely autoregressive model, in which the prediction for every one of those samples is influenced by all previous ones (in statistics-speak, each predictive distribution is conditioned on all previous observations), is clearly a challenging task.
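In symbols, this is the standard autoregressive factorisation, as given in the WaveNet paper:

```latex
% Autoregressive factorisation of a waveform x = (x_1, ..., x_T):
% each sample x_t is conditioned on every sample that precedes it.
p(\mathbf{x}) = \prod_{t=1}^{T} p(x_t \mid x_1, \ldots, x_{t-1})
```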
However, our PixelRNN and PixelCNN models, published earlier this year, showed that it was possible to generate complex natural images not only one pixel at a time, but one colour-channel at a time, requiring thousands of predictions per image. This inspired us to adapt our two-dimensional PixelNets to a one-dimensional WaveNet.
Architecture animation
The above animation shows how a WaveNet is structured. It is a fully convolutional neural network, where the convolutional layers have various dilation factors that allow its receptive field to grow exponentially with depth and cover thousands of timesteps.
At training time, the input sequences are real waveforms recorded from human speakers. After training, we can sample the network to generate synthetic utterances. At each step during sampling, a value is drawn from the probability distribution computed by the network. This value is then fed back into the input and a new prediction for the next step is made. Building up samples one step at a time like this is computationally expensive, but we have found it essential for generating complex, realistic-sounding audio.
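A simplified sketch of the dilated causal convolution idea described above (illustrative only, not the WaveNet implementation): with a filter of width 2 and dilations that double at every layer, the receptive field grows exponentially while the work per layer stays constant:

```python
# Stack of dilated causal convolutions with filter width 2: each layer combines
# the current timestep with one timestep 'dilation' steps in the past, so the
# receptive field roughly doubles with every layer.
import numpy as np

def dilated_causal_conv(x, w, dilation):
    # y[t] = w[0] * x[t - dilation] + w[1] * x[t]; the past is zero-padded so
    # no output ever depends on future samples (the "causal" part).
    padded = np.concatenate([np.zeros(dilation), x])
    return w[0] * padded[:-dilation] + w[1] * x

rng = np.random.default_rng(0)
signal = rng.normal(size=64)   # stand-in for 64 audio samples

h = signal
receptive_field = 1
for layer, dilation in enumerate([1, 2, 4, 8, 16]):  # dilations double per layer
    w = rng.normal(size=2)
    h = np.tanh(dilated_causal_conv(h, w, dilation))
    receptive_field += dilation
    print(f"layer {layer}: dilation {dilation:2d}, receptive field {receptive_field} samples")
```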
Improving the State of the Art
We trained WaveNet using some of Google’s TTS datasets so we could evaluate its performance. The following figure shows the quality of WaveNets on a scale from 1 to 5, compared with Google’s current best TTS systems (parametric and concatenative), and with human speech using Mean Opinion Scores (MOS). MOS are a standard measure for subjective sound quality tests, and were obtained in blind tests with human subjects (from over 500 ratings on 100 test sentences). As we can see, WaveNets reduce the gap between the state of the art and human-level performance by over 50% for both US English and Mandarin Chinese.
For both Chinese and English, Google’s current TTS systems are considered among the best worldwide, so improving on both with a single model is a major achievement.
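To unpack what reducing the gap “by over 50%” means on the MOS scale (the numbers below are placeholders, not the published scores):

```python
# Illustrative arithmetic only: these MOS values are placeholders. "Closing the
# gap by over 50%" means WaveNet covers more than half of the distance between
# the best existing system and natural human speech.
human_mos = 4.55
best_baseline_mos = 4.10  # better of the parametric and concatenative systems (placeholder)
wavenet_mos = 4.35        # placeholder

gap_closed = (wavenet_mos - best_baseline_mos) / (human_mos - best_baseline_mos)
print(f"gap to human speech closed: {gap_closed:.0%}")  # -> 56%
```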

 

Here are some samples from all three systems so you can listen and compare yourself:

US English:

Mandarin Chinese:

Knowing What to Say

In order to use WaveNet to turn text into speech, we have to tell it what the text is. We do this by transforming the text into a sequence of linguistic and phonetic features (which contain information about the current phoneme, syllable, word, etc.) and by feeding it into WaveNet. This means the network’s predictions are conditioned not only on the previous audio samples, but also on the text we want it to say.
If we train the network without the text sequence, it still generates speech, but now it has to make up what to say. As you can hear from the samples below, this results in a kind of babbling, where real words are interspersed with made-up word-like sounds:

 

Notice that non-speech sounds, such as breathing and mouth movements, are also sometimes generated by WaveNet; this reflects the greater flexibility of a raw-audio model.
As you can hear from these samples, a single WaveNet is able to learn the characteristics of many different voices, male and female. To make sure it knew which voice to use for any given utterance, we conditioned the network on the identity of the speaker. Interestingly, we found that training on many speakers made it better at modelling a single speaker than training on that speaker alone, suggesting a form of transfer learning.
By changing the speaker identity, we can use WaveNet to say the same thing in different voices:

 

Similarly, we could provide additional inputs to the model, such as emotions or accents, to make the speech even more diverse and interesting.
Making Music
Since WaveNets can be used to model any audio signal, we thought it would also be fun to try to generate music. Unlike the TTS experiments, we didn’t condition the networks on an input sequence telling it what to play (such as a musical score); instead, we simply let it generate whatever it wanted to. When we trained it on a dataset of classical piano music, it produced fascinating samples like the ones below:

 

WaveNets open up a lot of possibilities for TTS, music generation and audio modelling in general. The fact that directly generating timestep per timestep with deep neural networks works at all for 16kHz audio is really surprising, let alone that it outperforms state-of-the-art TTS systems. We are excited to see what we can do with them next.
For more details, take a look at our paper.


ORIGINAL: Google DeepMind
Aäron van den Oord. Research Scientist, DeepMind
Heiga Zen. Research Scientist, Google
Sander Dieleman. Research Scientist, DeepMind
8 September 2016

© 2016 DeepMind Technologies Limited