An artist’s impression of the DNC. Credit: DeepMind
The DeepMind artificial intelligence (AI) being developed by Google‘s parent company, Alphabet, can now intelligently build on what’s already inside its memory, the system’s programmers have announced.
Their new hybrid system – called a Differential Neural Computer (DNC) – pairs a neural network with the vast data storage of conventional computers, and the AI is smart enough to navigate and learn from this external data bank.
What the DNC is doing is effectively combining external memory (like the external hard drive where all your photos get stored) with the neural network approach of AI, where a massive number of interconnected nodes work dynamically to simulate a brain.
“These models… can learn from examples like neural networks, but they can also store complex data like computers,” write DeepMind researchers Alexander Graves and Greg Waynein a blog post.
At the heart of the DNC is a controller that constantly optimises its responses, comparing its results with the desired and correct ones. Over time, it’s able to get more and more accurate, figuring out how to use its memory data banks at the same time.
Take a family tree: after being told about certain relationships, the DNC was able to figure out other family connections on its own – writing, rewriting, and optimising its memory along the way to pull out the correct information at the right time.
Another example the researchers give is a public transit system, like the London Underground. Once it’s learned the basics, the DNC can figure out more complex relationships and routes without any extra help, relying on what it’s already got in its memory banks.
In other words, it’s functioning like a human brain, taking data from memory (like tube station positions) and figuring out new information (like how many stops to stay on for).
Of course, any smartphone mapping app can tell you the quickest way from one tube station to another, but the difference is that the DNC isn’t pulling this information out of a pre-programmed timetable – it’s working out the information on its own, and juggling a lot of data in its memory all at once.
The approach means a DNC system could take what it learned about the London Underground and apply parts of its knowledge to another transport network, like the New York subway.
The system points to a future where artificial intelligence could answer questions on new topics, by deducing responses from prior experiences, without needing to have learned every possible answer beforehand.
Of course, that’s how DeepMind was able to beat human champions at Go – by studying millions of Go moves. But by adding external memory, DNCs are able to take on much more complex tasks and work out better overall strategies, its creators say.
“Like a conventional computer, [a DNC] can use its memory to represent and manipulate complex data structures, but, like a neural network, it can learn to do so from data,” the researchers explain in Nature.
In another test, the DNC was given two bits of information: “John is in the playground,” and “John picked up the football.” With those known facts, when asked “Where is the football?“, it was able to answer correctly by combining memory with deep learning. (The football is in the playground, if you’re stuck.)
Making those connections might seem like a simple task for our powerful human brains, but until now, it’s been a lot harder for virtual assistants, such as Siri, to figure out.
With the advances DeepMind is making, the researchers say we’re another step forward to producing a computer that can reason independently.
And then we can all start enjoying our robot-driven utopia – or technological dystopia – depending on your point of view.
Google’s DeepMind artificial intelligence lab does more than just develop computer programs capable of beating the world’s best human players in the ancient game of Go. The DeepMind unit has also been working on the next generation of deep learning software that combines the ability to recognize data patterns with the memory required to decipher more complex relationships within the data.
Deep learning is the latest buzz word for artificial intelligence algorithms called neural networks that can learn over time by filtering huge amounts of relevant data through many “deep” layers. The brain-inspired neural network layers consist of nodes (also known as neurons). Tech giants such as Google, Facebook, Amazon, and Microsoft have been training neural networks to learn how to better handle tasks such as recognizing images of dogs or making better Chinese-to-English translations. These AI capabilities have already benefited millions of people using Google Translate and other online services.
But neural networks face huge challenges when they try to rely solely on pattern recognition without having the external memory to store and retrieve information. To improve deep learning’s capabilities, Google DeepMind created a “differentiable neural computer” (DNC) that gives neural networks an external memory for storing information for later use.
“Neural networks are like the human brain; we humans cannot assimilate massive amounts of data and we must rely on external read-write memory all the time,” says Jay McClelland, director of the Center for Mind, Brain and Computation at Stanford University. “We once relied on our physical address books and Rolodexes; now of course we rely on the read-write storage capabilities of regular computers.”
McClelland is a cognitive scientist who served as one of several independent peer reviewers for the Google DeepMind paper that describes development of this improved deep learning system. The full paper is presented in the 12 Oct 2016 issue of the journal Nature.
The DeepMind team found that the DNC system’s combination of the neural network and external memory did much better than a neural network alone in tackling the complex relationships between data points in so-called “graph tasks.” For example, they asked their system to either simply take any path between points A and B or to find the shortest travel routes based on a symbolic map of the London Underground subway.
An unaided neural network could not even finish the first level of training, based on traveling between two subway stations without trying to find the shortest route. It achieved an average accuracy of just 37 percent after going through almost two million training examples. By comparison, the neural network with access to external memory in the DNC system successfully completed the entire training curriculum and reached an average of 98.8 percent accuracy on the final lesson.
The external memory of the DNC system also proved critical to success in performing logical planning tasks such as solving simple block puzzle challenges. Again, a neural network by itself could not even finish the first lesson of the training curriculum for the block puzzle challenge. The DNC system was able to use its memory to store information about the challenge’s goals and to effectively plan ahead by writing its decisions to memory before acting upon them.
In 2014, DeepMind’s researchers developed another system, called the neural Turing machine, that also combined neural networks with external memory. But the neural Turing machine was limited in the way it could access “memories” (information) because such memories were effectively stored and retrieved in fixed blocks or arrays. The latest DNC system can access memories in any arbitrary location, McClelland explains.
The DNC system’s memory architecture even bears a certain resemblance to how the hippocampus region of the brain supports new brain cell growth and new connections in order to store new memories. Just as the DNC system uses the equivalent of time stamps to organize the storage and retrieval of memories, human “free recall” experiments have shown that people are more likely to recall certain items in the same order as first presented.
Despite these similarities, the DNC’s design was driven by computational considerations rather than taking direct inspiration from biological brains, DeepMind’s researchers write in their paper. But McClelland says that he prefers not to think of the similarities as being purely coincidental.
“The design decisions that motivated the architects of the DNC were the same as those that structured the human memory system, although the latter (in my opinion) was designed by a gradual evolutionary process, rather than by a group of brilliant AI researchers,” McClelland says.
Human brains still have significant advantages over any brain-inspired deep learning software. For example, human memory seems much better at storing information so that it is accessible by both context or content, McClelland says. He expressed hope that future deep learning and AI research could better capture the memory advantages of biological brains.
DeepMind’s DNC system and similar neural learning systems may represent crucial steps for the ongoing development of AI. But the DNC system still falls well short of what McClelland considers the most important parts of human intelligence.
The DNC is a sophisticated form of external memory, but ultimately it is like the papyrus on which Euclid wrote the elements. The insights of mathematicians that Euclid codified relied (in my view) on a gradual learning process that structured the neural circuits in their brains so that they came to be able to see relationships that others had not seen, and that structured the neural circuits in Euclid’s brain so that he could formulate what to write. We have a long way to go before we understand fully the algorithms the human brain uses to support these processes.
It’s unclear when or how Google might take advantage of the capabilities offered by the DNC system to boost its commercial products and services. The DeepMind team was “heads down in research” or too busy with travel to entertain media questions at this time, according to a Google spokesperson.
But Herbert Jaeger, professor for computational science at Jacobs University Bremen in Germany, sees the DeepMind team’s work as a “passing snapshot in a fast evolution sequence of novel neural learning architectures.” In fact, he’s confident that the DeepMind team already has something better than the DNC system described in the Nature paper. (Keep in mind that the paper was submitted back in January 2016.)
DeepMind’s work is also part of a bigger trend in deep learning, Jaeger says. The leading deep learning teams at Google and other companies are racing to build new AI architectures with many different functional modules—among them, attentional control or working memory; they then train the systems through deep learning.
“The DNC is just one among dozens of novel, highly potent, and cleverly-thought-out neural learning systems that are popping up all over the place,” Jaeger says.
Established to study and formulate best practices on AI technologies, to advance the public’s understanding of AI, and to serve as an open platform for discussion and engagement about AI and its influences on people and society.
INDUSTRY LEADERS ESTABLISH PARTNERSHIP ON AI BEST PRACTICES
Press ReleasesSeptember 28, 2016 NEW YORK — IBM, DeepMind,/Google, Microsoft, Amazon, and Facebook today announced that they will create a non-profit organization that will work to advance public understanding of artificial intelligence technologies (AI) and formulate best practices on the challenges and opportunities within the field. Academics, non-profits, and specialists in policy and ethics will be invited to join the Board of the organization, named the Partnership on Artificial Intelligence to Benefit People and Society (Partnership on AI).
The objective of the Partnership on AI is to address opportunities and challenges with AI technologies to benefit people and society. Together, the organization’s members will conduct research, recommend best practices, and publish research under an open license in areas such as ethics, fairness, and inclusivity; transparency, privacy, and interoperability; collaboration between people and AI systems; and the trustworthiness, reliability, and robustness of the technology. It does not intend to lobby government or other policymaking bodies.
The organization’s founding members will each contribute financial and research resources to the partnership and will share leadership with independent third-parties, including academics, user group advocates, and industry domain experts. There will be equal representation of corporate and non-corporate members on the board of this new organization. The Partnership is in discussions with professional and scientific organizations, such as the Association for the Advancement of Artificial Intelligence (AAAI), as well as non-profit research groups including the Allen Institute for Artificial Intelligence (AI2), and anticipates announcements regarding additional participants in the near future.
AI technologies hold tremendous potential to improve many aspects of life, ranging from healthcare, education, and manufacturing to home automation and transportation. Through rigorous research, the development of best practices, and an open and transparent dialogue, the founding members of the Partnership on AI hope to maximize this potential and ensure it benefits as many people as possible.
So what’s new?
Our 2014 system used the Inception V1image classification model to initialize the image encoder, which
produces the encodings that are useful for recognizing different objects in the images. This was the best image model available at the time, achieving 89.6% top-5 accuracy on the benchmark ImageNet 2012 image classification task. We replaced this in 2015 with the newer Inception V2image classification model, which achieves 91.8% accuracy on the same task.The improved vision component gave our captioning system an accuracy boost of 2 points in the BLEU-4 metric (which is commonly used in machine translation to evaluate the quality of generated sentences) and was an important factor of its success in the captioning challenge.Today’s code release initializes the image encoder using the Inception V3model, which achieves 93.9% accuracy on the ImageNet classification task. Initializing the image encoder with a better vision model gives the image captioning system a better ability to recognize different objects in the images, allowing it to generate more detailed and accurate descriptions. This gives an additional 2 points of improvement in the BLEU-4 metric over the system used in the captioning challenge.Another key improvement to the vision component comes from fine-tuning the image model. This step addresses the problem that the image encoder is initialized by a model trained to classify objects in images, whereas the goal of the captioning system is to describe the objects in images using the encodings produced by the image model. For example, an image classification model will tell you that a dog, grass and a frisbee are in the image, but a natural description should also tell you the color of the grass and how the dog relates to the frisbee. In the fine-tuning phase, the captioning system is improved by jointly training its vision and language components on human generated captions. This allows the captioning system to transfer information from the image that is specifically useful for generating descriptive captions, but which was not necessary for classifying objects. In particular, after fine-tuning it becomes better at correctly describing the colors of objects. Importantly, the fine-tuning phase must occur after the language component has already learned to generate captions – otherwise, the noisiness of the randomly initialized language component causes irreversible corruption to the vision component. For more details, read the full paper here.
Left: the better image model allows the captioning model to generate more detailed and accurate descriptions. Right: after fine-tuning the image model, the image captioning system is more likely to describe the colors of objects correctly.
Until recently our image captioning system was implemented in the DistBelief software framework. The TensorFlow implementation released today achieves the same level of accuracy with significantly faster performance: time per training step
is just 0.7 seconds in TensorFlow compared to 3 seconds in DistBelief on an Nvidia K20 GPU, meaning that total training time is just 25% of the time previously required.A natural question is whether our captioning system can generate novel descriptions of previously unseen contexts and interactions. The system is trained by showing it hundreds of thousands of images that were captioned manually by humans, and it often re-uses human captions when presented with scenes similar to what it’s seen before.
When the model is presented with scenes similar to what it’s seen before, it will often re-use human generated captions.
So does it really understand the objects and their interactions in each image? Or does it always regurgitate descriptions from the training data? Excitingly, our model does indeed develop the ability to generate accurate new captions when presented with completely new scenes, indicating a deeper understanding of the objects and context in the images. Moreover, it learns how to express that knowledge in natural-sounding English phrases despite receiving no additional language training other than reading the human captions.
Our model generates a completely new caption using concepts learned from similar scenes in the training set
We hope that sharing this model in TensorFlow will help push forward image captioning research and applications, and will also
allow interested people to learn and have fun. To get started training your own image captioning system, and for more details on the neural network architecture, navigate to the model’s home-page here. While our system uses the Inception V3 image classification model, you could even try training our system with the recently released Inception-ResNet-v2 model to see if it can do even better!
AI (Artificial intelligence) is a subfield of computer science that was created in the 1960s, and it was/is concerned with solving tasks that are easy for humans but hard for computers. In particular, a so-called Strong AI would be a system that can do anything a human can (perhaps without purely physical things). This is fairly generic and includes all kinds of tasks such as
given some AI problem that can be described in discrete terms (e.g. out of a particular set of actions, which one is the right one), and
given a lot of information about the world,
figure out what is the “correct” action, without having the programmer program it in.
Typically some outside process is needed to judge whether the action was correct or not.
In mathematical terms, it’s a function: you feed in some input, and you want it to to produce the right output, so the whole problem is simply to build a model of this mathematical function in some automatic way. To draw a distinction with AI, if I can write a very clever program that has human-like behavior, it can be AI, but unless its parameters are automatically learned from data, it’s not machine learning.
Deep learning is one kind of machine learning that’s very popular now. It involves a particular kind of mathematical model that can be thought of as a composition of simple blocks (function composition) of a certain type, and where some of these blocks can be adjusted to better predict the final outcome.
The word “deep” means that the composition has many of these blocks stacked on top of each other, and the tricky bit is how to adjust the blocks that are far from the output, since a small change there can have very indirect effects on the output. This is done via something called Backpropagation inside of a larger process called Gradient descent which lets you change the parameters in a way that improves your model.
This post presents WaveNet, a deep generative model of raw audio waveforms. We show that WaveNets are able to generate speech which mimics any human voice and which sounds more natural than the best existing Text-to-Speech systems, reducing the gap with human performance by over 50%.
We also demonstrate that the same network can be used to synthesize other audio signals such as music, and present some striking samples of automatically generated piano pieces.
Allowing people to converse with machines is a long-standing dream of human-computer interaction. The ability of computers to understand natural speech has been revolutionised in the last few years by the application of deep neural networks (e.g.,Google Voice Search). However, generating speech with computers — a process usually referred to as speech synthesis or text-to-speech (TTS) — is still largely based on so-called concatenative TTS, where a very large database of short speech fragments are recorded from a single speaker and then recombined to form complete utterances. This makes it difficult to modify the voice (for example switching to a different speaker, or altering the emphasis or emotion of their speech) without recording a whole new database.
This has led to a great demand for parametric TTS, where all the information required to generate the data is stored in the parameters of the model, and the contents and characteristics of the speech can be controlled via the inputs to the model. So far, however, parametric TTS has tended to sound less natural than concatenative, at least for syllabic languages such as English. Existing parametric models typically generate audio signals by passing their outputs through signal processing algorithms known asvocoders.
WaveNet changes this paradigm by directly modelling the raw waveform of the audio signal, one sample at a time. As well as yielding more natural-sounding speech, using raw waveforms means that WaveNet can model any kind of audio, including music.
Researchers usually avoid modelling raw audio because it ticks so quickly: typically 16,000 samples per second or more, with important structure at many time-scales. Building a completely autoregressive model, in which the prediction for every one of those samples is influenced by all previous ones (in statistics-speak, each predictive distribution is conditioned on all previous observations), is clearly a challenging task.
However, our PixelRNN and PixelCNN models, published earlier this year, showed that it was possible to generate complex natural images not only one pixel at a time, but one colour-channel at a time, requiring thousands of predictions per image. This inspired us to adapt our two-dimensional PixelNets to a one-dimensional WaveNet.
The above animation shows how a WaveNet is structured. It is a fully convolutional neural network, where the convolutional layers have various dilation factors that allow its receptive field to grow exponentially with depth and cover thousands of timesteps.At training time, the input sequences are real waveforms recorded from human speakers. After training, we can sample the network to generate synthetic utterances. At each step during sampling a value is drawn from the probability distribution computed by the network. This value is then fed back into the input and a new prediction for the next step is made. Building up samples one step at a time like this is computationally expensive, but we have found it essential for generating complex, realistic-sounding audio.
Improving the State of the Art
We trained WaveNet using some of Google’s TTS datasets so we could evaluate its performance. The following figure shows the quality of WaveNets on a scale from 1 to 5, compared with Google’s current best TTS systems (parametric and concatenative), and with human speech using Mean Opinion Scores (MOS). MOS are a standard measure for subjective sound quality tests, and were obtained in blind tests with human subjects (from over 500 ratings on 100 test sentences). As we can see, WaveNets reduce the gap between the state of the art and human-level performance by over 50% for both US English and Mandarin Chinese.
For both Chinese and English, Google’s current TTS systems are considered among the best worldwide, so improving on both with a single model is a major achievement.
Here are some samples from all three systems so you can listen and compare yourself:
Knowing What to Say
In order to use WaveNet to turn text into speech, we have to tell it what the text is. We do this by transforming the text into a sequence of linguistic and phonetic features (which contain information about the current phoneme, syllable, word, etc.) and by feeding it into WaveNet. This means the network’s predictions are conditioned not only on the previous audio samples, but also on the text we want it to say.
If we train the network without the text sequence, it still generates speech, but now it has to make up what to say. As you can hear from the samples below, this results in a kind of babbling, where real words are interspersed with made-up word-like sounds:
Notice that non-speech sounds, such as breathing and mouth movements, are also sometimes generated by WaveNet; this reflects the greater flexibility of a raw-audio model.
As you can hear from these samples, a single WaveNet is able to learn the characteristics of many different voices, male and female. To make sure it knew which voice to use for any given utterance, we conditioned the network on the identity of the speaker. Interestingly, we found that training on many speakers made it better at modelling a single speaker than training on that speaker alone, suggesting a form of transfer learning.
By changing the speaker identity, we can use WaveNet to say the same thing in different voices:
Similarly, we could provide additional inputs to the model, such as emotions or accents, to make the speech even more diverse and interesting.
Since WaveNets can be used to model any audio signal, we thought it would also be fun to try to generate music. Unlike the TTS experiments, we didn’t condition the networks on an input sequence telling it what to play (such as a musical score); instead, we simply let it generate whatever it wanted to. When we trained it on a dataset of classical piano music, it produced fascinating samples like the ones below:
WaveNets open up a lot of possibilities for TTS, music generation and audio modelling in general. The fact that directly generating timestep per timestep with deep neural networks works at all for 16kHz audio is really surprising, let alone that it outperforms state-of-the-art TTS systems. We are excited to see what we can do with them next.
The introduction of graphene seemed to take the final bit of luster off of carbon nanotubes’ shine, but the material, which researchers have been using to make transistors for over 20 years, has experienced a renaissance of late.
“This achievement has been a dream of nanotechnology for the last 20 years,” said Michael Arnold, a professor at UW-Madison, in a press release. “Making carbon nanotube transistors that are better than silicon transistors is a big milestone,” Arnold added. “[It’s] a critical advance toward exploiting carbon nanotubes in logic, high-speed communications, and other semiconductor electronics technologies.”
In research described in the journal Science Advances, the UW-Madison researchers were able to achieve a current that is 1.9 times as fast as that seen in silicon transistors. The measure of how rapidly the current that can travel through the channel between a transistor’s source and drain determines how fast the circuit is. The more current there is, the more quickly the gate of the next device in the circuit can be charged .
The key to getting the nanotubes to create such a fast transistor was a new process that employs polymers to sort between the metallic and semiconducting SWCNTs to create an ultra-high purity of solution.
“We’ve identified specific conditions in which you can get rid of nearly all metallic nanotubes, [leaving] less than 0.01 percent metallic nanotubes [in a sample],” said Arnold.
The researchers had already tackled the problem of aligning and placing the nanotubes on a wafer two years ago when they developed a process they dubbed “floating evaporative self-assembly.” That technique uses a hydrophobic substrate and partially submerges it in water. Then the SWCNTs are deposited on its surface and the substrate removed vertically from the water.
“In our research, we’ve shown that we can simultaneously overcome all of these challenges of working with nanotubes, and that has allowed us to create these groundbreaking carbon nanotube transistors that surpass silicon and gallium arsenide transistors,” said Arnold.
In the video below, Arnold provides a little primer on SWCNTs and what his group’s research with them could mean to the future of electronics.
In continuing research, the UW-Madison team will be aiming to replicate the manufacturability of silicon transistors. To date, they have managed to scale their alignment and deposition process to 1-inch-by-1-inch wafers; the longer-term goal is to bring this up to commercial scales.
Arnold added: “There has been a lot of hype about carbon nanotubes that hasn’t been realized, and that has kind of soured many people’s outlook. But we think the hype is deserved. It has just taken decades of work for the materials science to catch up and allow us to effectively harness these materials.”
Machines are getting smarter every day—and that is both good and terrifying.
Scientists at the University of Sheffield have come up with a way for machines to learn just by looking. They don’t need to be told what to look for—they can just learn how a system works by observing it. The method is called Turing Learning and is inspired by Alan Turing’s famous test.
For a computer to learn, usually it has to be told what to look for. For instance, if you wanted to teach a robot to paint like Picasso, you’d train software to mimic real Picasso paintings. “Someone would have to tell the algorithms what is considered similar to a Picasso to begin with,” says Roderick Gross, in a news release.
Turing Learning would not require such prior knowledge, he says. It would use two computer systems, plus the original “system” you’re investigating: a shoal of fish, a Picasso painting, anything. One of the computer systems tries to copy the real-world system as closely as possible. The other computer is an observer. Its task is to watch the goings-on and try to discern which of the systems is real, and which is the copy. If it guesses right, it gets a reward. At the same time, the counterfeit system is rewarded if it fools the observer.
Proceeding like this, the counterfeit models get better and better, and the observer works out how to distinguish real from fake to a more and more accurate degree. In the end, it can not only tell real from fake, but it has also—almost as a by-product of the process—created a precise model of how the genuine system works.
The experiment is named after Alan Turing‘s famous test for artificial intelligence, which says that if a computer program can fool a human observer into believing it is a real person, then it can be considered intelligent. In reality this never really works, as a) convincing a person that you’re another person isn’t a guarantee of intelligence, and b) many computer programs have simply been designed to game the human observers.
Turing Learning, though, is actually practical. It can be used to teach robots certain behaviors, but perhaps more useful is the categorization it performs. Set a Turing Learning machine loose on a swarm of insects, for instance, and it could tease out details in the behavior of a bee colony that remain invisible to humans.
The systems can also be used to recognize abnormal behavior, without first teaching the system what constitutes abnormal behavior. The possibilities here are huge, because noticing oddities in otherwise uniform behavior is something we humans can be terrible at. Look at airport security, for example and how often TSA agents miss guns, explosives, and other weapons.
The technique could also be used in video games to make the virtual players act more like real human players to monitor livestock for odd behaviors that might signal health problems, and for security purposes like lie detection.
In some ways, the technology is terrifying, as computers are able to get to the very basics of how things behave. On the other hand, they still need to be told what to do with that knowledge, so at least there’s something for us puny humans to do in the world of the future.
Python has lots of scientific, data analysis, and machine learning libraries. But there are many problems when starting out on a machine learning project. Which library do you use? How can you use a model that has been trained in your production app? In this talk I will discuss how you can use TensorFlow to create Deep Learning applications and how to deploy them into production.
Python has lots of scientific, data analysis, and machine learning libraries. But there are many problems when starting out on a machine learning project. Which library do you use? How do they compare to each other? How can you use a model that has been trained in your production application?
TensorFlow is a new Open-Source framework created at Google for building Deep Learning applications.Tensorflow allows you to construct easy to understand data flow graphs in Python which form a mathematical and logical pipeline. Creating data flow graphs allow easier visualization of complicated algorithms as well as running the training operations over multiple hardware GPUs in parallel.
In this talk I will discuss how you can use TensorFlow to create Deep Learning applications. I will discuss how it compares to other Python machine learning libraries like Theano or Chainer. Finally, I will discuss how trained TensorFlow models could be deployed into a production system using TensorFlow Serve.
by Kaz Sato, Developer Advocate, Google Cloud Platform
August 31, 2016
It’s not hyperbole to say that use cases for machine learning and deep learning are only limited by our imaginations. About one year ago, a former embedded systems designer from the Japanese automobile industry named Makoto Koike started helping out at his parents’ cucumber farm, and was amazed by the amount of work it takes to sort cucumbers by size, shape, color and other attributes.
Makoto’s father is very proud of his thorny cucumber, for instance, having dedicated his life to delivering fresh and crispy cucumbers, with many prickles still on them. Straight and thick cucumbers with a vivid color and lots of prickles are considered premium grade and command much higher prices on the market.
But Makoto learned very quickly that sorting cucumbers is as hard and tricky as actually growing them. “Each cucumber has different color, shape, quality and freshness,” Makoto says.
Cucumbers from retail stores
Cucumbers from Makoto’s farm
In Japan, each farm has its own classification standard and there’s no industry standard. At Makoto’s farm, they sort them into nine different classes, and his mother sorts them all herself — spending up to eight hours per day at peak harvesting times.
“The sorting work is not an easy task to learn. You have to look at not only the size and thickness, but also the color, texture, small scratches, whether or not they are crooked and whether they have prickles. It takes months to learn the system and you can’t just hire part-time workers during the busiest period. I myself only recently learned to sort cucumbers well,” Makoto said.
Distorted or crooked cucumbers are ranked as low-quality product
There are also some automatic sorters on the market, but they have limitations in terms of performance and cost, and small farms don’t tend to use them.
Makoto doesn’t think sorting is an essential task for cucumber farmers. “Farmers want to focus and spend their time on growing delicious vegetables. I’d like to automate the sorting tasks before taking the farm business over from my parents.“
Makoto Koike, center, with his parents at the family cucumber farm
Makoto Koike, family cucumber farm
The many uses of deep learning
Makoto first got the idea to explore machine learning for sorting cucumbers from a completely different use case: Google AlphaGo competing with the world’s top professional Go player.
“When I saw the Google’s AlphaGo, I realized something really serious is happening here,” said Makoto. “That was the trigger for me to start developing the cucumber sorter with deep learning technology.“
Using deep learning for image recognition allows a computer to learn from a training data set what the important “features” of the images are. By using a hierarchy of numerous artificial neurons, deep learning can automatically classify images with a high degree of accuracy. Thus, neural networks can recognize different species of cats, or models of cars or airplanes from images. Sometimes neural networks can exceed the performance of the human eye for certain applications. (For more information, check out my previous blog post Understanding neural networks with TensorFlow Playground.)
TensorFlow democratizes the power of deep learning
But can computers really learn mom’s art of cucumber sorting? Makoto set out to see whether he could use deep learning technology for sorting using Google’s open source machine learning library, TensorFlow.
“Google had just open sourced TensorFlow, so I started trying it out with images of my cucumbers,” Makoto said. “This was the first time I tried out machine learning or deep learning technology, and right away got much higher accuracy than I expected. That gave me the confidence that it could solve my problem.“
With TensorFlow, you don’t need to be knowledgeable about the advanced math models and optimization algorithms needed to implement deep neural networks. Just download the sample code and read the tutorials and you can get started in no time. The library lowers the barrier to entry for machine learning significantly, and since Google open-sourced TensorFlow last November, many “non ML” engineers have started playing with the technology with their own datasets and applications.
Cucumber sorting system design
Here’s a systems diagram of the cucumber sorter that Makoto built. The system uses Raspberry Pi 3 as the main controller to take images of the cucumbers with a camera, and
in a first phase, runs a small-scale neural network on TensorFlow to detect whether or not the image is of a cucumber.
It then forwards the image to a larger TensorFlow neural network running on a Linux server to perform a more detailed classification.
Systems diagram of the cucumber sorter
Makoto used the sample TensorFlow code Deep MNIST for Experts with minor modifications to the convolution, pooling and last layers, changing the network design to adapt to the pixel format of cucumber images and the number of cucumber classes.
Here’s Makoto’s cucumber sorter, which went live in July:
Here’s a close-up of the sorting arm, and the camera interface:
And here is the cucumber sorter in action:
Pushing the limits of deep learning
One of the current challenges with deep learning is that you need to have a large number of training datasets. To train the model, Makoto spent about three months taking 7,000 pictures of cucumbers sorted by his mother, but it’s probably not enough.
“When I did a validation with the test images, the recognition accuracy exceeded 95%. But if you apply the system with real use cases, the accuracy drops down to about 70%. I suspect the neural network model has the issue of “overfitting” (the phenomenon in neural network where the model is trained to fit only to the small training dataset) because of the insufficient number of training images.“
The second challenge of deep learning is that it consumes a lot of computing power. The current sorter uses a typical Windows desktop PC to train the neural network model. Although it converts the cucumber image into 80 x 80 pixel low-resolution images, it still takes two to three days to complete training the model with 7,000 images.
“Even with this low-res image, the system can only classify a cucumber based on its shape, length and level of distortion. It can’t recognize color, texture, scratches and prickles,” Makoto explained. Increasing image resolution by zooming into the cucumber would result in much higher accuracy, but would also increase the training time significantly.
To improve deep learning, some large enterprises have started doing large-scale distributed training, but those servers come at an enormous cost. Google offers Cloud Machine Learning (Cloud ML), a low-cost cloud platform for training and prediction that dedicates hundreds of cloud servers to training a network with TensorFlow. With Cloud ML, Google handles building a large-scale cluster for distributed training, and you just pay for what you use, making it easier for developers to try out deep learning without making a significant capital investment.
These specialized servers were used in the AlphaGo match
Makoto is eagerly awaiting Cloud ML. “I could use Cloud ML to try training the model with much higher resolution images and more training data. Also, I could try changing the various configurations, parameters and algorithms of the neural network to see how that improves accuracy. I can’t wait to try it.“
This is definitely not a scene from “A Clockwork Orange.” Allen Brain Observatory
As the mice watched a computer screen, their glowing neurons pulsed through glass windows in their skulls.
Using a device called a two-photon microscope, researchers at the Allen Institute for Brain Sciencecould peer through those windows and record, layer by layer, the workings of their little minds.
The result, announced July 13, is a real-time record of the visual cortex — a brain region shared in similar form across mammalian species — at work. The data set that emerged is so massive and complete that its creators have named it the Allen Brain Observatory.
Bred for the lab, the mice were genetically modified so that specific cells in their brains would fluoresce when they became active. Researchers had installed the brain-windows surgically, slicing away tiny chunks of the rodents’ skulls and replacing them with five-millimeter skylights.
Sparkling neurons of the mouse visual cortex shone through the glass as images and short films flashed across the screen. Each point of light the researchers saw translated, with hours of careful processing, into data:
Which cell lit up?
Where in the brain?
How long did it glow?
What was the mouse doing at the time?
What was on the screen?
The researchers imaged the neurons in small groups, building a map of one microscopic layer before moving down to the next. When they were finished, the activities of 18,000 cells from several dozen mice were recorded in their database.
The problem the Brain Observatory wants to solve is straightforward. Science still does not understand the brain’s underlying code very well, and individual studies may turn up odd results that are difficult to interpret in the context of the whole brain.
A decade ago, for example, a widely-reported study appeared to find a single neuron in a human brain that always — and only — winked on when presented with images of Halle Berry. Few scientists suggested that this single cell actually stored the subject’s whole knowledge of Berry’s face. But without more context about what the cells around it were doing, a more complete explanation remained out of reach.
“When you’re listening to a cell with an electrode, all you’re hearing is [its activity level] spiking,” said Shawn Olsen, another researcher on the project. “And you don’t know where exactly that cell is, you don’t know its precise location, you don’t know its shape, you don’t know who it connects to.“
Imagine trying to assemble a complete understanding of a computer given only facts like under certain circumstances, clicking the mouse makes lights on the printer blink.
To get beyond that kind of feeling around in the dark, the Allen Institute has taken what Olsen calls an “industrial” approach to mapping out the brain’s activity.
“Our goal is to systematically march through the different cortical layers, and the different cell types, and the different areas of the cortex to produce a systematic, mostly comprehensive survey of the activity,” Olsen explained. “It doesn’t just describe how one cell type is responding or one particular area, but characterizes as much as we can a complete population of cells that will allow us to draw inferences that you couldn’t describe if you were just looking at one cell at a time.“
In other words, this project makes its impact through the grinding power of time and effort.
A visualization of cells examined in the project. Allen Brain Observatory
Researchers showed the mice moving horizontal or vertical lines, light and dark dots on a surface, natural scenes, and even clips from Hollywood movies.
The more abstract displays target how the mind sees and interprets light and dark, lines, and motion, building on existing neuroscience.Researchers have known for decades that particular cells appear to correspond to particular kinds of motion or shape, or positions in the visual field. This research helps them place the activity of those cells in context.
One of the most obvious results was that the brain is noisy, messy, and confusing.
“Even though we showed the same image, we could get dramatically different responses from the same cell. On one trial it may have a strong response, on another it may have a weak response,” Olsen said.
All that noise in their data is one of the things that differentiates it from a typical study, de Vries said.
“If you’re inserting an electrode you’re going to keep advancing until you find a cell that kind of responds the way you want it to,” he said. “By doing a survey like this we’re going to see a lot of cells that don’t respond to the stimuli in the way that we think they should. We’re realizing that the cartoon model that we have of the cortex isn’t completely accurate.“
Olsen said they suspect a lot of that noise emerges from whatever the mouse is thinking about or doing that has nothing to do with what’s on screen. They recorded videos of the mice during data collection to help researchers combing their data learn more about those effects.
The best evidence for this suspicion? When they showed the mice more interesting visuals, like pictures of animals or clips from the film “Touch of Evil,” the neurons behaved much more consistently.
“We would present each [clip] ten different times,” de Vries said. “And we can see from trial to trial many cells at certain times almost always respond — reliable, repeatable, robust responses.“
In other words, it appears the mice were paying attention.
Allen Brain Observatory
The Brain Observatory was turned loose on the internet Wednesday, with its data available for researchers and the public to comb through, explore, and maybe critique.
But the project isn’t over.
In the next year-and-a-half, the researchers intend to add more types of cells and more regions of the visual cortex to their observatory. And their long-term ambitions are even grander.
“Ultimately,” Olson said,”we want to understand how this visual information in the mouse’s brain gets used to guide behavior and memory and cognition.“
Right now, the mice just watch screens. But by training them to perform tasks based on what they see, he said they hope to crack the mysteries of memory, decision-making, and problem-solving. Another parallel observatory created using electrode arrays instead of light through windows will add new levels of richness to their data.
So the underlying code of mouse — and human — brains remains largely a mystery, but the map that we’ll need to unlock it grows richer by the day.