Established to study and formulate best practices on AI technologies, to advance the public’s understanding of AI, and to serve as an open platform for discussion and engagement about AI and its influences on people and society.
INDUSTRY LEADERS ESTABLISH PARTNERSHIP ON AI BEST PRACTICES
September 28, 2016, NEW YORK: IBM, DeepMind/Google, Microsoft, Amazon, and Facebook today announced that they will create a non-profit organization that will work to advance public understanding of artificial intelligence technologies (AI) and formulate best practices on the challenges and opportunities within the field. Academics, non-profits, and specialists in policy and ethics will be invited to join the Board of the organization, named the Partnership on Artificial Intelligence to Benefit People and Society (Partnership on AI).
The objective of the Partnership on AI is to address opportunities and challenges with AI technologies to benefit people and society. Together, the organization’s members will conduct research, recommend best practices, and publish research under an open license in areas such as ethics, fairness, and inclusivity; transparency, privacy, and interoperability; collaboration between people and AI systems; and the trustworthiness, reliability, and robustness of the technology. It does not intend to lobby government or other policymaking bodies.
The organization’s founding members will each contribute financial and research resources to the partnership and will share leadership with independent third parties, including academics, user group advocates, and industry domain experts. There will be equal representation of corporate and non-corporate members on the board of this new organization. The Partnership is in discussions with professional and scientific organizations, such as the Association for the Advancement of Artificial Intelligence (AAAI), as well as non-profit research groups including the Allen Institute for Artificial Intelligence (AI2), and anticipates announcements regarding additional participants in the near future.
AI technologies hold tremendous potential to improve many aspects of life, ranging from healthcare, education, and manufacturing to home automation and transportation. Through rigorous research, the development of best practices, and an open and transparent dialogue, the founding members of the Partnership on AI hope to maximize this potential and ensure it benefits as many people as possible.
So what’s new?
Our 2014 system used the Inception V1 image classification model to initialize the image encoder, which produces the encodings that are useful for recognizing different objects in the images. This was the best image model available at the time, achieving 89.6% top-5 accuracy on the benchmark ImageNet 2012 image classification task. We replaced this in 2015 with the newer Inception V2 image classification model, which achieves 91.8% accuracy on the same task. The improved vision component gave our captioning system an accuracy boost of 2 points in the BLEU-4 metric (which is commonly used in machine translation to evaluate the quality of generated sentences) and was an important factor of its success in the captioning challenge.
Today’s code release initializes the image encoder using the Inception V3 model, which achieves 93.9% accuracy on the ImageNet classification task. Initializing the image encoder with a better vision model gives the image captioning system a better ability to recognize different objects in the images, allowing it to generate more detailed and accurate descriptions. This gives an additional 2 points of improvement in the BLEU-4 metric over the system used in the captioning challenge.
Another key improvement to the vision component comes from fine-tuning the image model. This step addresses the problem that the image encoder is initialized by a model trained to classify objects in images, whereas the goal of the captioning system is to describe the objects in images using the encodings produced by the image model. For example, an image classification model will tell you that a dog, grass and a frisbee are in the image, but a natural description should also tell you the color of the grass and how the dog relates to the frisbee. In the fine-tuning phase, the captioning system is improved by jointly training its vision and language components on human generated captions. This allows the captioning system to transfer information from the image that is specifically useful for generating descriptive captions, but which was not necessary for classifying objects. In particular, after fine-tuning it becomes better at correctly describing the colors of objects. Importantly, the fine-tuning phase must occur after the language component has already learned to generate captions; otherwise, the noisiness of the randomly initialized language component causes irreversible corruption to the vision component. For more details, read the full paper here.
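The ordering constraint described above (train the language component first, then fine-tune vision and language jointly) can be sketched as a simple schedule. This is only an illustration: the function name, the component labels, and the step counts are hypothetical and are not taken from the released code.

```python
def trainable_components(step, language_only_steps):
    """Return which parts of the captioner receive gradient updates at `step`.

    Hypothetical two-phase schedule: the vision encoder stays frozen while the
    randomly initialized language component learns to generate captions, and
    only afterwards are both components trained jointly (fine-tuning).
    """
    if step < language_only_steps:
        return ("language",)
    return ("vision", "language")


# Illustrative numbers only: freeze the image model for the first 1000 steps.
for step in (0, 999, 1000, 5000):
    print(step, trainable_components(step, language_only_steps=1000))
```

In a real training loop the returned labels would select which parameter groups are passed to the optimizer at each step.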
Left: the better image model allows the captioning model to generate more detailed and accurate descriptions. Right: after fine-tuning the image model, the image captioning system is more likely to describe the colors of objects correctly.
Until recently our image captioning system was implemented in the DistBelief software framework. The TensorFlow implementation released today achieves the same level of accuracy with significantly faster performance: time per training step is just 0.7 seconds in TensorFlow compared to 3 seconds in DistBelief on an Nvidia K20 GPU, meaning that total training time is just 25% of the time previously required.
A natural question is whether our captioning system can generate novel descriptions of previously unseen contexts and interactions. The system is trained by showing it hundreds of thousands of images that were captioned manually by humans, and it often re-uses human captions when presented with scenes similar to what it’s seen before.
When the model is presented with scenes similar to what it’s seen before, it will often re-use human generated captions.
So does it really understand the objects and their interactions in each image? Or does it always regurgitate descriptions from the training data? Excitingly, our model does indeed develop the ability to generate accurate new captions when presented with completely new scenes, indicating a deeper understanding of the objects and context in the images. Moreover, it learns how to express that knowledge in natural-sounding English phrases despite receiving no additional language training other than reading the human captions.
Our model generates a completely new caption using concepts learned from similar scenes in the training set
We hope that sharing this model in TensorFlow will help push forward image captioning research and applications, and will also allow interested people to learn and have fun. To get started training your own image captioning system, and for more details on the neural network architecture, navigate to the model’s home-page here. While our system uses the Inception V3 image classification model, you could even try training our system with the recently released Inception-ResNet-v2 model to see if it can do even better!
AI (artificial intelligence) is a subfield of computer science that was created in the 1950s, and it was (and still is) concerned with solving tasks that are easy for humans but hard for computers. In particular, a so-called Strong AI would be a system that can do anything a human can (perhaps without purely physical things). This is fairly generic and includes all kinds of tasks such as
given some AI problem that can be described in discrete terms (e.g. out of a particular set of actions, which one is the right one), and
given a lot of information about the world,
figure out what is the “correct” action, without having the programmer program it in.
Typically some outside process is needed to judge whether the action was correct or not.
In mathematical terms, it’s a function: you feed in some input, and you want it to produce the right output, so the whole problem is simply to build a model of this mathematical function in some automatic way. To draw a distinction with AI, if I can write a very clever program that has human-like behavior, it can be AI, but unless its parameters are automatically learned from data, it’s not machine learning.
Deep learning is one kind of machine learning that’s very popular now. It involves a particular kind of mathematical model that can be thought of as a composition of simple blocks (function composition) of a certain type, and where some of these blocks can be adjusted to better predict the final outcome.
The word “deep” means that the composition has many of these blocks stacked on top of each other, and the tricky bit is how to adjust the blocks that are far from the output, since a small change there can have very indirect effects on the output. This is done via something called Backpropagation inside of a larger process called Gradient descent which lets you change the parameters in a way that improves your model.
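To make the idea concrete, here is a minimal sketch of gradient descent with hand-written backpropagation through two stacked blocks. It is a toy under stated assumptions: each block is a single multiplication, the data follow y = 3x, and the starting values, learning rate and step count are arbitrary illustrative choices.

```python
def train(xs, ys, steps=500, lr=0.05):
    """Fit pred = w2 * (w1 * x) to the data with plain gradient descent."""
    w1, w2 = 0.3, 0.8  # adjustable parameters of the two stacked blocks
    n = len(xs)
    for _ in range(steps):
        grad_w1 = grad_w2 = 0.0
        for x, y in zip(xs, ys):
            hidden = w1 * x       # first block (far from the output)
            pred = w2 * hidden    # second block (produces the output)
            err = pred - y        # derivative of 0.5 * (pred - y)**2 w.r.t. pred
            # Backpropagation: apply the chain rule from the output backwards.
            grad_w2 += err * hidden
            grad_w1 += err * w2 * x
        # Gradient descent: move each parameter against its average gradient.
        w1 -= lr * grad_w1 / n
        w2 -= lr * grad_w2 / n
    return w1, w2


xs = [0.0, 0.5, 1.0, 1.5, 2.0]
ys = [3.0 * x for x in xs]
w1, w2 = train(xs, ys)
print(w1 * w2)  # close to 3, the slope of the data
```

Note how the gradient for `w1` has to pass through `w2`: that indirect dependence is what backpropagation keeps track of as more blocks are stacked.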
This post presents WaveNet, a deep generative model of raw audio waveforms. We show that WaveNets are able to generate speech which mimics any human voice and which sounds more natural than the best existing Text-to-Speech systems, reducing the gap with human performance by over 50%.
We also demonstrate that the same network can be used to synthesize other audio signals such as music, and present some striking samples of automatically generated piano pieces.
Allowing people to converse with machines is a long-standing dream of human-computer interaction. The ability of computers to understand natural speech has been revolutionised in the last few years by the application of deep neural networks (e.g., Google Voice Search). However, generating speech with computers, a process usually referred to as speech synthesis or text-to-speech (TTS), is still largely based on so-called concatenative TTS, where a very large database of short speech fragments is recorded from a single speaker and then recombined to form complete utterances. This makes it difficult to modify the voice (for example switching to a different speaker, or altering the emphasis or emotion of their speech) without recording a whole new database.
This has led to a great demand for parametric TTS, where all the information required to generate the data is stored in the parameters of the model, and the contents and characteristics of the speech can be controlled via the inputs to the model. So far, however, parametric TTS has tended to sound less natural than concatenative, at least for syllabic languages such as English. Existing parametric models typically generate audio signals by passing their outputs through signal processing algorithms known as vocoders.
WaveNet changes this paradigm by directly modelling the raw waveform of the audio signal, one sample at a time. As well as yielding more natural-sounding speech, using raw waveforms means that WaveNet can model any kind of audio, including music.
Researchers usually avoid modelling raw audio because it ticks so quickly: typically 16,000 samples per second or more, with important structure at many time-scales. Building a completely autoregressive model, in which the prediction for every one of those samples is influenced by all previous ones (in statistics-speak, each predictive distribution is conditioned on all previous observations), is clearly a challenging task.
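The "statistics-speak" above corresponds to the usual autoregressive factorization, written here for a waveform x = (x_1, ..., x_T) of T samples; the symbols are introduced only for illustration:

```latex
p(\mathbf{x}) = \prod_{t=1}^{T} p\left(x_t \mid x_1, \ldots, x_{t-1}\right)
```

At 16,000 samples per second, one second of audio already means a product of 16,000 conditional distributions.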
However, our PixelRNN and PixelCNN models, published earlier this year, showed that it was possible to generate complex natural images not only one pixel at a time, but one colour-channel at a time, requiring thousands of predictions per image. This inspired us to adapt our two-dimensional PixelNets to a one-dimensional WaveNet.
The above animation shows how a WaveNet is structured. It is a fully convolutional neural network, where the convolutional layers have various dilation factors that allow its receptive field to grow exponentially with depth and cover thousands of timesteps.
At training time, the input sequences are real waveforms recorded from human speakers. After training, we can sample the network to generate synthetic utterances. At each step during sampling a value is drawn from the probability distribution computed by the network. This value is then fed back into the input and a new prediction for the next step is made. Building up samples one step at a time like this is computationally expensive, but we have found it essential for generating complex, realistic-sounding audio.
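A rough sketch of why dilation makes the receptive field grow so quickly is given below. It assumes a kernel of size 2 and dilation factors that double at every layer (1, 2, 4, ..., 512); those specific values are assumptions for illustration, since the post only says that the layers have "various dilation factors".

```python
def receptive_field(dilations, kernel_size=2):
    """Number of input timesteps that can influence a single output sample."""
    return 1 + sum((kernel_size - 1) * d for d in dilations)


def causal_dilated_conv(signal, weights, dilation):
    """1-D causal convolution with the given dilation.

    output[t] depends only on signal[t], signal[t - dilation], and so on;
    positions before the start of the signal are treated as zeros.
    """
    out = []
    for t in range(len(signal)):
        total = 0.0
        for k, w in enumerate(weights):
            idx = t - k * dilation
            if idx >= 0:
                total += w * signal[idx]
        out.append(total)
    return out


# Assumed doubling pattern: ten layers with dilations 1, 2, 4, ..., 512.
dilations = [2 ** i for i in range(10)]
print(receptive_field(dilations))  # 1024 timesteps from only ten layers

# An impulse at t=0 reappears exactly `dilation` steps later and never earlier.
print(causal_dilated_conv([1, 0, 0, 0, 0], [1.0, 1.0], dilation=2))
```

Under the same assumptions, stacking several such groups of layers pushes the receptive field into the thousands of timesteps mentioned above.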
Improving the State of the Art
We trained WaveNet using some of Google’s TTS datasets so we could evaluate its performance. The following figure shows the quality of WaveNets on a scale from 1 to 5, compared with Google’s current best TTS systems (parametric and concatenative), and with human speech using Mean Opinion Scores (MOS). MOS are a standard measure for subjective sound quality tests, and were obtained in blind tests with human subjects (from over 500 ratings on 100 test sentences). As we can see, WaveNets reduce the gap between the state of the art and human-level performance by over 50% for both US English and Mandarin Chinese.
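One way to read "reducing the gap by over 50%" is as the fraction of the distance between the previous best system and human speech that the new system closes. The sketch below shows that arithmetic; the scores in the example are hypothetical placeholders, not the values from the figure.

```python
def gap_reduction(best_existing, new_system, human):
    """Fraction of the MOS gap to human speech closed by the new system."""
    return (new_system - best_existing) / (human - best_existing)


# Hypothetical MOS values on the 1-to-5 scale, chosen only for illustration.
print(gap_reduction(best_existing=3.9, new_system=4.2, human=4.5))  # about 0.5
```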
For both Chinese and English, Google’s current TTS systems are considered among the best worldwide, so improving on both with a single model is a major achievement.
Here are some samples from all three systems so you can listen and compare yourself:
Knowing What to Say
In order to use WaveNet to turn text into speech, we have to tell it what the text is. We do this by transforming the text into a sequence of linguistic and phonetic features (which contain information about the current phoneme, syllable, word, etc.) and by feeding it into WaveNet. This means the network’s predictions are conditioned not only on the previous audio samples, but also on the text we want it to say.
If we train the network without the text sequence, it still generates speech, but now it has to make up what to say. As you can hear from the samples below, this results in a kind of babbling, where real words are interspersed with made-up word-like sounds:
Notice that non-speech sounds, such as breathing and mouth movements, are also sometimes generated by WaveNet; this reflects the greater flexibility of a raw-audio model.
As you can hear from these samples, a single WaveNet is able to learn the characteristics of many different voices, male and female. To make sure it knew which voice to use for any given utterance, we conditioned the network on the identity of the speaker. Interestingly, we found that training on many speakers made it better at modelling a single speaker than training on that speaker alone, suggesting a form of transfer learning.
By changing the speaker identity, we can use WaveNet to say the same thing in different voices:
Similarly, we could provide additional inputs to the model, such as emotions or accents, to make the speech even more diverse and interesting.
Since WaveNets can be used to model any audio signal, we thought it would also be fun to try to generate music. Unlike the TTS experiments, we didn’t condition the networks on an input sequence telling it what to play (such as a musical score); instead, we simply let it generate whatever it wanted to. When we trained it on a dataset of classical piano music, it produced fascinating samples like the ones below:
WaveNets open up a lot of possibilities for TTS, music generation and audio modelling in general. The fact that directly generating timestep per timestep with deep neural networks works at all for 16kHz audio is really surprising, let alone that it outperforms state-of-the-art TTS systems. We are excited to see what we can do with them next.
The introduction of graphene seemed to take the final bit of luster off of carbon nanotubes’ shine, but the material, which researchers have been using to make transistors for over 20 years, has experienced a renaissance of late.
“This achievement has been a dream of nanotechnology for the last 20 years,” said Michael Arnold, a professor at UW-Madison, in a press release. “Making carbon nanotube transistors that are better than silicon transistors is a big milestone,” Arnold added. “[It’s] a critical advance toward exploiting carbon nanotubes in logic, high-speed communications, and other semiconductor electronics technologies.”
In research described in the journal Science Advances, the UW-Madison researchers were able to achieve a current 1.9 times higher than that seen in silicon transistors. How rapidly current can travel through the channel between a transistor’s source and drain determines how fast the circuit is. The more current there is, the more quickly the gate of the next device in the circuit can be charged.
The key to getting the nanotubes to create such a fast transistor was a new process that employs polymers to sort the metallic from the semiconducting single-walled carbon nanotubes (SWCNTs), creating an ultra-high-purity solution.
“We’ve identified specific conditions in which you can get rid of nearly all metallic nanotubes, [leaving] less than 0.01 percent metallic nanotubes [in a sample],” said Arnold.
The researchers had already tackled the problem of aligning and placing the nanotubes on a wafer two years ago when they developed a process they dubbed “floating evaporative self-assembly.” That technique uses a hydrophobic substrate and partially submerges it in water. Then the SWCNTs are deposited on its surface and the substrate removed vertically from the water.
“In our research, we’ve shown that we can simultaneously overcome all of these challenges of working with nanotubes, and that has allowed us to create these groundbreaking carbon nanotube transistors that surpass silicon and gallium arsenide transistors,” said Arnold.
In the video below, Arnold provides a little primer on SWCNTs and what his group’s research with them could mean to the future of electronics.
In continuing research, the UW-Madison team will be aiming to replicate the manufacturability of silicon transistors. To date, they have managed to scale their alignment and deposition process to 1-inch-by-1-inch wafers; the longer-term goal is to bring this up to commercial scales.
Arnold added: “There has been a lot of hype about carbon nanotubes that hasn’t been realized, and that has kind of soured many people’s outlook. But we think the hype is deserved. It has just taken decades of work for the materials science to catch up and allow us to effectively harness these materials.”
Machines are getting smarter every day—and that is both good and terrifying.
Scientists at the University of Sheffield have come up with a way for machines to learn just by looking. They don’t need to be told what to look for—they can just learn how a system works by observing it. The method is called Turing Learning and is inspired by Alan Turing’s famous test.
For a computer to learn, usually it has to be told what to look for. For instance, if you wanted to teach a robot to paint like Picasso, you’d train software to mimic real Picasso paintings. “Someone would have to tell the algorithms what is considered similar to a Picasso to begin with,” says Roderich Gross, in a news release.
Turing Learning would not require such prior knowledge, he says. It would use two computer systems, plus the original “system” you’re investigating: a shoal of fish, a Picasso painting, anything. One of the computer systems tries to copy the real-world system as closely as possible. The other computer is an observer. Its task is to watch the goings-on and try to discern which of the systems is real, and which is the copy. If it guesses right, it gets a reward. At the same time, the counterfeit system is rewarded if it fools the observer.
Proceeding like this, the counterfeit models get better and better, and the observer works out how to distinguish real from fake to a more and more accurate degree. In the end, it can not only tell real from fake, but it has also—almost as a by-product of the process—created a precise model of how the genuine system works.
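The reward scheme described above can be sketched in a few lines. This shows only how one round is scored, not the Sheffield team's implementation; the function name, the threshold observer and the sample values are all hypothetical.

```python
def score_round(observer, real_samples, fake_samples):
    """Score one round of the real-versus-copy guessing game.

    `observer(sample)` returns True when it judges the sample to be real.
    The observer is rewarded for every correct judgement; the counterfeit
    system is rewarded for every fake sample that the observer accepts.
    """
    real_judged_real = sum(1 for s in real_samples if observer(s))
    fake_judged_real = sum(1 for s in fake_samples if observer(s))
    correct = real_judged_real + (len(fake_samples) - fake_judged_real)
    observer_reward = correct / (len(real_samples) + len(fake_samples))
    counterfeit_reward = fake_judged_real / len(fake_samples)
    return observer_reward, counterfeit_reward


# Hypothetical one-dimensional "behaviours" and a simple threshold observer.
def observer(value):
    return value > 2.0

real = [2.5, 3.0, 3.5, 1.9]
fake = [0.5, 1.0, 2.2, 2.8]
print(score_round(observer, real, fake))  # (0.625, 0.5)
```

In a full system, both the observer and the counterfeit model would be adjusted after each round to increase their own reward, which is what drives the two to improve together.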
The experiment is named after Alan Turing’s famous test for artificial intelligence, which says that if a computer program can fool a human observer into believing it is a real person, then it can be considered intelligent. In reality this never really works, as a) convincing a person that you’re another person isn’t a guarantee of intelligence, and b) many computer programs have simply been designed to game the human observers.
Turing Learning, though, is actually practical. It can be used to teach robots certain behaviors, but perhaps more useful is the categorization it performs. Set a Turing Learning machine loose on a swarm of insects, for instance, and it could tease out details in the behavior of a bee colony that remain invisible to humans.
The systems can also be used to recognize abnormal behavior, without first teaching the system what constitutes abnormal behavior. The possibilities here are huge, because noticing oddities in otherwise uniform behavior is something we humans can be terrible at. Look at airport security, for example, and how often TSA agents miss guns, explosives, and other weapons.
The technique could also be used in video games to make the virtual players act more like real human players, to monitor livestock for odd behaviors that might signal health problems, and for security purposes like lie detection.
In some ways, the technology is terrifying, as computers are able to get to the very basics of how things behave. On the other hand, they still need to be told what to do with that knowledge, so at least there’s something for us puny humans to do in the world of the future.
Python has lots of scientific, data analysis, and machine learning libraries. But there are many problems when starting out on a machine learning project. Which library do you use? How do they compare to each other? How can you use a model that has been trained in your production application?
TensorFlow is a new open-source framework created at Google for building Deep Learning applications. TensorFlow allows you to construct easy-to-understand data flow graphs in Python which form a mathematical and logical pipeline. Creating data flow graphs allows easier visualization of complicated algorithms as well as running the training operations over multiple GPUs in parallel.
In this talk I will discuss how you can use TensorFlow to create Deep Learning applications. I will discuss how it compares to other Python machine learning libraries like Theano or Chainer. Finally, I will discuss how trained TensorFlow models could be deployed into a production system using TensorFlow Serving.
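To illustrate what a data flow graph is, here is a tiny evaluator in plain Python. It deliberately does not use the TensorFlow API: the `Placeholder`, `Constant` and `Op` classes are hypothetical stand-ins that only show the idea of describing a computation as a graph first and feeding values into it afterwards.

```python
import operator


class Placeholder:
    """Graph input whose value is supplied when the graph is evaluated."""
    def __init__(self, name):
        self.name = name

    def evaluate(self, feed):
        return feed[self.name]


class Constant:
    """Graph node holding a fixed value."""
    def __init__(self, value):
        self.value = value

    def evaluate(self, feed):
        return self.value


class Op:
    """Graph node that applies a function to the results of its input nodes."""
    def __init__(self, fn, *inputs):
        self.fn = fn
        self.inputs = inputs

    def evaluate(self, feed):
        return self.fn(*(node.evaluate(feed) for node in self.inputs))


# Build the graph for y = w * x + b; nothing is computed at this point.
x = Placeholder("x")
w = Constant(3.0)
b = Constant(1.0)
y = Op(operator.add, Op(operator.mul, w, x), b)

# Evaluate the graph by feeding a value for the placeholder.
print(y.evaluate({"x": 2.0}))  # 7.0
```

Because the whole computation exists as a data structure before it runs, a framework can inspect it, draw it, or split it across devices.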
by Kaz Sato, Developer Advocate, Google Cloud Platform
August 31, 2016
It’s not hyperbole to say that use cases for machine learning and deep learning are only limited by our imaginations. About one year ago, a former embedded systems designer from the Japanese automobile industry named Makoto Koike started helping out at his parents’ cucumber farm, and was amazed by the amount of work it takes to sort cucumbers by size, shape, color and other attributes.
Makoto’s father is very proud of his thorny cucumber, for instance, having dedicated his life to delivering fresh and crispy cucumbers, with many prickles still on them. Straight and thick cucumbers with a vivid color and lots of prickles are considered premium grade and command much higher prices on the market.
But Makoto learned very quickly that sorting cucumbers is as hard and tricky as actually growing them. “Each cucumber has different color, shape, quality and freshness,” Makoto says.
Cucumbers from retail stores
Cucumbers from Makoto’s farm
In Japan, each farm has its own classification standard and there’s no industry standard. At Makoto’s farm, they sort them into nine different classes, and his mother sorts them all herself — spending up to eight hours per day at peak harvesting times.
“The sorting work is not an easy task to learn. You have to look at not only the size and thickness, but also the color, texture, small scratches, whether or not they are crooked and whether they have prickles. It takes months to learn the system and you can’t just hire part-time workers during the busiest period. I myself only recently learned to sort cucumbers well,” Makoto said.
Distorted or crooked cucumbers are ranked as low-quality product
There are also some automatic sorters on the market, but they have limitations in terms of performance and cost, and small farms don’t tend to use them.
Makoto doesn’t think sorting is an essential task for cucumber farmers. “Farmers want to focus and spend their time on growing delicious vegetables. I’d like to automate the sorting tasks before taking the farm business over from my parents.”
Makoto Koike, center, with his parents at the family cucumber farm
The many uses of deep learning
Makoto first got the idea to explore machine learning for sorting cucumbers from a completely different use case: Google AlphaGo competing with the world’s top professional Go player.
“When I saw the Google’s AlphaGo, I realized something really serious is happening here,” said Makoto. “That was the trigger for me to start developing the cucumber sorter with deep learning technology.”
Using deep learning for image recognition allows a computer to learn from a training data set what the important “features” of the images are. By using a hierarchy of numerous artificial neurons, deep learning can automatically classify images with a high degree of accuracy. Thus, neural networks can recognize different species of cats, or models of cars or airplanes from images. Sometimes neural networks can exceed the performance of the human eye for certain applications. (For more information, check out my previous blog post Understanding neural networks with TensorFlow Playground.)
TensorFlow democratizes the power of deep learning
But can computers really learn mom’s art of cucumber sorting? Makoto set out to see whether he could use deep learning technology for sorting using Google’s open source machine learning library, TensorFlow.
“Google had just open sourced TensorFlow, so I started trying it out with images of my cucumbers,” Makoto said. “This was the first time I tried out machine learning or deep learning technology, and right away got much higher accuracy than I expected. That gave me the confidence that it could solve my problem.”
With TensorFlow, you don’t need to be knowledgeable about the advanced math models and optimization algorithms needed to implement deep neural networks. Just download the sample code and read the tutorials and you can get started in no time. The library lowers the barrier to entry for machine learning significantly, and since Google open-sourced TensorFlow last November, many “non ML” engineers have started playing with the technology with their own datasets and applications.
Cucumber sorting system design
Here’s a systems diagram of the cucumber sorter that Makoto built. The system uses Raspberry Pi 3 as the main controller to take images of the cucumbers with a camera, and in a first phase, runs a small-scale neural network on TensorFlow to detect whether or not the image is of a cucumber. It then forwards the image to a larger TensorFlow neural network running on a Linux server to perform a more detailed classification.
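The two-phase flow just described can be sketched as follows. This is only an outline of the control flow: `is_cucumber` and `classify_grade` are hypothetical callables standing in for the small on-device network and the larger server-side network, and the stub logic in the example is invented for illustration.

```python
def sort_cucumber(image, is_cucumber, classify_grade):
    """Run the cheap on-device check first, then the detailed classifier.

    Returns None when the first phase decides the image is not a cucumber,
    so the larger (remote) network is never called for such images.
    """
    if not is_cucumber(image):
        return None
    return classify_grade(image)


# Stub models for illustration: an "image" here is just a dict of measurements.
def stub_detector(image):
    return image.get("length_cm", 0) > 5

def stub_classifier(image):
    return "premium" if image.get("straight") else "standard"

print(sort_cucumber({"length_cm": 20, "straight": True}, stub_detector, stub_classifier))
print(sort_cucumber({"length_cm": 1}, stub_detector, stub_classifier))
```

Putting the small detector first keeps the controller responsive and avoids sending every camera frame to the server.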
Systems diagram of the cucumber sorter
Makoto used the sample TensorFlow code Deep MNIST for Experts with minor modifications to the convolution, pooling and last layers, changing the network design to adapt to the pixel format of cucumber images and the number of cucumber classes.
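As a rough idea of what adapting that network involves, the sketch below works out how the size of the flattened feature vector changes with the input image size. It assumes the layout of the Deep MNIST tutorial as commonly described (two convolution layers with "SAME" padding, each followed by 2 x 2 max pooling, with 64 feature maps after the second convolution) and uses the 80 x 80 image size mentioned elsewhere in this article. Makoto's actual modifications are not documented here, so treat the numbers as illustrative.

```python
import math


def pooled_size(size, stride=2):
    """Spatial size after one 2 x 2 max pool with SAME padding."""
    return math.ceil(size / stride)


def flattened_features(height, width, feature_maps=64, num_pools=2):
    """Length of the vector fed to the first fully connected layer."""
    for _ in range(num_pools):
        height = pooled_size(height)
        width = pooled_size(width)
    return height * width * feature_maps


print(flattened_features(28, 28))  # 3136 for 28 x 28 MNIST digits (7 * 7 * 64)
print(flattened_features(80, 80))  # 25600 for 80 x 80 cucumber images (20 * 20 * 64)
```

The last layer would change in the same spirit: its output size goes from the ten digit classes of MNIST to the number of cucumber classes.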
Here’s Makoto’s cucumber sorter, which went live in July:
Here’s a close-up of the sorting arm, and the camera interface:
And here is the cucumber sorter in action:
Pushing the limits of deep learning
One of the current challenges with deep learning is that you need a large amount of training data. To train the model, Makoto spent about three months taking 7,000 pictures of cucumbers sorted by his mother, but it’s probably not enough.
“When I did a validation with the test images, the recognition accuracy exceeded 95%. But if you apply the system with real use cases, the accuracy drops down to about 70%. I suspect the neural network model has the issue of “overfitting” (the phenomenon in neural network where the model is trained to fit only to the small training dataset) because of the insufficient number of training images.”
The second challenge of deep learning is that it consumes a lot of computing power. The current sorter uses a typical Windows desktop PC to train the neural network model. Although it converts the cucumber image into 80 x 80 pixel low-resolution images, it still takes two to three days to complete training the model with 7,000 images.
“Even with this low-res image, the system can only classify a cucumber based on its shape, length and level of distortion. It can’t recognize color, texture, scratches and prickles,” Makoto explained. Increasing image resolution by zooming into the cucumber would result in much higher accuracy, but would also increase the training time significantly.
To improve deep learning, some large enterprises have started doing large-scale distributed training, but those servers come at an enormous cost. Google offers Cloud Machine Learning (Cloud ML), a low-cost cloud platform for training and prediction that dedicates hundreds of cloud servers to training a network with TensorFlow. With Cloud ML, Google handles building a large-scale cluster for distributed training, and you just pay for what you use, making it easier for developers to try out deep learning without making a significant capital investment.
These specialized servers were used in the AlphaGo match
Makoto is eagerly awaiting Cloud ML. “I could use Cloud ML to try training the model with much higher resolution images and more training data. Also, I could try changing the various configurations, parameters and algorithms of the neural network to see how that improves accuracy. I can’t wait to try it.”
This is definitely not a scene from “A Clockwork Orange.” Allen Brain Observatory
As the mice watched a computer screen, their glowing neurons pulsed through glass windows in their skulls.
Using a device called a two-photon microscope, researchers at the Allen Institute for Brain Science could peer through those windows and record, layer by layer, the workings of their little minds.
The result, announced July 13, is a real-time record of the visual cortex — a brain region shared in similar form across mammalian species — at work. The data set that emerged is so massive and complete that its creators have named it the Allen Brain Observatory.
Bred for the lab, the mice were genetically modified so that specific cells in their brains would fluoresce when they became active. Researchers had installed the brain-windows surgically, slicing away tiny chunks of the rodents’ skulls and replacing them with five-millimeter skylights.
Sparkling neurons of the mouse visual cortex shone through the glass as images and short films flashed across the screen. Each point of light the researchers saw translated, with hours of careful processing, into data:
Which cell lit up?
Where in the brain?
How long did it glow?
What was the mouse doing at the time?
What was on the screen?
The researchers imaged the neurons in small groups, building a map of one microscopic layer before moving down to the next. When they were finished, the activities of 18,000 cells from several dozen mice were recorded in their database.
The problem the Brain Observatory wants to solve is straightforward. Science still does not understand the brain’s underlying code very well, and individual studies may turn up odd results that are difficult to interpret in the context of the whole brain.
A decade ago, for example, a widely-reported study appeared to find a single neuron in a human brain that always — and only — winked on when presented with images of Halle Berry. Few scientists suggested that this single cell actually stored the subject’s whole knowledge of Berry’s face. But without more context about what the cells around it were doing, a more complete explanation remained out of reach.
“When you’re listening to a cell with an electrode, all you’re hearing is [its activity level] spiking,” said Shawn Olsen, another researcher on the project. “And you don’t know where exactly that cell is, you don’t know its precise location, you don’t know its shape, you don’t know who it connects to.”
Imagine trying to assemble a complete understanding of a computer given only facts like this: under certain circumstances, clicking the mouse makes lights on the printer blink.
To get beyond that kind of feeling around in the dark, the Allen Institute has taken what Olsen calls an “industrial” approach to mapping out the brain’s activity.
“Our goal is to systematically march through the different cortical layers, and the different cell types, and the different areas of the cortex to produce a systematic, mostly comprehensive survey of the activity,” Olsen explained. “It doesn’t just describe how one cell type is responding or one particular area, but characterizes as much as we can a complete population of cells that will allow us to draw inferences that you couldn’t describe if you were just looking at one cell at a time.”
In other words, this project makes its impact through the grinding power of time and effort.
A visualization of cells examined in the project. Allen Brain Observatory
Researchers showed the mice moving horizontal or vertical lines, light and dark dots on a surface, natural scenes, and even clips from Hollywood movies.
The more abstract displays target how the mind sees and interprets light and dark, lines, and motion, building on existing neuroscience. Researchers have known for decades that particular cells appear to correspond to particular kinds of motion or shape, or positions in the visual field. This research helps them place the activity of those cells in context.
One of the most obvious results was that the brain is noisy, messy, and confusing.
“Even though we showed the same image, we could get dramatically different responses from the same cell. On one trial it may have a strong response, on another it may have a weak response,” Olsen said.
All that noise in their data is one of the things that differentiates it from a typical study, de Vries said.
“If you’re inserting an electrode you’re going to keep advancing until you find a cell that kind of responds the way you want it to,” he said. “By doing a survey like this we’re going to see a lot of cells that don’t respond to the stimuli in the way that we think they should. We’re realizing that the cartoon model that we have of the cortex isn’t completely accurate.”
Olsen said they suspect a lot of that noise emerges from whatever the mouse is thinking about or doing that has nothing to do with what’s on screen. They recorded videos of the mice during data collection to help researchers combing their data learn more about those effects.
The best evidence for this suspicion? When they showed the mice more interesting visuals, like pictures of animals or clips from the film “Touch of Evil,” the neurons behaved much more consistently.
“We would present each [clip] ten different times,” de Vries said. “And we can see from trial to trial many cells at certain times almost always respond — reliable, repeatable, robust responses.”
In other words, it appears the mice were paying attention.
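One simple way to put a number on "reliable, repeatable" responses is the fraction of repeated trials in which a cell responds above some threshold. The sketch below assumes that measure; the function name, the threshold, and the response values are invented for illustration and do not come from the project's data or its actual analysis methods.

```python
from typing import Sequence


def response_reliability(trial_responses: Sequence[float], threshold: float) -> float:
    """Fraction of trials in which the response exceeds the threshold."""
    if not trial_responses:
        raise ValueError("need at least one trial")
    responding = sum(1 for r in trial_responses if r > threshold)
    return responding / len(trial_responses)


# Ten made-up trials per cell, echoing the ten presentations of each clip.
noisy_cell = [0.9, 0.1, 0.05, 0.8, 0.0, 0.2, 0.7, 0.1, 0.0, 0.3]
reliable_cell = [0.9, 0.8, 0.85, 0.95, 0.7, 0.9, 0.8, 0.75, 0.9, 0.2]

noisy_score = response_reliability(noisy_cell, threshold=0.5)        # 3 of 10 trials
reliable_score = response_reliability(reliable_cell, threshold=0.5)  # 9 of 10 trials
```

A cell that responds strongly on some trials and weakly on others scores low on this measure, while a cell that fires on nearly every presentation scores close to 1.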
The Brain Observatory was turned loose on the internet Wednesday, with its data available for researchers and the public to comb through, explore, and maybe critique.
But the project isn’t over.
In the next year-and-a-half, the researchers intend to add more types of cells and more regions of the visual cortex to their observatory. And their long-term ambitions are even grander.
“Ultimately,” Olsen said, “we want to understand how this visual information in the mouse’s brain gets used to guide behavior and memory and cognition.”
Right now, the mice just watch screens. But by training them to perform tasks based on what they see, he said they hope to crack the mysteries of memory, decision-making, and problem-solving. Another parallel observatory created using electrode arrays instead of light through windows will add new levels of richness to their data.
So the underlying code of mouse — and human — brains remains largely a mystery, but the map that we’ll need to unlock it grows richer by the day.
It is amazing how intelligent we can be. We can construct shelter, find new ways of hunting, and create boats and machines. Our unique intelligence has been responsible for the emergence of civilization.
But how does a set of living cells become intelligent? How can flesh and blood turn into something that can create bicycles and airplanes or write novels?
This is the question of the origin of intelligence.
This problem has puzzled many theorists and scientists, and it is particularly important if we want to build intelligent machines. They still lag well behind us. Although computers calculate millions of times faster than we do, it is we who understand the big picture in which these calculations fit. Even animals are much more intelligent than machines. A mouse can find its way in a hostile forest and survive. This cannot be said for our computers or robots.
The question of how to achieve intelligence remains a mystery for scientists.
Recently, however, a new theory has been proposed that may resolve this very question. The theory is called practopoiesis and is founded in the most fundamental capability of all biological organisms—their ability to adapt.
Darwin’s theory of evolution describes one way our genomes adapt. By creating offspring, new combinations of genes are tested; the good ones are kept and the bad ones are disposed of. The result is a genome better adapted to the environment.
Practopoiesis tells us that somewhat similar trial-and-error adaptation mechanisms operate while an organism grows, while it digests food, and also while it acts intelligently or thinks.
For example, the growth of our body is not precisely programmed by the genes. Instead, our genes perform experiments, which require feedback from the environment and corrections of errors. Only through trial and error can our body grow properly.
Our genes contain an elaborate knowledge of which experiments need to be done, and this knowledge of trial-and-error approaches has been acquired through eons of evolution. We kept whatever worked well for our ancestors.
However, this knowledge alone is not enough to make us intelligent.
To create intelligent behavior such as thinking, decision making, understanding a poem, or simply detecting one’s friend in a crowd of strangers, our bodies require yet another type of trial-and-error knowledge. There are mechanisms in our body that also contain elaborate knowledge for experimenting, but they are much faster. The knowledge of these mechanisms is not collected through evolution but through the development over the lifetime of an individual.
These fast adaptive mechanisms continually adjust the big network of our connected nerve cells. These adaptation mechanisms can change in an eye-blink the way the brain networks are effectively connected. It may take less than a second to make a change necessary to recognize one’s own grandmother, or to make a decision, or to get a new idea on how to solve a problem.
The slow and the fast adaptive mechanisms share one thing: They cannot be successful without receiving feedback and thus iterating through several stages of trial and error; for example, testing several possibilities of who this person in the distance could be.
Practopoiesis states that the slow and fast adaptive mechanisms are collectively responsible for creation of intelligence and are organized into a hierarchy.
First, evolution creates genes at a painstakingly slow tempo. Then genes slowly create the mechanisms of fast adaptations.
Next, adaptation mechanisms change the properties of our nerve cells within seconds.
And finally, the resulting adjusted networks of nerve cells route sensory signals to muscles with the speed of lightning.
In the end, behavior is created.
Probably the most groundbreaking aspect of practopoietic theory is that our intelligent minds are not primarily located in the connectivity matrix of our neural networks, as has been widely held, but instead in the elaborate knowledge of the fast adaptive mechanisms. The more knowledge our genes store in our quick abilities to adapt nerve cells, the more capability we have to adjust in novel situations, solve problems, and generally, act intelligently.
Therefore, our intelligence seems to come from the hierarchy of adaptive mechanisms, from the very slow evolution that enables the genome to adapt over a lifetime, to the quick pace of neural adaptation expressing knowledge acquired through its lifetime. Only when these adaptations have been performed successfully can our networks of neurons perform tasks with wonderful accuracy.
Our capability to survive and create originates, then, from the adaptive mechanisms that operate at different levels and the vast amounts of knowledge accumulated by each of the levels.
The combined result of all of them together is what makes us intelligent.
Danko Nikolić is a brain and mind scientist, running an electrophysiology lab at the Max Planck Institute for Brain Research, and is the creator of the concept of ideasthesia.
Olli hits the road in the Washington, D.C. area and later this year in Miami-Dade County and Las Vegas.
Local Motors CEO and co-founder John B. Rogers, Jr. with “Olli” & IBM, June 15, 2016. Rich Riggins/Feature Photo Service for IBM
IBM, along with the Arizona-based manufacturer Local Motors, debuted the first-ever driverless vehicle to use the Watson cognitive computing platform. Dubbed “Olli,” the electric vehicle was unveiled at Local Motors’ new facility in National Harbor, Maryland, just outside of Washington, D.C.
Olli, which can carry up to 12 passengers, taps into four Watson APIs (Speech to Text, Natural Language Classifier, Entity Extraction and Text to Speech) to interact with its riders. It can answer questions like “Can I bring my children on board?” and respond to basic operational commands like, “Take me to the closest Mexican restaurant.” Olli can also give vehicle diagnostics, answering questions like, “Why are you stopping?”
Olli learns from data produced by more than 30 sensors embedded throughout the vehicle, which will be added and adjusted to meet passenger needs and local preferences.
While Olli is the first self-driving vehicle to use IBM Watson Internet of Things (IoT), this isn’t Watson’s first foray into the automotive industry. IBM launched its IoT for Automotive unit in September of last year, and in March, IBM and Honda announced a deal for Watson technology and analytics to be used in the automaker’s Formula One (F1) cars and pits.
IBM demonstrated its commitment to IoT in March of last year, when it announced it was spending $3B over four years to establish a separate IoT business unit, which later became the Watson IoT business unit.
IBM says that starting Thursday, Olli will be used on public roads locally in Washington, D.C. and will be used in Miami-Dade County and Las Vegas later this year. Miami-Dade County is exploring a pilot program that would deploy several autonomous vehicles to shuttle people around Miami.