Show and Tell: image captioning open sourced in TensorFlow

By Hugo Angel,

 In 2014, research scientists on the Google Brain team trained a machine learning system to automatically produce captions that accurately describe images. Further development of that system led to its success in the Microsoft COCO 2015 image captioning challenge, a competition to compare the best algorithms for computing accurate image captions, where it tied for first place.
Today, we’re making the latest version of our image captioning system available as an open source model in TensorFlow.
This release contains significant improvements to the computer vision component of the captioning system, is much faster to train, and produces more detailed and accurate descriptions compared to the original system. These improvements are outlined and analyzed in the paper Show and Tell: Lessons learned from the 2015 MSCOCO Image Captioning Challenge, published in IEEE Transactions on Pattern Analysis and Machine Intelligence.
Automatically captioned by our system.
So what’s new? 
Our 2014 system used the Inception V1 image classification model to initialize the image encoder, which produces the encodings that are useful for recognizing different objects in the images. This was the best image model available at the time, achieving 89.6% top-5 accuracy on the benchmark ImageNet 2012 image classification task. We replaced this in 2015 with the newer Inception V2 image classification model, which achieves 91.8% accuracy on the same task. The improved vision component gave our captioning system an accuracy boost of 2 points in the BLEU-4 metric (which is commonly used in machine translation to evaluate the quality of generated sentences) and was an important factor in its success in the captioning challenge.
Today’s code release initializes the image encoder using the Inception V3 model, which achieves 93.9% accuracy on the ImageNet classification task. Initializing the image encoder with a better vision model gives the image captioning system a better ability to recognize different objects in the images, allowing it to generate more detailed and accurate descriptions. This gives an additional 2 points of improvement in the BLEU-4 metric over the system used in the captioning challenge.
Another key improvement to the vision component comes from fine-tuning the image model. This step addresses the problem that the image encoder is initialized by a model trained to classify objects in images, whereas the goal of the captioning system is to describe the objects in images using the encodings produced by the image model. For example, an image classification model will tell you that a dog, grass and a frisbee are in the image, but a natural description should also tell you the color of the grass and how the dog relates to the frisbee. In the fine-tuning phase, the captioning system is improved by jointly training its vision and language components on human generated captions. This allows the captioning system to transfer information from the image that is specifically useful for generating descriptive captions, but which was not necessary for classifying objects. In particular, after fine-tuning it becomes better at correctly describing the colors of objects. Importantly, the fine-tuning phase must occur after the language component has already learned to generate captions; otherwise, the noisiness of the randomly initialized language component causes irreversible corruption to the vision component. For more details, read the full paper here.
Left: the better image model allows the captioning model to generate more detailed and accurate descriptions. Right: after fine-tuning the image model, the image captioning system is more likely to describe the colors of objects correctly.
Until recently our image captioning system was implemented in the DistBelief software framework. The TensorFlow implementation released today achieves the same level of accuracy with significantly faster performance: time per training step is just 0.7 seconds in TensorFlow compared to 3 seconds in DistBelief on an Nvidia K20 GPU, meaning that total training time is just 25% of the time previously required.
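As a rough check of those figures, the ratio of the two per-step times can be worked out directly. This is only a sketch: it assumes the number of training steps is the same in both frameworks, which the post does not state, and the variable names are illustrative.

```python
# Per-step times quoted in the post (seconds, Nvidia K20 GPU).
distbelief_step_s = 3.0
tensorflow_step_s = 0.7

# Assumption: the same number of training steps in both frameworks,
# so total training time scales with time per step.
fraction_of_old_time = tensorflow_step_s / distbelief_step_s
speedup = distbelief_step_s / tensorflow_step_s

print(f"fraction of previous training time: {fraction_of_old_time:.0%}")
print(f"speedup: {speedup:.1f}x")
```

Under that assumption the ratio comes out a little under a quarter, in line with the roughly 25% figure quoted.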
A natural question is whether our captioning system can generate novel descriptions of previously unseen contexts and interactions. The system is trained by showing it hundreds of thousands of images that were captioned manually by humans, and it often re-uses human captions when presented with scenes similar to what it’s seen before.
When the model is presented with scenes similar to what it’s seen before, it will often re-use human generated captions.
So does it really understand the objects and their interactions in each image? Or does it always regurgitate descriptions from the training data? Excitingly, our model does indeed develop the ability to generate accurate new captions when presented with completely new scenes, indicating a deeper understanding of the objects and context in the images. Moreover, it learns how to express that knowledge in natural-sounding English phrases despite receiving no additional language training other than reading the human captions.
 

Our model generates a completely new caption using concepts learned from similar scenes in the training set
We hope that sharing this model in TensorFlow will help push forward image captioning research and applications, and will also allow interested people to learn and have fun. To get started training your own image captioning system, and for more details on the neural network architecture, navigate to the model’s home-page here. While our system uses the Inception V3 image classification model, you could even try training our system with the recently released Inception-ResNet-v2 model to see if it can do even better!

ORIGINAL: Google Blog

by Chris Shallue, Software Engineer, Google Brain Team
September 22, 2016

What Are The Differences Between AI, Machine Learning, NLP, And Deep Learning?

By Hugo Angel,

(Image: Creative Commons)
What is the difference between AI, Machine Learning, NLP, and Deep Learning? originally appeared on Quora: the knowledge sharing network where compelling questions are answered by people with unique insights.
Answer by Dmitriy Genzel, PhD in Computer Science, on Quora:

  • AI (Artificial intelligence) is a subfield of computer science that was created in the 1960s, and it was/is concerned with solving tasks that are easy for humans but hard for computers. In particular, a so-called Strong AI would be a system that can do anything a human can (perhaps without purely physical things). This is fairly generic and includes all kinds of tasks such as  
    • planning, 
    • moving around in the world, 
    • recognizing objects and sounds, 
    • speaking, 
    • translating, 
    • performing social or business transactions, 
    • creative work (making art or poetry), 
    • etc.
  • NLP (Natural language processing) is simply the part of AI that has to do with language (usually written).
  • Machine learning is concerned with one aspect of this:
    • given some AI problem that can be described in discrete terms (e.g. out of a particular set of actions, which one is the right one), and
    • given a lot of information about the world,
    • figure out what is the “correct” action, without having the programmer program it in.
    • Typically some outside process is needed to judge whether the action was correct or not.
    • In mathematical terms, it’s a function: you feed in some input, and you want it to produce the right output, so the whole problem is simply to build a model of this mathematical function in some automatic way. To draw a distinction with AI, if I can write a very clever program that has human-like behavior, it can be AI, but unless its parameters are automatically learned from data, it’s not machine learning.
  • Deep learning is one kind of machine learning that’s very popular now. It involves a particular kind of mathematical model that can be thought of as a composition of simple blocks (function composition) of a certain type, and where some of these blocks can be adjusted to better predict the final outcome.

The word “deep” means that the composition has many of these blocks stacked on top of each other, and the tricky bit is how to adjust the blocks that are far from the output, since a small change there can have very indirect effects on the output. This is done via something called Backpropagation inside of a larger process called Gradient descent which lets you change the parameters in a way that improves your model.
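To make the idea of adjusting stacked blocks concrete, here is a minimal sketch in plain Python. It is a toy setup chosen for illustration, not anything from the answer itself: two scalar "blocks", a squared-error loss, one training example, and arbitrary starting values and learning rate. The backward pass shows how the error reaches the block that is far from the output only by passing through the block next to it.

```python
# Two stacked "blocks": h = w1 * x, then y = w2 * h.
# w1 is far from the output; its gradient flows back through w2 (chain rule).

def forward(w1, w2, x):
    h = w1 * x          # first block
    y = w2 * h          # second block, stacked on top
    return h, y

def backward(w1, w2, x, target):
    h, y = forward(w1, w2, x)
    dloss_dy = 2.0 * (y - target)   # loss = (y - target) ** 2
    dloss_dw2 = dloss_dy * h        # block next to the output
    dloss_dh = dloss_dy * w2        # propagate the error one block back
    dloss_dw1 = dloss_dh * x        # block far from the output
    return dloss_dw1, dloss_dw2

def train(steps=200, lr=0.05):
    w1, w2 = 0.5, 0.5               # arbitrary starting parameters
    x, target = 1.0, 2.0            # single toy training example
    for _ in range(steps):
        g1, g2 = backward(w1, w2, x, target)
        w1 -= lr * g1               # gradient descent update
        w2 -= lr * g2
    return w1, w2

w1, w2 = train()
_, prediction = forward(w1, w2, 1.0)
print(round(prediction, 3))
```

Real deep learning libraries compute these backward passes automatically for many more blocks, but the structure is the same: a forward pass, a backward pass that applies the chain rule, and a small parameter update.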

Artificial Intelligence: What is artificial intelligence and why do we need it?
Machine Learning: What is machine learning?
Natural Language Processing: What makes natural language processing difficult?

ORIGINAL: Quora
June 8, 2016

WaveNet: A Generative Model for Raw Audio by Google DeepMind

By Hugo Angel,

WaveNet: A Generative Model for Raw Audio
This post presents WaveNet, a deep generative model of raw audio waveforms. We show that WaveNets are able to generate speech which mimics any human voice and which sounds more natural than the best existing Text-to-Speech systems, reducing the gap with human performance by over 50%.
We also demonstrate that the same network can be used to synthesize other audio signals such as music, and present some striking samples of automatically generated piano pieces.
Talking Machines
Allowing people to converse with machines is a long-standing dream of human-computer interaction. The ability of computers to understand natural speech has been revolutionised in the last few years by the application of deep neural networks (e.g., Google Voice Search). However, generating speech with computers, a process usually referred to as speech synthesis or text-to-speech (TTS), is still largely based on so-called concatenative TTS, where a very large database of short speech fragments is recorded from a single speaker and then recombined to form complete utterances. This makes it difficult to modify the voice (for example switching to a different speaker, or altering the emphasis or emotion of their speech) without recording a whole new database.
This has led to a great demand for parametric TTS, where all the information required to generate the data is stored in the parameters of the model, and the contents and characteristics of the speech can be controlled via the inputs to the model. So far, however, parametric TTS has tended to sound less natural than concatenative, at least for syllabic languages such as English. Existing parametric models typically generate audio signals by passing their outputs through signal processing algorithms known as vocoders.
WaveNet changes this paradigm by directly modelling the raw waveform of the audio signal, one sample at a time. As well as yielding more natural-sounding speech, using raw waveforms means that WaveNet can model any kind of audio, including music.
WaveNets

 

Wave animation

 

Researchers usually avoid modelling raw audio because it ticks so quickly: typically 16,000 samples per second or more, with important structure at many time-scales. Building a completely autoregressive model, in which the prediction for every one of those samples is influenced by all previous ones (in statistics-speak, each predictive distribution is conditioned on all previous observations), is clearly a challenging task.
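In symbols, the fully autoregressive view described above can be written as a product of conditionals. The notation here is an assumption introduced for illustration: x_t is the audio sample at timestep t and T is the total number of samples.

```latex
p(\mathbf{x}) = \prod_{t=1}^{T} p\left(x_t \mid x_1, \ldots, x_{t-1}\right)
```

At 16,000 samples per second, even one second of audio means a product of 16,000 such conditional distributions, which is why the task looks so demanding.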
However, our PixelRNN and PixelCNN models, published earlier this year, showed that it was possible to generate complex natural images not only one pixel at a time, but one colour-channel at a time, requiring thousands of predictions per image. This inspired us to adapt our two-dimensional PixelNets to a one-dimensional WaveNet.
Architecture animation

 

The above animation shows how a WaveNet is structured. It is a fully convolutional neural network, where the convolutional layers have various dilation factors that allow its receptive field to grow exponentially with depth and cover thousands of timesteps.
At training time, the input sequences are real waveforms recorded from human speakers. After training, we can sample the network to generate synthetic utterances. At each step during sampling a value is drawn from the probability distribution computed by the network. This value is then fed back into the input and a new prediction for the next step is made. Building up samples one step at a time like this is computationally expensive, but we have found it essential for generating complex, realistic-sounding audio.
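The effect of dilation on the receptive field can be seen with a few lines of arithmetic. The sketch below is illustrative only: the post says just that the layers have "various dilation factors", so the kernel size of 2 and the doubling schedule are assumptions, and `receptive_field` is a hypothetical helper.

```python
def receptive_field(dilations, kernel_size=2):
    """Number of input timesteps visible to one output of a stack of
    dilated causal convolutions (one layer per dilation factor)."""
    return 1 + sum((kernel_size - 1) * d for d in dilations)

# Assumed schedule for illustration: the dilation doubles at every layer.
doubling = [2 ** i for i in range(10)]      # 1, 2, 4, ..., 512
undilated = [1] * 10                        # same depth, no dilation

print(receptive_field(doubling))    # 1024
print(receptive_field(undilated))   # 11
```

With the same ten layers, doubling the dilation covers about a thousand timesteps while the undilated stack covers eleven, which is the exponential versus linear growth the text refers to.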
Improving the State of the Art
We trained WaveNet using some of Google’s TTS datasets so we could evaluate its performance. The following figure shows the quality of WaveNets on a scale from 1 to 5, compared with Google’s current best TTS systems (parametric and concatenative), and with human speech using Mean Opinion Scores (MOS). MOS are a standard measure for subjective sound quality tests, and were obtained in blind tests with human subjects (from over 500 ratings on 100 test sentences). As we can see, WaveNets reduce the gap between the state of the art and human-level performance by over 50% for both US English and Mandarin Chinese.
For both Chinese and English, Google’s current TTS systems are considered among the best worldwide, so improving on both with a single model is a major achievement.

 

Here are some samples from all three systems so you can listen and compare yourself:

US English:

Mandarin Chinese:

Knowing What to Say

In order to use WaveNet to turn text into speech, we have to tell it what the text is. We do this by transforming the text into a sequence of linguistic and phonetic features (which contain information about the current phoneme, syllable, word, etc.) and by feeding it into WaveNet. This means the network’s predictions are conditioned not only on the previous audio samples, but also on the text we want it to say.
If we train the network without the text sequence, it still generates speech, but now it has to make up what to say. As you can hear from the samples below, this results in a kind of babbling, where real words are interspersed with made-up word-like sounds:

 

Notice that non-speech sounds, such as breathing and mouth movements, are also sometimes generated by WaveNet; this reflects the greater flexibility of a raw-audio model.
As you can hear from these samples, a single WaveNet is able to learn the characteristics of many different voices, male and female. To make sure it knew which voice to use for any given utterance, we conditioned the network on the identity of the speaker. Interestingly, we found that training on many speakers made it better at modelling a single speaker than training on that speaker alone, suggesting a form of transfer learning.
By changing the speaker identity, we can use WaveNet to say the same thing in different voices:

 

Similarly, we could provide additional inputs to the model, such as emotions or accents, to make the speech even more diverse and interesting.
Making Music
Since WaveNets can be used to model any audio signal, we thought it would also be fun to try to generate music. Unlike the TTS experiments, we didn’t condition the networks on an input sequence telling it what to play (such as a musical score); instead, we simply let it generate whatever it wanted to. When we trained it on a dataset of classical piano music, it produced fascinating samples like the ones below:

 

WaveNets open up a lot of possibilities for TTS, music generation and audio modelling in general. The fact that directly generating timestep per timestep with deep neural networks works at all for 16kHz audio is really surprising, let alone that it outperforms state-of-the-art TTS systems. We are excited to see what we can do with them next.
For more details, take a look at our paper.


ORIGINAL: Google DeepMind
Aäron van den Oord. Research Scientist, DeepMind
Heiga Zen. Research Scientist, Google
Sander Dieleman. Research Scientist, DeepMind
8 September 2016

© 2016 DeepMind Technologies Limited

 

Carbon Nanotube Transistors Finally Outperform Silicon

By Hugo Angel,

Photo: Stephanie Precourt/UW-Madison College of Engineering

Back in the 1990s, observers predicted that the single-walled carbon nanotube (SWCNT) would be the nanomaterial that pushed silicon aside and created a post-CMOS world where Moore’s Law could continue its march towards ever-smaller chip dimensions. All of that hope was swallowed up by inconsistencies between semiconducting and metallic SWCNTs and the vexing issue of trying to get them all to align on a wafer.

The introduction of graphene seemed to take the final bit of luster off of carbon nanotubes’ shine, but the material, which researchers have been using to make transistors for over 20 years, has experienced a renaissance of late.
Now, researchers at the University of Wisconsin-Madison (UW-Madison) have given SWCNTs a new boost in their resurgence by using them to make a transistor that outperforms state-of-the-art silicon transistors.
“This achievement has been a dream of nanotechnology for the last 20 years,” said Michael Arnold, a professor at UW-Madison, in a press release. “Making carbon nanotube transistors that are better than silicon transistors is a big milestone,” Arnold added. “[It’s] a critical advance toward exploiting carbon nanotubes in logic, high-speed communications, and other semiconductor electronics technologies.”
In research described in the journal Science Advances, the UW-Madison researchers were able to achieve a current that is 1.9 times as fast as that seen in silicon transistors. How rapidly current can travel through the channel between a transistor’s source and drain determines how fast the circuit is. The more current there is, the more quickly the gate of the next device in the circuit can be charged.

The key to getting the nanotubes to create such a fast transistor was a new process that employs polymers to sort between the metallic and semiconducting SWCNTs to create an ultra-high purity of solution.
“We’ve identified specific conditions in which you can get rid of nearly all metallic nanotubes, [leaving] less than 0.01 percent metallic nanotubes [in a sample],” said Arnold.
The researchers had already tackled the problem of aligning and placing the nanotubes on a wafer two years ago when they developed a process they dubbed “floating evaporative self-assembly.” That technique uses a hydrophobic substrate and partially submerges it in water. Then the SWCNTs are deposited on its surface and the substrate removed vertically from the water.
“In our research, we’ve shown that we can simultaneously overcome all of these challenges of working with nanotubes, and that has allowed us to create these groundbreaking carbon nanotube transistors that surpass silicon and gallium arsenide transistors,” said Arnold.
In the video below, Arnold provides a little primer on SWCNTs and what his group’s research with them could mean to the future of electronics.

In continuing research, the UW-Madison team will be aiming to replicate the manufacturability of silicon transistors. To date, they have managed to scale their alignment and deposition process to 1-inch-by-1-inch wafers; the longer-term goal is to bring this up to commercial scales.

Arnold added: “There has been a lot of hype about carbon nanotubes that hasn’t been realized, and that has kind of soured many people’s outlook. But we think the hype is deserved. It has just taken decades of work for the materials science to catch up and allow us to effectively harness these materials.”

ORIGINAL: IEEE

By Dexter Johnson
6 Sep 2016

Quantum Computers Explained – Limits of Human Technology

By Hugo Angel,

Where are the limits of human technology? And can we somehow avoid them? This is where quantum computers become very interesting. 
Check out THE NOVA PROJECT to learn more about dark energy: www.nova.org.au 


ORIGINAL: YouTube



Robots Can Now Learn Just By Observing, Without Being Told What To Look For

By Hugo Angel,

Machines are getting smarter every day—and that is both good and terrifying.
[Illustrations: v_alex/iStock]
Scientists at the University of Sheffield have come up with a way for machines to learn just by looking. They don’t need to be told what to look for—they can just learn how a system works by observing it. The method is called Turing Learning and is inspired by Alan Turing’s famous test.
For a computer to learn, usually it has to be told what to look for. For instance, if you wanted to teach a robot to paint like Picasso, you’d train software to mimic real Picasso paintings. “Someone would have to tell the algorithms what is considered similar to a Picasso to begin with,” says Roderick Gross, in a news release.
Turing Learning would not require such prior knowledge, he says. It would use two computer systems, plus the original “system” you’re investigating: a shoal of fish, a Picasso painting, anything. One of the computer systems tries to copy the real-world system as closely as possible. The other computer is an observer. Its task is to watch the goings-on and try to discern which of the systems is real, and which is the copy. If it guesses right, it gets a reward. At the same time, the counterfeit system is rewarded if it fools the observer.
Proceeding like this, the counterfeit models get better and better, and the observer works out how to distinguish real from fake to a more and more accurate degree. In the end, it can not only tell real from fake, but it has also—almost as a by-product of the process—created a precise model of how the genuine system works.
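The reward scheme described above can be sketched as a small function. This is a simplified reading of the article's description, not the Sheffield team's implementation; the function name, the 0/1 reward values, and the boolean inputs are all assumptions made for illustration.

```python
from typing import Tuple

def rewards(sample_is_real: bool, observer_says_real: bool) -> Tuple[int, int]:
    """Return (observer_reward, counterfeit_reward) for one judged sample."""
    # The observer scores whenever its guess matches the truth.
    observer_reward = 1 if observer_says_real == sample_is_real else 0
    # The counterfeit model scores only when its fake sample passes as real.
    counterfeit_reward = 1 if (not sample_is_real and observer_says_real) else 0
    return observer_reward, counterfeit_reward

# A fake sample that fools the observer: the counterfeit model is rewarded.
print(rewards(sample_is_real=False, observer_says_real=True))
# A fake sample that is caught: the observer is rewarded.
print(rewards(sample_is_real=False, observer_says_real=False))
```

Because the two rewards pull in opposite directions on fake samples, improving one system forces the other to improve too, which is the dynamic the article describes.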
The experiment is named after Alan Turing’s famous test for artificial intelligence, which says that if a computer program can fool a human observer into believing it is a real person, then it can be considered intelligent. In reality this never really works, as a) convincing a person that you’re another person isn’t a guarantee of intelligence, and b) many computer programs have simply been designed to game the human observers.
Turing Learning, though, is actually practical. It can be used to teach robots certain behaviors, but perhaps more useful is the categorization it performs. Set a Turing Learning machine loose on a swarm of insects, for instance, and it could tease out details in the behavior of a bee colony that remain invisible to humans.
The systems can also be used to recognize abnormal behavior, without first teaching the system what constitutes abnormal behavior. The possibilities here are huge, because noticing oddities in otherwise uniform behavior is something we humans can be terrible at. Look at airport security, for example, and how often TSA agents miss guns, explosives, and other weapons.
The technique could also be used in video games to make the virtual players act more like real human players, to monitor livestock for odd behaviors that might signal health problems, and for security purposes like lie detection.
In some ways, the technology is terrifying, as computers are able to get to the very basics of how things behave. On the other hand, they still need to be told what to do with that knowledge, so at least there’s something for us puny humans to do in the world of the future.
ORIGINAL: FastCoExist
09.07.16

Deep Learning With Python & Tensorflow – PyConSG 2016

By Hugo Angel,

ORIGINAL: Pycon.SG
Jul 5, 2016
Speaker: Ian Lewis
Description
Python has lots of scientific, data analysis, and machine learning libraries. But there are many problems when starting out on a machine learning project. Which library do you use? How can you use a model that has been trained in your production app? In this talk I will discuss how you can use TensorFlow to create Deep Learning applications and how to deploy them into production.
Abstract
Python has lots of scientific, data analysis, and machine learning libraries. But there are many problems when starting out on a machine learning project. Which library do you use? How do they compare to each other? How can you use a model that has been trained in your production application?
TensorFlow is a new Open-Source framework created at Google for building Deep Learning applications. TensorFlow allows you to construct easy-to-understand data flow graphs in Python which form a mathematical and logical pipeline. Creating data flow graphs allows easier visualization of complicated algorithms as well as running the training operations over multiple hardware GPUs in parallel.
In this talk I will discuss how you can use TensorFlow to create Deep Learning applications. I will discuss how it compares to other Python machine learning libraries like Theano or Chainer. Finally, I will discuss how trained TensorFlow models could be deployed into a production system using TensorFlow Serving.
Event Page: https://pycon.sg
Produced by Engineers.SG

How a Japanese cucumber farmer is using deep learning and TensorFlow.

By Hugo Angel,

by Kaz Sato, Developer Advocate, Google Cloud Platform
August 31, 2016
It’s not hyperbole to say that use cases for machine learning and deep learning are only limited by our imaginations. About one year ago, a former embedded systems designer from the Japanese automobile industry named Makoto Koike started helping out at his parents’ cucumber farm, and was amazed by the amount of work it takes to sort cucumbers by size, shape, color and other attributes.
Makoto’s father is very proud of his thorny cucumber, for instance, having dedicated his life to delivering fresh and crispy cucumbers, with many prickles still on them. Straight and thick cucumbers with a vivid color and lots of prickles are considered premium grade and command much higher prices on the market.
But Makoto learned very quickly that sorting cucumbers is as hard and tricky as actually growing them. “Each cucumber has different color, shape, quality and freshness,” Makoto says.
Cucumbers from retail stores
Cucumbers from Makoto’s farm
In Japan, each farm has its own classification standard and there’s no industry standard. At Makoto’s farm, they sort them into nine different classes, and his mother sorts them all herself — spending up to eight hours per day at peak harvesting times.
“The sorting work is not an easy task to learn. You have to look at not only the size and thickness, but also the color, texture, small scratches, whether or not they are crooked and whether they have prickles. It takes months to learn the system and you can’t just hire part-time workers during the busiest period. I myself only recently learned to sort cucumbers well,” Makoto said.
Distorted or crooked cucumbers are ranked as low-quality product
There are also some automatic sorters on the market, but they have limitations in terms of performance and cost, and small farms don’t tend to use them.
Makoto doesn’t think sorting is an essential task for cucumber farmers. “Farmers want to focus and spend their time on growing delicious vegetables. I’d like to automate the sorting tasks before taking the farm business over from my parents.”
Makoto Koike, center, with his parents at the family cucumber farm
The many uses of deep learning
Makoto first got the idea to explore machine learning for sorting cucumbers from a completely different use case: Google AlphaGo competing with the world’s top professional Go player.
“When I saw the Google’s AlphaGo, I realized something really serious is happening here,” said Makoto. “That was the trigger for me to start developing the cucumber sorter with deep learning technology.”
Using deep learning for image recognition allows a computer to learn from a training data set what the important “features” of the images are. By using a hierarchy of numerous artificial neurons, deep learning can automatically classify images with a high degree of accuracy. Thus, neural networks can recognize different species of cats, or models of cars or airplanes from images. Sometimes neural networks can exceed the performance of the human eye for certain applications. (For more information, check out my previous blog post Understanding neural networks with TensorFlow Playground.)

TensorFlow democratizes the power of deep learning
But can computers really learn mom’s art of cucumber sorting? Makoto set out to see whether he could use deep learning technology for sorting using Google’s open source machine learning library, TensorFlow.
“Google had just open sourced TensorFlow, so I started trying it out with images of my cucumbers,” Makoto said. “This was the first time I tried out machine learning or deep learning technology, and right away got much higher accuracy than I expected. That gave me the confidence that it could solve my problem.”
With TensorFlow, you don’t need to be knowledgeable about the advanced math models and optimization algorithms needed to implement deep neural networks. Just download the sample code and read the tutorials and you can get started in no time. The library lowers the barrier to entry for machine learning significantly, and since Google open-sourced TensorFlow last November, many “non ML” engineers have started playing with the technology with their own datasets and applications.

Cucumber sorting system design
Here’s a systems diagram of the cucumber sorter that Makoto built. The system uses a Raspberry Pi 3 as the main controller to take images of the cucumbers with a camera, and then works in two phases:

  • In the first phase, it runs a small-scale neural network on TensorFlow to detect whether or not the image is of a cucumber.
  • It then forwards the image to a larger TensorFlow neural network running on a Linux server to perform a more detailed classification.
Systems diagram of the cucumber sorter
Makoto used the sample TensorFlow code Deep MNIST for Experts with minor modifications to the convolution, pooling and last layers, changing the network design to adapt to the pixel format of cucumber images and the number of cucumber classes.
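The two-phase flow of the sorter can be sketched in a few lines. This is not Makoto’s code: `sort_cucumber`, the `Image` alias, and the two callables are hypothetical stand-ins for the small on-device network and the larger server-side network, and the stub models at the bottom exist only so the example runs.

```python
from typing import Callable, Optional, Sequence

Image = Sequence[Sequence[float]]   # stand-in for pixel data

def sort_cucumber(
    image: Image,
    is_cucumber: Callable[[Image], bool],     # small on-device network (hypothetical)
    classify_grade: Callable[[Image], int],   # larger server-side network (hypothetical)
) -> Optional[int]:
    """Return a class index, or None if no cucumber is detected."""
    if not is_cucumber(image):
        return None                 # skip the expensive second phase
    return classify_grade(image)    # in the real system this runs on the Linux server

# Stub models for illustration only; real ones would be trained networks.
def looks_like_cucumber(image):
    return len(image) > 0

def grade(image):
    return 3

print(sort_cucumber([[0.1, 0.2]], looks_like_cucumber, grade))
print(sort_cucumber([], looks_like_cucumber, grade))
```

Putting the cheap check first means the larger network is only consulted for images that are worth classifying in detail.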
Here’s Makoto’s cucumber sorter, which went live in July:
Here’s a close-up of the sorting arm, and the camera interface:

And here is the cucumber sorter in action:

Pushing the limits of deep learning
One of the current challenges with deep learning is that you need a large amount of training data. To train the model, Makoto spent about three months taking 7,000 pictures of cucumbers sorted by his mother, but it’s probably not enough.
“When I did a validation with the test images, the recognition accuracy exceeded 95%. But if you apply the system with real use cases, the accuracy drops down to about 70%. I suspect the neural network model has the issue of ‘overfitting’ (the phenomenon in neural network where the model is trained to fit only to the small training dataset) because of the insufficient number of training images.”
The second challenge of deep learning is that it consumes a lot of computing power. The current sorter uses a typical Windows desktop PC to train the neural network model. Although it converts the cucumber image into 80 x 80 pixel low-resolution images, it still takes two to three days to complete training the model with 7,000 images.
“Even with this low-res image, the system can only classify a cucumber based on its shape, length and level of distortion. It can’t recognize color, texture, scratches and prickles,” Makoto explained. Increasing image resolution by zooming into the cucumber would result in much higher accuracy, but would also increase the training time significantly.
To improve deep learning, some large enterprises have started doing large-scale distributed training, but those servers come at an enormous cost. Google offers Cloud Machine Learning (Cloud ML), a low-cost cloud platform for training and prediction that dedicates hundreds of cloud servers to training a network with TensorFlow. With Cloud ML, Google handles building a large-scale cluster for distributed training, and you just pay for what you use, making it easier for developers to try out deep learning without making a significant capital investment.
These specialized servers were used in the AlphaGo match.
Makoto is eagerly awaiting Cloud ML. “I could use Cloud ML to try training the model with much higher resolution images and more training data. Also, I could try changing the various configurations, parameters and algorithms of the neural network to see how that improves accuracy. I can’t wait to try it.”

A lab founded by a tech billionaire just unveiled a major leap forward in cracking your brain’s code

By Hugo Angel,

This is definitely not a scene from “A Clockwork Orange.” Allen Brain Observatory
As the mice watched a computer screen, their glowing neurons pulsed through glass windows in their skulls.
Using a device called a two-photon microscope, researchers at the Allen Institute for Brain Science could peer through those windows and record, layer by layer, the workings of their little minds.
The result, announced July 13, is a real-time record of the visual cortex — a brain region shared in similar form across mammalian species — at work. The data set that emerged is so massive and complete that its creators have named it the Allen Brain Observatory.
Bred for the lab, the mice were genetically modified so that specific cells in their brains would fluoresce when they became active. Researchers had installed the brain-windows surgically, slicing away tiny chunks of the rodents’ skulls and replacing them with five-millimeter skylights.
Sparkling neurons of the mouse visual cortex shone through the glass as images and short films flashed across the screen. Each point of light the researchers saw translated, with hours of careful processing, into data: 
  • Which cell lit up? 
  • Where in the brain? 
  • How long did it glow? 
  • What was the mouse doing at the time? 
  • What was on the screen?
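The five questions above suggest one record per observed flash of activity. The sketch below is only an illustration of that idea: the class, the field names and the sample values are hypothetical and do not reflect the Allen Brain Observatory’s actual data format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FluorescenceEvent:
    """One observed flash of activity; every field name here is hypothetical."""
    cell_id: int            # which cell lit up
    brain_location: str     # where in the brain
    duration_s: float       # how long it glowed, in seconds
    mouse_behavior: str     # what the mouse was doing at the time
    stimulus: str           # what was on the screen


def events_for_stimulus(events, stimulus):
    """Return the events recorded while a given stimulus was on screen."""
    return [e for e in events if e.stimulus == stimulus]


# Made-up sample records, for illustration only.
events = [
    FluorescenceEvent(1, "visual cortex", 0.8, "running", "natural scene"),
    FluorescenceEvent(2, "visual cortex", 0.3, "still", "moving lines"),
    FluorescenceEvent(1, "visual cortex", 0.5, "still", "moving lines"),
]
print(len(events_for_stimulus(events, "moving lines")))  # 2
```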

The researchers imaged the neurons in small groups, building a map of one microscopic layer before moving down to the next. When they were finished, the activities of 18,000 cells from several dozen mice were recorded in their database.

“This is the first data set where we’re watching large populations of neurons’ activity in real time, at the cellular level,” said Saskia de Vries, a scientist who worked on the project, at the private research center launched by Microsoft co-founder Paul Allen.
The problem the Brain Observatory wants to solve is straightforward. Science still does not understand the brain’s underlying code very well, and individual studies may turn up odd results that are difficult to interpret in the context of the whole brain.
A decade ago, for example, a widely-reported study appeared to find a single neuron in a human brain that always — and only — winked on when presented with images of Halle Berry. Few scientists suggested that this single cell actually stored the subject’s whole knowledge of Berry’s face. But without more context about what the cells around it were doing, a more complete explanation remained out of reach.
“When you’re listening to a cell with an electrode, all you’re hearing is [its activity level] spiking,” said Shawn Olsen, another researcher on the project. “And you don’t know where exactly that cell is, you don’t know its precise location, you don’t know its shape, you don’t know who it connects to.”
Imagine trying to assemble a complete understanding of a computer given only facts like “under certain circumstances, clicking the mouse makes lights on the printer blink.”
To get beyond that kind of feeling around in the dark, the Allen Institute has taken what Olsen calls an “industrial” approach to mapping out the brain’s activity.
“Our goal is to systematically march through the different cortical layers, and the different cell types, and the different areas of the cortex to produce a systematic, mostly comprehensive survey of the activity,” Olsen explained. “It doesn’t just describe how one cell type is responding or one particular area, but characterizes as much as we can a complete population of cells that will allow us to draw inferences that you couldn’t describe if you were just looking at one cell at a time.”
In other words, this project makes its impact through the grinding power of time and effort.
A visualization of cells examined in the project. Allen Brain Observatory

Researchers showed the mice moving horizontal or vertical lines, light and dark dots on a surface, natural scenes, and even clips from Hollywood movies.

The more abstract displays target how the mind sees and interprets light and dark, lines, and motion, building on existing neuroscience. Researchers have known for decades that particular cells appear to correspond to particular kinds of motion or shape, or positions in the visual field. This research helps them place the activity of those cells in context.
One of the most obvious results was that the brain is noisy, messy, and confusing.
“Even though we showed the same image, we could get dramatically different responses from the same cell. On one trial it may have a strong response, on another it may have a weak response,” Olsen said.
All that noise in their data is one of the things that differentiates it from a typical study, de Vries said.
“If you’re inserting an electrode, you’re going to keep advancing until you find a cell that kind of responds the way you want it to,” she said. “By doing a survey like this we’re going to see a lot of cells that don’t respond to the stimuli in the way that we think they should. We’re realizing that the cartoon model that we have of the cortex isn’t completely accurate.”

Olsen said they suspect a lot of that noise emerges from whatever the mouse is thinking about or doing that has nothing to do with what’s on screen. They recorded videos of the mice during data collection to help researchers combing their data learn more about those effects.
The best evidence for this suspicion? When they showed the mice more interesting visuals, like pictures of animals or clips from the film “Touch of Evil,” the neurons behaved much more consistently.
“We would present each [clip] ten different times,” de Vries said. “And we can see from trial to trial many cells at certain times almost always respond — reliable, repeatable, robust responses.”
In other words, it appears the mice were paying attention.
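One simple way to put a number on “reliable, repeatable” is the fraction of repeated trials in which a cell responds. The sketch below is a generic illustration and not the Allen Institute’s analysis: the function name, the threshold and the response values are all made up.

```python
def response_reliability(trial_responses, threshold):
    """Fraction of trials in which the response exceeded `threshold`."""
    if not trial_responses:
        raise ValueError("need at least one trial")
    hits = sum(1 for r in trial_responses if r > threshold)
    return hits / len(trial_responses)


# Made-up response strengths for two cells across ten presentations of a clip.
noisy_cell = [0.9, 0.1, 0.2, 0.8, 0.1, 0.0, 0.7, 0.2, 0.1, 0.3]
reliable_cell = [0.9, 0.8, 0.85, 0.9, 0.7, 0.95, 0.8, 0.9, 0.6, 0.85]

print(response_reliability(noisy_cell, 0.5))     # 0.3
print(response_reliability(reliable_cell, 0.5))  # 1.0
```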

The Brain Observatory was turned loose on the internet Wednesday, with its data available for researchers and the public to comb through, explore, and maybe critique.

But the project isn’t over.
In the next year-and-a-half, the researchers intend to add more types of cells and more regions of the visual cortex to their observatory. And their long-term ambitions are even grander.
“Ultimately,” Olsen said, “we want to understand how this visual information in the mouse’s brain gets used to guide behavior and memory and cognition.”
Right now, the mice just watch screens. But by training them to perform tasks based on what they see, he said they hope to crack the mysteries of memory, decision-making, and problem-solving. Another parallel observatory created using electrode arrays instead of light through windows will add new levels of richness to their data.
So the underlying code of mouse — and human — brains remains largely a mystery, but the map that we’ll need to unlock it grows richer by the day.
ORIGINAL: Tech Insider

Jul. 13, 2016

Where does intelligence come from?

By Hugo Angel,

It is amazing how intelligent we can be. We can construct shelter, find new ways of hunting, and create boats and machines. Our unique intelligence has been responsible for the emergence of civilization.
But how does a set of living cells become intelligent? How can flesh and blood turn into something that can create bicycles and airplanes or write novels?
This is the question of the origin of intelligence.
This problem has puzzled many theorists and scientists, and it is particularly important if we want to build intelligent machines. They still lag well behind us. Although computers calculate millions of times faster than we do, it is we who understand the big picture in which these calculations fit. Even animals are much more intelligent than machines. A mouse can find its way in a hostile forest and survive. This cannot be said for our computers or robots.
The question of how to achieve intelligence remains a mystery for scientists.
Recently, however, a new theory has been proposed that may resolve this very question. The theory is called practopoiesis and is founded in the most fundamental capability of all biological organisms—their ability to adapt.
Darwin’s theory of evolution describes one way in which our genomes adapt. By creating offspring, new combinations of genes are tested; the good ones are kept and the bad ones are disposed of. The result is a genome better adapted to the environment.
Practopoiesis tells us that somewhat similar trial-and-error adaptation mechanisms operate while an organism grows, while it digests food, and also while it acts intelligently or thinks.
For example, the growth of our body is not precisely programmed by the genes. Instead, our genes perform experiments, which require feedback from the environment and corrections of errors. Only with trial and error can our body properly grow.
Our genes contain an elaborate knowledge of which experiments need to be done, and this knowledge of trial-and-error approaches has been acquired through eons of evolution. We kept whatever worked well for our ancestors.
However, this knowledge alone is not enough to make us intelligent.
To create intelligent behavior such as thinking, decision making, understanding a poem, or simply detecting one’s friend in a crowd of strangers, our bodies require yet another type of trial-and-error knowledge. There are mechanisms in our body that also contain elaborate knowledge for experimenting, but they are much faster. The knowledge of these mechanisms is not collected through evolution but through the development over the lifetime of an individual.
These fast adaptive mechanisms continually adjust the big network of our connected nerve cells. These adaptation mechanisms can change in an eye-blink the way the brain networks are effectively connected. It may take less than a second to make a change necessary to recognize one’s own grandmother, or to make a decision, or to get a new idea on how to solve a problem.
The slow and the fast adaptive mechanisms share one thing: They cannot be successful without receiving feedback and thus iterating through several stages of trial and error; for example, testing several possibilities of who this person in the distance could be.
Practopoiesis states that the slow and fast adaptive mechanisms are collectively responsible for the creation of intelligence and are organized into a hierarchy.
  • First, evolution creates genes at a painstakingly slow tempo. Then genes slowly create the mechanisms of fast adaptations.
  • Next, adaptation mechanisms change the properties of our nerve cells within seconds
  • And finally, the resulting adjusted networks of nerve cells route sensory signals to muscles with the speed of lightning. 
  • At the end behavior is created.
Probably the most groundbreaking aspect of practopoietic theory is that our intelligent minds are not primarily located in the connectivity matrix of our neural networks, as it has been widely held, but instead in the elaborate knowledge of the fast adaptive mechanisms. The more knowledge our genes store into our quick abilities to adapt nerve cells, the more capability we have to adjust in novel situations, solve problems, and generally, act intelligently.
Therefore, our intelligence seems to come from the hierarchy of adaptive mechanisms, from the very slow evolution that enables the genome to adapt over a lifetime, to the quick pace of neural adaptation expressing knowledge acquired through its lifetime. Only when these adaptations have been performed successfully can our networks of neurons perform tasks with wonderful accuracy.
Our capability to survive and create originates, then, 
  • from the adaptive mechanisms that operate at different levels and 
  • the vast amounts of knowledge accumulated by each of the levels.
 The combined result of all of them together is what makes us intelligent.
May 16, 2016
Danko Nikolić
About the Author:
Danko Nikolić is a brain and mind scientist, running an electrophysiology lab at the Max Planck Institute for Brain Research, and is the creator of the concept of ideasthesia. More about practopoiesis can be read here

IBM, Local Motors debut Olli, the first Watson-powered self-driving vehicle

By Hugo Angel,

Olli hits the road in the Washington, D.C. area and later this year in Miami-Dade County and Las Vegas.
Local Motors CEO and co-founder John B. Rogers, Jr. with “Olli” & IBM, June 15, 2016. Rich Riggins/Feature Photo Service for IBM

IBM, along with the Arizona-based manufacturer Local Motors, debuted the first-ever driverless vehicle to use the Watson cognitive computing platform. Dubbed “Olli,” the electric vehicle was unveiled at Local Motors’ new facility in National Harbor, Maryland, just outside of Washington, D.C.

Olli, which can carry up to 12 passengers, taps into four Watson APIs (Speech to Text, Natural Language Classifier, Entity Extraction and Text to Speech) to interact with its riders. It can answer questions like “Can I bring my children on board?” and respond to basic operational commands like, “Take me to the closest Mexican restaurant.” Olli can also give vehicle diagnostics, answering questions like, “Why are you stopping?”

Olli learns from data produced by more than 30 sensors embedded throughout the vehicle, which will be added and adjusted to meet passenger needs and local preferences.
While Olli is the first self-driving vehicle to use IBM Watson Internet of Things (IoT), this isn’t Watson’s first foray into the automotive industry. IBM launched its IoT for Automotive unit in September of last year, and in March, IBM and Honda announced a deal for Watson technology and analytics to be used in the automaker’s Formula One (F1) cars and pits.
IBM demonstrated its commitment to IoT in March of last year, when it announced it was spending $3B over four years to establish a separate IoT business unit, which later became the Watson IoT business unit.
IBM says that starting Thursday, Olli will be used on public roads locally in Washington, D.C. and will be used in Miami-Dade County and Las Vegas later this year. Miami-Dade County is exploring a pilot program that would deploy several autonomous vehicles to shuttle people around Miami.
ORIGINAL: ZDnet
By Stephanie Condon for Between the Lines
June 16, 2016

The Quest to Make Code Work Like Biology Just Took A Big Step

By Hugo Angel,


Chef CTO Adam Jacob. CHRISTIE HEMM KLOK/WIRED
In the early 1970s, at Silicon Valley’s Xerox PARC, Alan Kay envisioned computer software as something akin to a biological system, a vast collection of small cells that could communicate via simple messages. Each cell would perform its own discrete task. But in communicating with the rest, it would form a more complex whole. “This is an almost foolproof way of operating,” Kay once told me. Computer programmers could build something large by focusing on something small. That’s a simpler task, and in the end, the thing you build is stronger and more efficient.
The result was a programming language called SmallTalk. Kay called it an object-oriented language—the “objects” were the cells—and it spawned so many of the languages that programmers use today, from Objective-C and Swift, which run all the apps on your Apple iPhone, to Java, Google’s language of choice on Android phones. Kay’s vision of code as biology is now the norm. It’s how the world’s programmers think about building software.

In the ’70s, Alan Kay was a researcher at Xerox PARC, where he helped develop the notion of personal computing, the laptop, the now ubiquitous overlapping-window interface, and object-oriented programming.
COMPUTER HISTORY MUSEUM
But Kay’s big idea extends well beyond individual languages like Swift and Java. This is also how Google, Twitter, and other Internet giants now think about building and running their massive online services. The Google search engine isn’t software that runs on a single machine. Serving millions upon millions of people around the globe, it’s software that runs on thousands of machines spread across multiple computer data centers. Google runs this entire service like a biological system, as a vast collection of self-contained pieces that work in concert. It can readily spread those cells of code across all those machines, and when machines break—as they inevitably do—it can move code to new machines and keep the whole alive. 
Now, Adam Jacob wants to bring this notion to every other business on earth. Jacob is a bearded former comic-book-store clerk who, in the grand tradition of Alan Kay, views technology like a philosopher. He’s also the chief technology officer and co-founder of Chef, a Seattle company that has long helped businesses automate the operation of their online services through a techno-philosophy known as “DevOps.” Today, he and his company unveiled a new creation they call Habitat. Habitat is a way of packaging entire applications into something akin to Alan Kay’s biological cells, squeezing in not only the application code but everything needed to run, oversee, and update that code—all its “dependencies,” in programmer-speak. Then you can deploy hundreds or even thousands of these cells across a network of machines, and they will operate as a whole, with Habitat handling all the necessary communication between each cell. “With Habitat,” Jacob says, “all of the automation travels with the application itself.” 
That’s something that will at least capture the imagination of coders. And if it works, it will serve the rest of us too. If businesses push their services towards the biological ideal, then we, the people who use those services, will end up with technology that just works better—that coders can improve more easily and more quickly than before.
Reduce, Reuse, Repackage 
Habitat is part of a much larger effort to remake any online business in the image of Google. Alex Polvi, CEO and founder of a startup called CoreOS, calls this movement GIFEE—or Google Infrastructure For Everyone Else—and it includes tools built by CoreOS as well as such companies as Docker and Mesosphere, not to mention Google itself. The goal: to create tools that more efficiently juggle software across the vast computer networks that drive the modern digital world. 
But Jacob seeks to shift this idea’s center of gravity. He wants to make it as easy as possible for businesses to run their existing applications in this enormously distributed manner. He wants businesses to embrace this ideal even if they’re not willing to rebuild these applications or the computer platforms they run on. He aims to provide a way of wrapping any code—new or old—in an interface that can run on practically any machine. Rather than rebuilding your operation in the image of Google, Jacob says, you can simply repackage it.
“If what I want is an easier application to manage, why do I need to change the infrastructure for that application?” he says. It’s yet another extension of Alan Kay’s biological metaphor—as he himself will tell you. When I describe Habitat to Kay—now revered as one of the founding fathers of the PC, alongside so many other PARC researchers—he says it does what SmallTalk did so long ago.
The Unknown Programmer 
Kay traces the origins of SmallTalk to his time in the Air Force. In 1961, he was stationed at Randolph Air Force Base near San Antonio, Texas, and he worked as a programmer, building software for a vacuum-tube computer called the Burroughs 220. In those days, computers didn’t have operating systems. No Apple iOS. No Windows. No Unix. And data didn’t come packaged in standard file formats. No .doc. No .xls. No .txt. But the Air Force needed a way of sending files between bases so that different machines could read them. Sometime before Kay arrived, another Air Force programmer—whose name is lost to history—cooked up a good way. 
This unnamed programmer—“almost certainly an enlisted man,” Kay says, “because officers didn’t program back then”—would put data on a magnetic-tape reel along with all the procedures needed to read that data. Then, he tacked on a simple interface—a few “pointers,” in programmer-speak—that allowed the machine to interact with those procedures. To read the data, all the machine needed to understand were the pointers—not a whole new way of doing things. In this way, someone like Kay could read the tape from any machine on any Air Force base. 
Kay’s programming objects worked in a similar way. Each did its own thing, but could communicate with the outside world through a simple interface. That meant coders could readily plug an old object into a new program, or reuse it several times across the same program. Today, this notion is fundamental to software design. And now, Habitat wants to recreate this dynamic on a higher level: not within an application, but in a way that allows an application to run across a vast computer network.
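The idea of self-contained objects behind a simple interface can be sketched in a few lines. This is a generic Python illustration, not SmallTalk and not Kay’s own design; the class names and the single `receive` method are assumptions chosen for brevity.

```python
class Counter:
    """A self-contained object: its state is private, its interface is one method."""

    def __init__(self):
        self._count = 0

    def receive(self, message):
        # The only way in is a simple message. Unknown messages are ignored.
        if message == "increment":
            self._count += 1
        elif message == "report":
            return self._count
        return None


class Logger:
    """A different object that answers to the same kind of interface."""

    def __init__(self):
        self._lines = []

    def receive(self, message):
        if message == "report":
            return list(self._lines)
        self._lines.append(message)
        return None


def broadcast(objects, message):
    """Send one message to every object and collect the replies."""
    return [obj.receive(message) for obj in objects]


parts = [Counter(), Logger()]
broadcast(parts, "increment")
print(broadcast(parts, "report"))  # [1, ['increment']]
```

Because `broadcast` relies only on the shared interface, either object could be swapped out or reused in another program without changing the code around it.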
Because Habitat wraps an application in a package that includes everything needed to run and oversee the application—while fronting this package with a simple interface—you can potentially run that application on any machine. Or, indeed, you can spread tens, hundreds, or even thousands of packages across a vast network of machines. Software called the Habitat Supervisor sits on each machine, running each package and ensuring it can communicate with the rest. Chef designed this Supervisor specifically to juggle code on an enormous scale, writing it in Rust, a new programming language suited to modern online systems.
But the important stuff lies inside those packages. Each package includes everything you need to orchestrate the application, as modern coders say, across myriad machines. Once you deploy your packages across a network, Jacob says, they can essentially orchestrate themselves. Instead of overseeing the application from one central nerve center, you can distribute the task—the ultimate aim of Kay’s biological system. That’s simpler and less likely to fail, at least in theory. 
What’s more, each package includes everything you need to modify the application—to, say, update the code or apply new security rules. This is what Jacob means when he says that all the automation travels with the application. “Having the management go with the package,” he says, “means I can manage in the same way, no matter where I choose to run it.” That’s vital in the modern world. Online code is constantly changing, and this system is designed for change.
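The notion that the automation travels with the application can be sketched as a package that carries its own management hooks. Everything below is hypothetical: the `Package` and `Supervisor` classes, their fields and the sample values are stand-ins for the concept and are not Habitat’s actual format or API.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Package:
    """Hypothetical stand-in for a self-contained application package."""
    name: str
    version: str
    dependencies: List[str]
    run: Callable[[], str]                      # how to start the application
    hooks: Dict[str, Callable[["Package"], None]] = field(default_factory=dict)


class Supervisor:
    """Hypothetical per-machine process that runs packages and applies their hooks."""

    def __init__(self, machine: str):
        self.machine = machine
        self.packages: Dict[str, Package] = {}

    def load(self, package: Package) -> str:
        self.packages[package.name] = package
        return f"{self.machine}: {package.run()}"

    def manage(self, name: str, hook: str) -> None:
        # The management logic lives in the package, not in the supervisor.
        package = self.packages[name]
        package.hooks[hook](package)


def bump_version(package: Package) -> None:
    package.version = "1.1"


shop = Package("shop", "1.0", ["openssl"], lambda: "shop started",
               hooks={"update": bump_version})

for machine in ("laptop", "cloud-vm"):
    supervisor = Supervisor(machine)
    print(supervisor.load(shop))
    supervisor.manage("shop", "update")
print(shop.version)  # 1.1
```

The supervisor in this sketch knows nothing about how to update the application; it only invokes whatever hook the package brought along, which is why the same package can be managed the same way on either machine.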

‘Grownup Containers’ 
The idea at the heart of Habitat is similar to concepts that drive Mesosphere, Google’s Kubernetes, and Docker’s Swarm. All of these increasingly popular tools run software inside Linux “containers”—walled-off spaces within the Linux operating system that provide ways to orchestrate discrete pieces of code across myriad machines. Google uses containers in running its own online empire, and the rest of Silicon Valley is following suit. 
But Chef is taking a different tack. Rather than centering Habitat around Linux containers, they’ve built a new kind of package designed to run in other ways too. You can run Habitat packages atop Mesosphere or Kubernetes. You can also run them atop virtual machines, such as those offered by Amazon or Google on their cloud services. Or you can just run them on your own servers. “We can take all the existing software in the world, which wasn’t built with any of this new stuff in mind, and make it behave,” Jacob says. 
Jon Cowie, senior operations engineer at the online marketplace Etsy, is among the few outsiders who have kicked the tires on Habitat. He calls it “grownup containers.” Building an application around containers can be a complicated business, he explains. Habitat, he says, is simpler. You wrap your code, old or new, in a new interface and run it where you want to run it. “They are giving you a flexible toolkit,” he says.
That said, container systems like Mesosphere and Kubernetes can still be a very important thing. These tools include “schedulers” that spread code across myriad machines in a hyper-efficient way, finding machines that have available resources and actually launching the code. Habitat doesn’t do that. It handles everything after the code is in place. 
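What a scheduler does can be illustrated with a toy first-fit placement. This is a generic sketch, not the algorithm used by Mesosphere or Kubernetes; the function name, the single abstract “resource unit” and the sample numbers are assumptions.

```python
def schedule(tasks, machines):
    """Assign each task to the first machine with enough free capacity.

    tasks: dict of task name -> units of resource required
    machines: dict of machine name -> units of resource available
    Returns a dict of task name -> machine name (None if nothing fits).
    """
    free = dict(machines)          # copy so the caller's data is not modified
    placement = {}
    for task, need in tasks.items():
        placement[task] = None
        for machine, available in free.items():
            if available >= need:
                placement[task] = machine
                free[machine] = available - need
                break
    return placement


machines = {"m1": 4, "m2": 8}
tasks = {"web": 3, "db": 6, "cache": 2, "batch": 5}
print(schedule(tasks, machines))
# {'web': 'm1', 'db': 'm2', 'cache': 'm2', 'batch': None}
```

Real schedulers weigh many more factors, but the core job is the same: find a machine with room and launch the code there.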
Jacob sees Habitat as a tool that runs in tandem with a Mesosphere or a Kubernetes—or atop other kinds of systems. He sees it as a single tool that can run any application on anything. But you may have to tweak Habitat so it will run on your infrastructure of choice. In packaging your app, Habitat must use a format that can speak to each type of system you want it to run on (the inputs and outputs for a virtual machine are different, say, from the inputs and outputs for Kubernetes), and at the moment, it only offers certain formats. If it doesn’t handle your format of choice, you’ll have to write a little extra code of your own.
Jacob says writing this code is “trivial.” And for seasoned developers, it may be. Habitat’s overarching mission is to bring the biological imperative to as many businesses as possible. But of course, the mission isn’t everything. The importance of Habitat will really come down to how well it works.

Promise Theory 
Whatever the case, the idea behind Habitat is enormously powerful. The biological ideal has driven the evolution of computing systems for decades—and will continue to drive their evolution. Jacob and Chef are taking a concept that computer coders are intimately familiar with, and they’re applying it to something new. 
“They’re trying to take away more of the complexity—and do this in a way that matches the cultural affiliation of developers,” says Mark Burgess, a computer scientist, physicist, and philosopher whose ideas helped spawn Chef and other DevOps projects.
Burgess compares this phenomenon to what he calls Promise Theory, where humans and autonomous agents work together to solve problems by striving to fulfill certain intentions, or promises. He sees computer automation not just as a cooperation of code, but of people and code. That’s what Jacob is striving for. You share your intentions with Habitat, and its autonomous agents work to realize them—a flesh-and-blood biological system combining with its idealized counterpart in code. 
ORIGINAL: Wired
AUTHOR: CADE METZ
DATE OF PUBLICATION: 06.14.16