Inside a big bet on future machine intelligence
Jeff Hawkins has bet his reputation, fortune, and entire intellectual life on one idea: that he understands the brain well enough to create machines with an intelligence we recognize as our own.
If his bet is correct, the Palm Pilot inventor will father a new technology, one that becomes the crucible in which a general artificial intelligence is one day forged. If his bet is wrong, then Hawkins will have wasted his life. At 56 years old that might sting a little.
“I want to bring about intelligent machines, machine intelligence, accelerated greatly from where it was going to happen and I don’t want to be consumed – I want to come out at the other end as a normal person with my sanity,” Hawkins told The Register. “My mission, the mission of Numenta, is to be a catalyst for machine intelligence.”
A catalyst, he says, staring intently at your correspondent, “is something which accelerates a reaction by a thousand or ten thousand or a million-fold, and doesn’t get consumed in the process.”
His goal is ambitious, to put it mildly.
Before we dig deep into Hawkins’ idiosyncratic approach to artificial intelligence, it’s worth outlining the state of current AI research, why his critics have a right to be skeptical of his grandiose claims, and how his approach is different to the one being touted by consumer web giants such as Google.
AI researcher Jeff Hawkins
The road to a successful, widely deployable framework for an artificial mind is littered with failed schemes, dead ends, and traps. No one has come to the end of it, yet. But while major firms like Google and Facebook, and small companies like Vicarious, are striding over well-worn paths, Hawkins believes he is taking a new approach that could take him and his colleagues at his company, Numenta, all the way.
For over a decade, Hawkins has poured his energy into amassing enough knowledge about the brain and about how to program it in software. Now, he believes he is on the cusp of a great period of invention that may yield some very powerful technology.
Some people believe in him, others doubt him, and some academics El Reg has spoken with are suspicious of his ideas.
One thing we have established is that the work to which Hawkins has dedicated his life has become an influential touchstone within the red-hot modern artificial intelligence industry. His 2004 book, On Intelligence, appears to have been read by and inspired many of the most prominent figures in AI, and the tech Numenta is creating may trounce other commercial efforts by much larger companies such as Google, Facebook, and Microsoft.
“I think Jeff is largely right in what he wrote in On Intelligence,” explains Hawkins’ former colleague Dileep George (now running his own AI startup, Vicarious, which recently received $40m in funding from Mark Zuckerberg, space pioneer Elon Musk, and actor-turned-VC Ashton Kutcher). “Hierarchical systems, associative memory, time and attention – I think all those ideas are correct.”
One of Google’s most prominent AI experts agrees: “Jeff Hawkins … has served as inspiration to countless AI researchers, for which I give him a lot of credit,” explains former Google brain king and current Stanford Professor Andrew Ng.
Some organizations have taken Hawkins’ ideas and stealthily run with them, with schemes already underway at companies like IBM and federal organizations like DARPA to implement his ideas in silicon, paving the way for neuromorphic processors that process information in near-real time, develop representations of patterns, and make predictions. If successful, these chips will make Qualcomm’s “neuromorphic” Zeroth processors look like toys.
He has also inspired software adaptations of his work, such as CEPT, which has built an intriguing natural language processing engine partly out of Hawkins’ ideas.
How we think: time and hierarchy
Hawkins’ idea is that to build systems that behave like the brain, you have to be able to
- take in a stream of changing information,
- recognize patterns in it without knowing anything about the input source,
- make predictions, and
- react accordingly.
The only context you have for this analysis is an ability to observe how the stream of data changes over time.
Though this sounds similar to some of the data processing systems being worked on by researchers at Google, Microsoft, and Facebook, it has some subtle differences.
Part of it is heritage – Hawkins traces his ideas back to his own understanding of how our neocortex works based on a synthesis of thousands of academic papers, chats with researchers, and his own work at two of his prior tech companies, Palm and Handspring, whereas the inspiration for most other approaches is neural networks based on technology from the 80s, which itself was refined out of a 1940s paper [PDF], “A Logical Calculus of the Ideas Immanent in Nervous Activity”.
“That may be the right thing to do, but it’s not the way brains work and it’s not the principles of intelligence and it’s not going to lead to a system that can explore the world or systems that can have behavior,” Hawkins tells us.
So far he has outlined the ideas for this approach in his influential On Intelligence, plus a white paper published in 2011, a set of open source algorithms called NuPIC based on his Hierarchical Temporal Memory approach, and hundreds of talks given at universities and at companies ranging from Google to small startups.
Six easy pieces and the one true algorithm
Hawkins’ work has “popularized the hypothesis that much of intelligence might be due to one learning algorithm,” explains Ng.
Part of why Hawkins’ approach is so controversial is that rather than assembling a set of advanced software components for specific computing functions and lashing them together via ever more complex collections of software, Hawkins has dedicated his research to figuring out an implementation of a single, basic approach.
This approach stems from an observation that our brain doesn’t appear to come preloaded with any specific instructions or routines, but rather is an architecture that is able to take in, process, and store an endless stream of information and develop higher-order understandings out of that.
The manifestation of Hawkins’ approach is the Cortical Learning Algorithm, or CLA.
“People used to think the neocortex was divided into sensory regions and motor regions,” he explains. “We know now that is not true – the whole neocortex is sensory and motor.”
Ultimately, the CLA will be a single system that involves both sensory processing and motor control – brain functions that Hawkins believes must be fused together to create the possibility of consciousness. For now, most work has been done on the sensory layer, though he has recently made some breakthroughs on the motor integration as well.
To build his Cortical Learning Algorithm system, Hawkins says, he has developed six principles that define a cortical-like processor. These traits are
- “on-line learning from streaming data”,
- “hierarchy of memory regions”,
- “sequence memory”,
- “sparse distributed representations”,
- “all regions are sensory and motor”, and
- “attention”.
These principles are based on his own study of the work being done by neuroscientists around the world.
Now, Hawkins says, Numenta is on the verge of a breakthrough that could see the small company birth a framework for building intelligent machines. And unlike the hysteria that greeted AI in the 70s and 80s as the defense industry pumped money into AI, this time may not be a false dawn.
“I am thrilled at the progress we’re making,” he told El Reg one sunny afternoon at Numenta’s whiteboard-crammed offices in Redwood City, California. “It’s accelerating. These things are compounding, and it feels like these things are all coming together very rapidly.”
The approach Numenta has been developing is producing better and better results, he says, and the CLA is gaining broader capabilities. In the past months, Hawkins has gone through a period of fecund creativity, and has solved one of the main problems that have bedeviled his system (temporal pooling), he says. He sees 2014 as a critical year for the company.
He is confident that he has bet correctly – but it’s been a hard road to get here.
That long, hard road
Hawkins’ interest in the brain dates back to his childhood, as does his frustration with how it is studied.
Growing up, Hawkins spent time in an old shipyard on the north shore of Long Island, inventing all manner of boats with his father, an inventor with the enthusiasm for creativity of a Dr. Seuss character. In high school, the young Hawkins developed an interest in biophysics and, as he recounts in his book On Intelligence, tried to find out more about the brain at a local library.
“My search for a satisfying brain book turned up empty. I came to realize that no one had any idea how the brain actually worked. There weren’t even any bad or unproven theories; there simply were none,” he wrote.
This realization sparked a lifelong passion to try to understand the grand, intricate system that makes people who they are, and to eventually model the brain and create machines built in the same manner.
Hawkins graduated from Cornell in 1979 with a Bachelor of Science in Electrical Engineering. After a stint at Intel, he applied to MIT to study artificial intelligence, but had his application rejected because he wanted to understand how brains work, rather than build artificial intelligence. After this he worked at laptop start-up GRiD Systems, but during this time “could not get my curiosity about the brain and intelligent machines out of my head,” so he did a correspondence course in physiology and ultimately applied to and was accepted in the biophysics program at the University of California, Berkeley.
When Hawkins started at Berkeley in 1986, his ambition to study a theory of the brain collided with the university administration, which disagreed with his course of study. Though Berkeley was not able to give him a course of study, Hawkins spent almost two years ensconced in the school’s many libraries reading as much of the literature available on neuroscience as possible.
This deep immersion in neuroscience became the lens through which Hawkins viewed the world, with his later business accomplishments – Palm, Handspring – all leading to valuable insights on how the brain works and why the brain behaves as it does.
The way Hawkins recounts his past makes it seem as if the creation of a billion-dollar business in Palm, and arguably the prototype of the modern smartphone in Handspring, was a footnote along his journey to understand the brain.
This makes more sense when viewed against what he did in 2002, when he founded the Redwood Neuroscience Institute (now a part of the University of California at Berkeley and an epicenter of cutting-edge neuroscience research in its own right), and in 2005, when he founded Numenta with Palm/Handspring collaborator Donna Dubinsky and cofounder Dileep George.
These decades gave Hawkins the business acumen, money, and perspective needed to make a go at crafting his foundation for machine intelligence.
His media-savvy, confident approach appears to have stirred up some ill feeling among other academics who point out, correctly, that Hawkins hasn’t published widely, nor has he invented many ideas on his own.
Numenta has also had troubles, partly due to Hawkins’ idiosyncratic view on how the brain works.
In 2010, for example, Numenta cofounder Dileep George left to found his own company, Vicarious, to pick some of the more low-hanging fruit in the promising field of AI. From what we understand, this amicable separation stemmed from a difference of opinion between George and Hawkins, as George tended towards a more mathematical approach, and Hawkins to a more biological one.
Hawkins has also come in for a bit of a drubbing from the intelligentsia, with NYU psychology professor Gary Marcus dismissing Numenta’s approach in a New Yorker article titled “Steamrolled by Big Data”.
Other academics El Reg interviewed for this article did not want to be quoted, as they felt Hawkins’ lack of peer reviewed papers combined with his entrepreneurial persona reduced the credibility of his entire approach.
Hawkins brushes off these criticisms and believes they come down to a difference of opinion between him and the AI intelligentsia.
“These are complex biological systems that were not designed by mathematical principles [that are] very difficult to formalize completely,” he told us.
“This reminds me a bit of the beginning of the computer era,” he said. “If you go back to the 1930s and early 40s, when people first started thinking about computers they were really interested in whether an algorithm would complete, and they were looking for mathematical completeness, a mathematical proof, that if you implemented something like an algorithm today when we build a computer, no one sits around saying ‘Let’s look at the mathematical formalism of this computer.’ It reminds me a little about that. We still have people saying ‘You don’t have enough math here!’ There’s some people that just don’t like that.”
Hawkins’ confidence stems from the way Numenta has built its technology, which far from merely taking inspiration from the brain – as many other startups claim to do – is actively built as a digital implementation of everything Hawkins has learned about how the dense, napkin-sized sheet of cells that is our neocortex works.
“I know of no other cortical theories/models that incorporate any of the following:
- active dendrites,
- differences between proximal and distal dendrites,
- synapse growth and decay,
- potential synapses,
- dendrite growth,
- depolarization as a mode of prediction,
- multiple types of inhibition and their corresponding inhibitory neurons,
The new temporal pooling mechanism we are working on requires metabotropic receptors in the locations they are, and are not, found. Again, I don’t know of any theories that have been reduced to practice that incorporate any, let alone all of these concepts,” he wrote in a post to the discussion mailing list for NuPIC, an open source implementation of Numenta’s CLA, in February.
Deep learning is the new shallow learning
But for all the apparent rigorousness of Hawkins’ approach, during the years he has worked on the technology there has been a fundamental change in the landscape of AI development: the rise of the consumer internet giants, and with them the appearance of various cavernous stores of user data on which to train learning algorithms.
Google, for instance, was said in January of 2014 to be assembling the team required for the “Manhattan Project for AI”, according to a source who spoke anonymously to online publication Re/code. But Hawkins thinks that for all its grand aims, Google’s approach may be based on a flawed presumption.
The collective term for the approach pioneered by companies like Google, Microsoft, and Facebook is “Deep Learning”, but Hawkins fears it may be another blind path.
“Deep learning could be the greatest thing in the world, but it’s not a brain theory,” he says.
Deep learning approaches, Hawkins says, encourage the industry to go about refining methods based on old technology, itself based on an oversimplified version of the neurons in a brain.
Because of the vast stores of user data available, the companies are all compelled to approach the quest of creating artificial intelligence through building machines that compute over certain types of data.
In many cases, much of the development at places like Google, Microsoft, and Facebook has revolved around vision – a dead end, according to Hawkins.
“Where the whole community got tripped up – and I’m talking fifty years tripped up – is vision,” Hawkins explains. “They said, ‘Your eyes are moving all the time, your head is moving, the world is moving – let us focus on a simpler problem: spatial inference in vision’. This turns out to be a very small subset of what vision is. Vision turns out to be an inference problem. What that did is they threw out the most important part of vision – you must learn first how to do time-based vision.”
The acquisitions these companies have made speak to this apparent flaw.
Google, for instance, hired AI luminary and University of Toronto professor Geoff Hinton and his startup DNNresearch last year to have him apply his “Deep Belief Networks” approach to Google’s AI efforts.
In a talk given at the University of Toronto last year, Hinton said he believed more advanced AI should be based on existing approaches, rather than a rethought understanding of the brain.
“The kind of neural inspiration I like is when making it more like the brain works better,” Hinton said. “There’s lots of people who say you ought to make it more like the brain – like Henry Markram [of the European Union's brain simulation project], for example. He says, ‘Give me a billion dollars and I’ll make something like the brain,’ but he doesn’t actually know how to make it work – he just knows how to make something more and more like the brain. That seems to me not the right approach. What we should do is stick with things that actually work and make them more like the brain, and notice when making them more like the brain is actually helpful. There’s not much point in making things work worse.”
Hawkins vehemently disagrees with this point, and believes that basing approaches on existing methods means Hinton and other AI researchers are not going to be able to imbue their systems with the generality needed for true machine intelligence.
Another influential Googler agrees.
“We have neuroscientists in our team so we can be biologically inspired but are not slavish to it,” Google Fellow Jeff Dean (creator of MapReduce, the Google File System, and now a figure in Google’s own “Brain Project” team, also known as its AI division) told us this year.
“I’m surprised by how few people believe they need to understand how the brain works to build intelligent machines,” Hawkins says. “I’m disappointed by this.”
Hinton’s foundational technologies, for example, are Boltzmann machines – advanced “stochastic recurrent neural network” tools that try to mimic some of the characteristics of the brain, which sit at the heart of Hinton’s “Deep Belief Networks” (2006).
“The neurons in a restricted Boltzmann machine are not even close [to the brain] – it’s not even an approximation,” Hawkins explains.
Even Google is not sure about which way to bet on how to build a mind, as illustrated by its buy of UK company “DeepMind Technologies” earlier this year.
That company’s founder, Demis Hassabis, has done detailed work on fundamental neuroscience, and has built technology out of this understanding. In 2010, it was reported that he mentioned both Hawkins’ Hierarchical Temporal Memory and Hinton’s Deep Belief Nets when giving a talk on viable general artificial intelligence approaches.
Facebook has gone down similar paths by hiring the influential artificial intelligence academic Yann LeCun to help it “predict what a user is going to do next,” among other things.
Microsoft has developed significant capabilities as well, with systems like the Siri-beater “Cortana” and various endeavors by the company’s research division, MSR.
Though the techniques these various researchers employ differ, they all depend on training a model on a large amount of data, and then selectively retraining it as that data changes.
These AI efforts are built around dealing with problems backed up by large and relatively predictable datasets. This has yielded some incredible inventions, such as
- reasonable natural language processing,
- image detection, and
- video tagging.
It has not and cannot, however, yield a framework for a general intelligence, as it doesn’t have the necessary architecture for data
- retention, and
that our own brains do, Hawkins claims.
Hawkins’ focus on time is why he believes his approach will win – something that the consumer internet giants are slowly waking up to.
It’s all about time
“I would say that Hawkins is focusing more on how things unfold over time, which I think is very important,” Google’s research director Peter Norvig told El Reg via email, “while most of the current deep learning work assumes a static representation, unchanging over time. I suspect that as we scale up the applications (i.e., from still images to video sequences, and from extracting noun-phrase entities in text to dealing with whole sentences denoting actions), that there will be more emphasis on the unfolding of dynamic processes over time.”
Another former Googler concurs, with Andrew Ng telling us via email, “Hawkins’ work places a huge emphasis on learning from sequences. While most deep learning researchers also think that learning from sequences is important, we just haven’t figured out ways to do so that we’re happy with yet.”
Geoff Hinton echoes this praise. “He has great insights about the types of computation the brain must be doing,” he tells us – but argues that Jeff Hawkins’ actual algorithmic contributions have been “disappointing” so far.
An absolutely crucial ingredient to AI
Time “is one hundred per cent crucial” to the creation of true artificial intelligence, Hawkins tells us. “If you accept the fact intelligent machines are going to work on the principles of the neocortex, it is the entire thing, basically. The only way.”
“The brain does two things:
- it does inference, which is recognizing patterns, and
- it does behavior, which is generating patterns or generating motor behavior,”
Hawkins explains. “Ninety-nine percent of inference is time-based – language, audition, touch – it’s all time-based. You can’t understand touch without moving your hand. The order in which patterns occur is very important.”
Numenta’s approach relies on time. Its Cortical Learning Algorithm (white paper) amounts to an engine for
- processing streams of information,
- classifying them,
- learning to spot differences, and
- using time-based patterns to make predictions about the future.
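The steps listed above can be illustrated with a deliberately simple toy. The sketch below is not the Cortical Learning Algorithm and does not use the NuPIC API; it assumes a stream of discrete symbols and a hypothetical `FirstOrderSequenceMemory` class that only counts which symbol has followed which. It shows nothing more than the general idea of learning online from a stream, predicting the next item, and flagging an input that breaks the learned pattern.

```python
from collections import defaultdict


class FirstOrderSequenceMemory:
    """Toy stand-in for sequence memory (hypothetical, not Numenta's CLA)."""

    def __init__(self):
        # transitions[a][b] = how many times symbol b has followed symbol a
        self.transitions = defaultdict(lambda: defaultdict(int))
        self.previous = None

    def predict(self):
        """Return the most frequent successor of the last symbol, or None."""
        followers = self.transitions.get(self.previous)
        if not followers:
            return None
        return max(followers, key=followers.get)

    def observe(self, symbol):
        """Learn from one symbol; return True if it was unexpected."""
        if self.previous is None:
            unexpected = False  # nothing to compare against yet
        else:
            unexpected = symbol not in self.transitions[self.previous]
            self.transitions[self.previous][symbol] += 1
        self.previous = symbol
        return unexpected


memory = FirstOrderSequenceMemory()
for note in "CDECDECDE":        # a repeating three-note pattern
    memory.observe(note)

print(memory.predict())         # C  (E has always been followed by C)
print(memory.observe("C"))      # False: the expected note arrived
print(memory.observe("G"))      # True: C was never followed by G before
```

A first-order model like this one forgets everything except the previous symbol, which is exactly the kind of limitation a hierarchical, time-based memory is meant to overcome.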
As mentioned above, there are several efforts underway at companies like IBM and federal research agencies like DARPA to implement Hawkins’ systems in custom processors, and these schemes all recognize the importance of Hawkins’ reliance on time.
“What I found intriguing about [his approach] – time is not an afterthought. In all of these [other] things, time has been an afterthought,” one source currently working on implementing Hawkins’ ideas tells us.
So far, Hawkins has used his system to
- predict hourly energy use,
- predict stock trading volumes, and
- detect anomalies in data streams.
Numenta’s commercial product, Grok, detects anomalies in computer servers running on Amazon’s cloud service.
Hawkins described to us one way to understand the power of this type of pattern recognition. “Imagine you are listening to a musician,” he suggested. “After hearing her play for several days, you learn the kind of music she plays, how talented she is, how much she improvises, and how many mistakes she makes. Your brain learns her style, and then has expectations about what she will play and what it will sound like. As you continue to listen to her play, you will detect if her style changes, if the type of music she plays changes, or if she starts making more errors. The same kind of patterns exist in machine-generated data, and Grok will detect changes.”
Here again the wider AI community appears to be dovetailing into Hawkins’ ideas, with one of Andrew Ng’s former Stanford students, Honglak Lee, having published a paper called “A classification-based polyphonic piano transcription approach using learned feature representations” in 2011. However, the method of implementation is different.
Obscurity through biology
Part of the reason why Hawkins’ technology is not more widely known is that, for current uses, it has yet to show a vast lead over rival approaches. For all of Hawkins’ belief in the tech, there is no convincing killer application for it that other approaches can’t also handle. The point, Hawkins says, is that the CLA’s internal structure gets rid of some of the stumbling blocks that lie ahead for other approaches.
Hawkins believes the CLA’s implicit dependence on time means that eventually it will become the dominant approach.
“At the bottom of the [neocortex's] hierarchy are fast-changing patterns and they form sequences – some of them are predictable and some of them are not – and what the neocortex is doing is trying to understand the set of patterns here and give it a constant representation – a name for the sequence, if you will – and it forms that as the next level of the hierarchy so the next level up is more stable,” Hawkins explains.
“Changing patterns lead to changing representations in the hierarchy that are more stable, and then it learns the changes in those patterns, and as you go up the hierarchy it forms more and more stable representations of the world and they also tend to be independent of your body position and your senses.”
A comparison between Hawkins’ Hierarchical Temporal Memory cells (right), a neural network neuron (center), and the brain’s own neuron (left)
He believes his technology is more effective than the approaches taken by his rivals due to its use of sparse distributed representations as an input device to a storage system he terms “sequence memory”.
Sequence memory refers to how information makes its way into the brain as a stream of information that comes in from both external stimuli and internal stimuli, such as signals from the broader body.
Sparse Distributed Representations (SDRs) are partially based on the work of mathematician Pentti Kanerva on “Sparse Distributed Memory”. They refer to how the brain represents and stores information. They are designed to mimic the way our brain is believed to encode memories, which is through neuron firings across a very large area in response to inputs. To achieve this, SDRs are written, roughly, as a 2000-bit string of which perhaps two percent are active. Because the strings are so sparse, you don’t need to read all the active bits in an SDR to say that it is similar to another: the two merely need to share a few of their activated bits to be considered similar.
Hawkins believes SDRs give input data inherent meaning through this representation approach.
“This means that if two vectors have 1s in the same position, they are semantically similar. Vectors can therefore be expressed in degrees of similarity rather than simply being identical or different. These large vectors can be stored accurately even using a subsampled index of, say, 10 of 2,000 bits. This makes SDR memory fault tolerant to gaps in data. SDRs also exhibit properties that reliably allow the neocortex to determine if a new input is unexpected,” the company’s commercial website for Grok says.
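The overlap and subsampling properties described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the SDRs are random rather than learned, the figure of 40 active bits is simply two percent of 2,000, and the helper names (`random_sdr`, `overlap`, `add_noise`, `subsample`, `matches`) are hypothetical and not part of NuPIC or Grok.

```python
import random

N_BITS = 2000     # vector width quoted in the article
N_ACTIVE = 40     # roughly two percent of the bits are active


def random_sdr(rng):
    """Return an SDR as the set of indices of its active bits."""
    return frozenset(rng.sample(range(N_BITS), N_ACTIVE))


def overlap(a, b):
    """Number of active bits two SDRs share; more overlap, more similar."""
    return len(a & b)


def add_noise(sdr, n_moved, rng):
    """Move n_moved active bits to positions that were inactive."""
    kept = rng.sample(sorted(sdr), N_ACTIVE - n_moved)
    free = [i for i in range(N_BITS) if i not in sdr]
    return frozenset(kept) | frozenset(rng.sample(free, n_moved))


def subsample(sdr, k, rng):
    """Store only k of the active bits, as in the '10 of 2,000' example."""
    return frozenset(rng.sample(sorted(sdr), k))


def matches(stored_bits, candidate):
    """A candidate matches if it contains every stored bit."""
    return stored_bits <= candidate


rng = random.Random(0)
a = random_sdr(rng)
noisy_a = add_noise(a, 8, rng)    # a corrupted copy of a
unrelated = random_sdr(rng)       # an independent random SDR
index_a = subsample(a, 10, rng)   # compact stored form of a

print(overlap(a, noisy_a))        # 32: most active bits survive the noise
print(overlap(a, unrelated))      # typically near zero for random SDRs
print(matches(index_a, a), matches(index_a, unrelated))
```

With only 40 of 2,000 bits active, two unrelated random vectors are expected to share fewer than one bit on average, which is why a handful of shared bits is already strong evidence of similarity.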
But what are the drawbacks?
So if Hawkins thinks he has the theory and is on the way to building the technology, and other companies are implementing it, then why are we even calling what he is doing a “bet”? The answer comes down to credibility.
Hawkins’ idiosyncratic nature and decision to synthesize insights from two different fields – neuroscience and computer science – are his strengths, but also his drawbacks.
“No one knows how the cortex works, so there is no way to know if Jeff is on the right track or not,” Dr. Terry Sejnowski, head of the Computational Neurobiology Laboratory at the Salk Institute for Biological Studies, tells us. “To the extent that [Hawkins] incorporates new data into his models he may have a shot, and there will be a flood of data coming from the BRAIN Initiative that was announced by Obama last April.”
Hawkins says that this response is typical of the academic community, and that there is enough data available to learn about the brain. You just have to look for it.
“We’re not going to replicate the neocortex, we’re not going to simulate the neocortex, we just need to understand how it works in sufficient detail so we can say ‘A-ha!’ and build things like it,” Hawkins says. “There is an incredible amount of unassimilated data that exists. Fifty years of papers. Thousands of papers a year. It’s unbelievable, and it’s always the next set of papers that people think is going to do it. … it’s not true that you have to wait for that stuff.”
The root of the problems Hawkins faces may be his approach, which stems more from biology than from mathematics. His old colleague and cofounder of Numenta, Dileep George, confirms this.
“I think Jeff is largely right in what he wrote in On Intelligence,” George told us. “There are different approaches on how to bring those ideas. Jeff has an angle on it; we have a different angle on it; the rest of the community have another perspective on it.”
These ideas are echoed by Google’s Norvig. “Hawkins, at least in his general-public-facing-persona, seems to be more driven by duplicating what the brain does, while the deep learning researchers take some concepts from the brain, but then mostly are trying to optimize mathematical equations,” he told us via email.
“I live in the middle,” Hawkins explains. “Where I know the neuroscience details very very well, and I have a theoretical framework, and I bounce back and forth between these over and over again.”
What he is doing today “is maybe 5 per cent of how humans learn,” Hawkins reckons.
He believes that during the coming year he will begin work on the next major area of development for his technology: action.
For Hawkins’ machines to gain independence – the ability, say, to not only recognize and classify patterns, but actively tune themselves to hunt for specific bits of information – the motor component needs to be integrated, he explains.
“What we’ve proven so far – I say built and tested and put into a product – is pure sensor. It’s like an ear listening to sounds that doesn’t have a chance to move,” he tells us.
If you can add in the motor component, “an entire world opens up,” he says.
“For example, I could have something like a web bot – an internet crawler. Today’s web crawlers are really stupid, they’re like wall-following rats. They just go up and down the length up and down the length,” he says.
“If I wanted to look and understand the web, I could have a virtual system that is basically moving through cyberspace thinking about ‘What is the structure here? How do I model this?’ And so that’s an example of a behavioral system that has no physical presence. It basically says, ‘OK, I’m looking at this data, now where do I go next to look? Oh, I’m going to follow this link and do that in an intelligent way’.”
By creating this technology, Hawkins hopes to dramatically accelerate the speed with which generally applicable artificial intelligence is developed and integrated into our world.
It’s taken a lot to get here, and the older Hawkins gets and the more rival companies spend, the bigger the stakes get. As of 2014, he is still betting his life on the fact that he is right and they are wrong. ®