The artificial intelligence (AI) world summit took place in Singapore on 3 and 4 October 2017 (The AI summit Singapore). If we follow this global trend of heavy emphasis on AI, we can note the convergence between artificial intelligence and the emergence of “smart cities” in Asia, especially in China (Imran Khan, “Asia is leading the “Smart city” charge, but we’re not there yet”, TechinAsia, January 19 2016). The development of artificial intelligence indeed combines with the current urbanization of the Chinese population.
This “intelligentization” of smart cities in China is induced by the necessity to master urban growth, while adapting urban areas to the emerging energy, water, food, health challenges, through the treatment of big data by artificial intelligence (Jean-Michel Valantin, “China: Towards the digital ecological revolution?”, The Red (Team) Analysis Society, October 22, 2017). Reciprocally, the smart urban development is a powerful driver, among others, of the development of artificial intelligence (Linda Poon, “What artificial intelligence reveals about urban change?” City Lab, July 13, 2017).
In this article, we shall thus focus upon the combination of artificial intelligence and cities that indeed creates the so-called “smart cities” in China. After having presented how this combination looks like through Chinese examples, we shall explain how this trend is implemented. Finally, we shall see how the development of artificial intelligence within the latest generations of smart cities is disrupting geopolitics through the combination of industry and intelligentization.
Artificial intelligence and smart cities
In China, the urban revolution induced by the acceleration of rural exodus is entwined with the digital and artificial intelligence revolution. This can be seen through the national program of urban development that is transforming “small” (3 million people) and middle size cities (5 million people) into smart cities. The new 95 Chinese smart cities are meant to shelter the coming250 millions people expected to relocate into towns between the end of 2017 and 2026 (Chris Weller, “Here’s China’s genius plan to move 250 millions people from farms to cities”, Business Insider, 5 August 2015). However, these 95 cities are part of the 500 smart cities that are expected to be developed before the end of 2017 (“Chinese “smart cities” to number 500 before end of 2017“, China Daily, 21-04-2017).
In order to manage the mammoth challenges of these huge cities, artificial intelligence is on the rise. Deep learning is notably the type of AI that is used to make these cities smart. Deep learning is both able to treat the massive flow of data generated by cities and made possible by the exponentially growing flows of these big data – as these very data allow the AI to learn by themselves, through the creation, among other things, of the codes needed to apprehend new kinds of data and issues (Michael Copeland, “What’s the difference between AI, machine learning and deep learning?”, NVIDIA Blog, July 29, 2016).
For example, since 2016, the Hangzhou municipal government has integrated artificial intelligence, notably with “city brain”, which helps improving traffic efficiency through the use of the big data streams generated by a myriad of captors and cameras. The “city brain” project is led the giant technology company Alibaba. This “intelligentization” of traffic management helps reduce traffic jam, improves street surveillance, as well as air pollution for the 9 millions residents of Hangzhou. However, it is only the first step before turning the city into an intelligent and sustainable smart city (Du Yifei, “Hangzhou growing “smarter” thanks to AI technology”, People’s Daily, October 20, 2017).
Through the developing internet of things (IoT), the convergence of “intelligent” infrastructures, of big data management, and of urban artificial intelligence is going to be increasingly important to improve traffic, and thus energy efficiency, air pollution and economic development (Sarah Hsu, “China is investing heavily into Artificial intelligence, and could soon catch up with US”, Forbes, July 3, 2017). The Hangzhou experiment is duplicated in Suzhou, Quzhou and Macao.
Meanwhile, Baidu Inc, the Chinese largest search engine, develops a partnership with the Shanxi province in order to implement “city brain”, which is dedicated to create smart cities in the northern province, while improving coal mining management and chemical treatment (“Baidu partners with Shanxi province to integrate AI with city management”, China Money Network, July 13). As a result, the AI is going to be used to alleviate the use of this energy, which is also responsible of the Chinese “airpocalypse” (Jean-Michel Valantin, “The Arctic, Russia and China’s energy transition”, The Red (Team) Analysis Society, February 2, 2017).
In the meantime, Tencent, another mammoth Chinese technology company, is multiplying partnerships with 14 Chinese provinces and 50 cities to develop and integrate urban artificial intelligences. In the same time, the Hong Kong government is getting ready to implement an artificial intelligence program to tackle the 21st urban challenges, chief among them urban development management and climate change impacts.
When looking closely at this development of artificial intelligence in order to support the management of Chinese cities and at the multiplication of smart cities, we notice both also coincide with the political will aimed at reducing the growth of already clogged Chinese megacities of more than ten million people – such as
Beijing (21,5 millions people),
Shanghai (25 millions), and
the urban areas around them – and of the network of very great cities where more than 5 to 10 million people live.
Indeed, the problem is that these very large cities and megalopolis have reached highly dangerous levels of water and air pollution, hence the “airpocalypse”, created by the noxious mix of car fume and coal plants exhaust.
From the intelligentization of Chinese cities to the “smart cars revolution”
This Chinese AI-centred urban development strategy also drives a gigantic urban, technological and industrial revolution, that turns China into a possible world leader
in clean energy,
in electric and smart cars and
in urban development.
The development of the new generations of smart car is thus going to be coupled with latest advances in artificial intelligence. As a result, China can position itself in the “middle” of the major trends of globalization. Indeed, smart electric cars are the “new frontier” of the car industry that supports the economy of great economic powers as such as the U.S., Japan, and Germany (Michael Klare, Blood and oil, 2005), while artificial intelligence is the new frontier of industry and the building of the future. The emergence of China as an “electric and smart cars” provider could have massive implications for the industrial and economic development of these countries.
In 2015, in the case of Shanghai, the number of cars grew by more than 13%, reaching the staggering total of 2.5 million cars in a 25 millions people strong megacity. In order to mitigate the impact of the car flow on the atmosphere, the municipal authorities use new “smart street” technologies. For example, the Ningbo-Hangzhou-Shanghai highway, daily used by more than 40 000 cars, is being equipped with a cyber network allowing drivers to pay tolls in advance with their smartphones. This application allows a significant decrease in pollution, because the lines of thousands of cars stopping in front of paybooths are reduced (“Chinese “smart cities” to number 500 before end of 2017”, China Daily, 21 April 2017).
In the meantime, the tech giant Tencent, the creator of WeChat, the enormous Chinese social network, which attracts more than 889 million users per month (“2017 WeChat Users Behavior Report”, China Channel, April 25, 2017), is developing a partnership with the Guangzhou automobile Group to develop smart cars. Baidu is doing the same with the Chinese BYD, Chery and BAIC, while launching Apollo, the open source platform on AI-powered smart cars. Alibaba, the giant of e-commerce, with more than 454 millions users during the first quarter of 2017 (“Number of active buyers across Alibaba’s online shopping properties from 2nd quarter 2012 to 1st quarter 2017 (in millions)”, Statista, The Statistical Portal, 2017) is developing a partnership with the Chinese brand SAIC motors and has already launched the Yunos System, which connects cars to the cloud and internet services. (Charles Clover and Sherry Fei Ju, “Tencent and Guangzhou team up to produce smart cars“, Financial Times, 19 september 2017).
It must be kept in mind that these three Chinese giant tech companies are thus connecting the development of their own services with artificial intelligence development, notably with smart cars development, in the context of the urban, digital and ecological transformation of China. In other terms, “city brains” and “smart cars” are going to become an immense “digital ecosystem” that artificial intelligences are going to manage, thus giving China an imposing technological edge.
This means that artificial intelligence is becoming the common support of the social and urban transformation of China, as well as the ways and means of the transformation of the Chinese urban network into smart cities. It is also a scientific, technological and industrial revolution.
This revolution is going to be based on the new international distribution of power between artificial intelligence-centred countries, and the others.
Indeed, in China, artificial intelligence is creating new social, economic and political conditions. This means that China is using artificial intelligence in order to manage its own social evolution, while becoming a mammoth artificial intelligence great power.
It now remains to be seen how the latest generations of smart cities powered by developing artificial intelligence accompanies the way some countries are getting ready for the economic, industrial and ecological, as well as security and military challenges of the 21 century, and how this urban and artificial intelligence is preparing an immense geopolitical revolution. This revolution is going to be based on the new international distribution of power between artificial intelligence-centred countries, and the others.
About the author: Jean-Michel Valantin (PhD Paris) leads the Environment and Geopolitics Department of The Red (Team) Analysis Society. He is specialised in strategic studies and defence sociology with a focus on environmental geostrategy.
As a new Nature paper points out, “There are an astonishing 10 to the
power of 170 possible board configurations in Go—more than the number of
atoms in the known universe.” (Image: DeepMind)
Remember AlphaGo, the first artificial intelligence to defeat a grandmaster at Go?
Well, the program just got a major upgrade, and it can now teach itself how to dominate the game without any human intervention. But get this: In a tournament that pitted AI against AI, this juiced-up version, called AlphaGo Zero, defeated the regular AlphaGo by a whopping 100 games to 0, signifying a major advance in the field. Hear that? It’s the technological singularity inching ever closer.A new paper published in Nature today describes how the artificially intelligent system that defeated Go grandmaster Lee Sedol in 2016 got its digital ass kicked by a new-and-improved version of itself. And it didn’t just lose by a little—it couldn’t even muster a single win after playing a hundred games. Incredibly, it took AlphaGo Zero (AGZ) just three days to train itself from scratch and acquire literally thousands of years of human Go knowledge simply by playing itself. The only input it had was what it does to the positions of the black and white pieces on the board.
In addition to devising completely new strategies,
the new system is also considerably leaner and meaner than the original AlphaGo.
Lee Sedol getting crushed by AlphaGo in 2016. (Image: AP)
Now, every once in a while the field of AI experiences a “holy shit” moment, and this would appear to be one of those moments. Looking back, other “holy shit” moments include:
Deep Blue defeating Garry Kasparov at chess in 1997,
IBM’s Watson defeating two of the world’s best Jeopardy! champions in 2011,
the aforementioned defeat of Lee Sedol in 2016, and
This latest achievement qualifies as a “holy shit” moment for a number of reasons.
First of all, the original AlphaGo had the benefit of learning from literally thousands of previously played Go games, including those played by human amateurs and professionals. AGZ, on the other hand, received no help from its human handlers, and had access to absolutely nothing aside from the rules of the game. Using “reinforcement learning,” AGZ played itself over and over again, “starting from random play, and without any supervision or use of human data,” according to the Google-owned DeepMind researchers in their study. This allowed the system to improve and refine its digital brain, known as a neural network, as it continually learned from experience. This basically means that AlphaGo Zero was its own teacher.
“This technique is more powerful than previous versions of AlphaGo because it is no longer constrained by the limits of human knowledge,” notes the DeepMind team in a release. “Instead, it is able to learn tabula rasa [from a clean slate] from the strongest player in the world: AlphaGo itself.”
When playing Go, the system considers the most probable next moves (a “policy network”), and then estimates the probability of winning based on those moves (its “value network”). AGZ requires about 0.4 seconds to make these two assessments. The original AlphaGo was equipped with a pair of neural networks to make similar evaluations, but for AGZ, the Deepmind developers merged the policy and value networks into one, allowing the system to learn more efficiently. What’s more, the new system is powered by four tensor processing units (TPUS)—specialized chips for neural network training. Old AlphaGo needed 48 TPUs.
After just three days of self-play training and a total of 4.9 million games played against itself, AGZ acquired the expertise needed to trounce AlphaGo (by comparison, the original AlphaGo had 30 million games for inspiration). After 40 days of self-training, AGZ defeated another, more sophisticated version of AlphaGo called AlphaGo “Master” that defeated the world’s best Go players and the world’s top ranked Go player, Ke Jie. Earlier this year, both the original AlphaGo and AlphaGo Master won a combined 60 games against top professionals. The rise of AGZ, it would now appear, has made these previous versions obsolete.
“The time when humans can have a meaningful conversation with an AI has always seemed far off and the stuff of science fiction. But for Go players, that day is here.”
This is a major achievement for AI, and the subfield of reinforcement learning in particular. By teaching itself, the system matched and exceeded human knowledge by an order of magnitude in just a few days, while also developing
unconventional strategies and
creative new moves.
For Go players, the breakthrough is as sobering as it is exciting; they’re learning things from AI that they could have never learned on their own, or would have needed an inordinate amount of time to figure out.
“[AlphaGo Zero’s] games against AlphaGo Master will surely contain gems, especially because its victories seem effortless,” wrote Andy Okun and Andrew Jackson, members of the American Go Association, in a Nature News and Views article. “At each stage of the game, it seems to gain a bit here and lose a bit there, but somehow it ends up slightly ahead, as if by magic… The time when humans can have a meaningful conversation with an AI has always seemed far off and the stuff of science fiction. But for Go players, that day is here.”
No doubt, AGZ represents a disruptive advance in the world of Go, but what about its potential impact on the rest of the world? According to Nick Hynes, a grad student at MIT’s Computer Science and Artificial Intelligence Laboratory (CSAIL), it’ll be a while before a specialized tool like this will have an impact on our daily lives.“So far, the algorithm described only works for problems where there are a countable number of actions you can take, so it would need modification before it could be used for continuous control problems like locomotion [for instance],” Hynes told Gizmodo. “Also, it requires that you have a really good model of the environment. In this case, it literally knows all of the rules. That would be as if you had a robot for which you could exactly predict the outcomes of actions—which is impossible for real, imperfect physical systems.”
The nice part, he says, is that there are several other lines of AI research that address both of these issues (e.g. machine learning, evolutionary algorithms, etc.), so it’s really just a matter of integration. “The real key here is the technique,” says Hynes.
“It’s like an alien civilization inventing its own mathematics which allows it to do things like time travel…Although we’re still far from ‘The Singularity,’ we’re definitely heading in that direction.”
“As expected—and desired—we’re moving farther away from the classic pattern of getting a bunch of human-labeled data and training a model to imitate it,” he said. “What we’re seeing here is a model free from human bias and presuppositions: It can learn whatever it determines is optimal, which may indeed be more nuanced that our own conceptions of the same. It’s like an alien civilization inventing its own mathematics which allows it to do things like time travel,” to which he added: “Although we’re still far from ‘The Singularity,’ we’re definitely heading in that direction.”
Noam Brown, a Carnegie Mellon University computer scientist who helped to develop the first AI to defeat top humans in no-limit poker, says the DeepMind researchers have achieved an impressive result, and that it could lead to bigger, better things in AI.
“While the original AlphaGo managed to defeat top humans, it did so partly by relying on expert human knowledge of the game and human training data,” Brown told Gizmodo. “That led to questions of whether the techniques could extend beyond Go. AlphaGo Zero achieves even better performance without using any expert human knowledge. It seems likely that the same approach could extend to all perfect-information games [such as chess and checkers]. This is a major step toward developing general-purpose AIs.”
As both Hynes and Brown admit, this latest breakthrough doesn’t mean the technological singularity—that hypothesized time in the future when greater-than-human machine intelligence achieves explosive growth—is imminent. But it should cause pause for thought. Once
we teach a system the rules of a game or
the constraints of a real-world problem,
the power of reinforcement learning makes it possible to simply press the start button and let the system do the rest. It will then figure out the best ways to succeed at the task, devising solutions and strategies that are beyond human capacities, and possibly even human comprehension.
As noted, AGZ and the game of Go represent an oversimplified, constrained, and highly predictable picture of the world, but in the future, AI will be tasked with more complex challenges. Eventually, self-teaching systems will be used to solve more pressing problems, such as protein folding to conjure up new medicines and biotechnologies, figuring out ways to reduce energy consumption, or when we need to design new materials. A highly generalized self-learning system could also be tasked with improving itself, leading to artificial general intelligence (i.e. a very human-like intelligence) and even artificial superintelligence.
As the DeepMind researchers conclude in their study, “Our results comprehensively demonstrate that a pure reinforcement learning approach is fully feasible, even in the most challenging of domains: it is possible to train to superhuman level, without human examples or guidance, given no knowledge of the domain beyond basic rules.”
And indeed, now that human players are no longer dominant in games like chess and Go, it can be said that we’ve already entered into the era of superintelligence. This latest breakthrough is the tiniest hint of what’s still to come.
Geoffrey Hinton harbors doubts about AI’s current workhorse. (Johnny Guatto / University of Toronto)
Steve LeVine Sep 15
In 1986, Geoffrey Hinton co-authored a paper that, four decades later, is central to the explosion of artificial intelligence. But Hinton says his breakthrough method should be dispensed with, and a new path to AI found.
Speaking with Axios on the sidelines of an AI conference in Toronto on Wednesday, Hinton, a professor emeritus at the University of Toronto and a Google researcher, said he is now “deeply suspicious” of back-propagation, the workhorse method that underlies most of the advances we are seeing in the AI field today, including the capacity to sort through photos and talk to Siri. “My view is throw it all away and start again,” he said.
The bottom line:Other scientists at the conference said back-propagation still has a core role in AI’s future. But Hinton said that, to push materially ahead, entirely new methods will probably have to be invented. “Max Planck said, ‘Science progresses one funeral at a time.’ The future depends on some graduate student who is deeply suspicious of everything I have said.”
How it works: In back propagation, labels or “weights” are used to represent a photo or voice within a brain-like neural layer. The weights are then adjusted and readjusted, layer by layer, until the network can perform an intelligent function with the fewest possible errors.
But Hinton suggested that, to get to where neural networks are able to become intelligent on their own, what is known as “unsupervised learning,” “I suspect that means getting rid of back-propagation.”
“I don’t think it’s how the brain works,” he said. “We clearly don’t need all the labeled data.”
En 2013, tras dos años de Proyecto, el robot Todai sacó una nota suficientemente buena para ser admitido en 472 de 581 universidades privadas. En 2016, su nota estuvo entre el 20% de las mejores en los exámenes tipo test, y en entre el 1% de los mejores en uno de los dos exámenes de matemáticas. Además, fue capaz de escribir una redacción sobre el comercio marítimo del siglo XVII mejor que la mayoría de los estudiantes. “Tomó información del libro de texto y de Wikipedia y la combinó sin entender ni pizca”, explicó Arai durante su reciente charla TED en Vancouver. “Ni Watson, ni Siri, ni Todai Robot pueden leer. La inteligencia artificial no puede entender, solo hace como que entiende”.
Más que contenta por su robot, Arai quedó alarmada por los resultados. “¿Cómo es posible que esta máquina no inteligente lo hiciera mejor que nuestros niños?”, se preguntaba. Preocupada por el futuro laboral de las nuevas generaciones, realizó un experimento con estudiantes y descubrió que un tercio de ellos fallaron preguntas sencillas porque no leen bien, un problema que, piensa, existe en todo el mundo. “Nosotros, los humanos, podemos comprender el significado de las cosas, algo que no puede hacer la inteligencia artificial. Pero la mayoría de los estudiantes reciben conocimiento sin comprender el significado, y eso no es conocimiento, es memorización, y la inteligencia artificial puede hacer lo mismo. Debemos crear un nuevo sistema educativo”.
Pregunta: ¿Por qué decidió una matemática como usted meterse en el mundo de los robots?
Respuesta: La inteligencia artificial consiste en intentar escribir el pensamiento en lenguaje matemático. No hay otra forma para que la inteligencia artificial sea inteligente. Como matemática creo que el pensamiento no puede escribirse en el lenguaje matemático. Descartes dijo lo mismo. Mi primera impresión fue que la inteligencia artificial es imposible. Utiliza probabilidad y estadística sumada a la lógica. En el siglo XX se usaba solo la lógica, y por supuesto que no todo puede ser escrito con lógica, como los sentimientos, por ejemplo. Ahora están usando estadística, imitando el pasado para ver cómo actuar cuando encontremos cosas nuevas.
P. No le gusta cuando la gente dice que la inteligencia artificial podría conquistar el mundo… R. Estoy harta de esa imagen, por eso decidí crear un robot muy inteligente, utilizando lo último en investigación para ver sus limitaciones. Watson de IBM y Google Car, por ejemplo, tienden a mostrar solo las cosas buenas. Nosotros queremos mostrarlo todo. También lo que no es capaz de hacer.
P. Al intentar mejorar la inteligencia artificial, usted vio que había que mejorar la educación.
R. Sabía que mi robot era ininteligente, cargado de conocimientos que no sabe cómo usar correctamente porque no entiende el significado. Quedé estupefacta al ver que este robot que no es inteligente escribió una redacción mejor que la mayoría de los estudiantes. Así que decidí investigar lo que estaba ocurriendo en el mundo humano. Estaría más contenta si hubiera descubierto que la inteligencia artificial adelantó a los estudiantes porque es mejor en memorizar y computar, pero ese no era el caso. El robot no comprende el significado, pero tampoco la mayoría de los estudiantes.
P. ¿Cree usted que el problema es que dependemos tanto de Siri y Google para resolver nuestras dudas que por eso no procesamos la información bien? R. Estamos analizando el porqué. Algo que podemos ver es que antes todo el mundo leía el periódico, incluso la gente pobre. Pero ahora la mayoría de las parejas jóvenes no leen el diario porque lo tienen en su teléfono. No compran libros porque la mayoría de las historias están en blogs. No tienen calendario o hasta reloj en casa porque lo tienen en el teléfono. Los niños se crían sin números ni letras en su ambiente. Y también tienden a tener conversaciones en mensajes de texto muy cortos. Tienen menos oportunidades de leer, creo.
P. Parte del proyecto Todai es ver qué tipo de trabajos la inteligencia artificial podría quitarle a los humanos. R. En Japón, antes, todo el mundo era clase media, no había gente muy rica, ni gente muy pobre. Pero cuando la inteligencia artificial llega a una sociedad se lleva muchos trabajos, incluidos los puestos de banqueros o analistas. Quienes pierden su trabajo por culpa de la inteligencia artificial puede que no encuentren otro en mucho tiempo. Quizás haya trabajos como corregir los errores cometidos por la inteligencia artificial, trabajos muy duros y más insignificantes que nunca, como en Tiempos Modernos de Chaplin. Alguien con talento, creativo, inteligente, determinado, bueno en la lectura y la escritura, tendrá más oportunidades que nunca porque incluso si nació en un pueblo, mientras tenga acceso a Internet dispondrá de mucha información para aprender gratuitamente y llegar a hacerse millonario. Es mucho más fácil comenzar un negocio que en el siglo XX. Pero alguien que no tiene ese tipo de inteligencia, probablemente se quede atrapado entre las multitudes. Lo que pasa es que todos tienen derecho a voto, y, en ese sentido, somos todos iguales. Si cada vez hay más y más gente que se siente atrapada y solo la gente inteligente gana dinero, y los utiliza para ganar más dinero, pensarán mal de la sociedad, odiarán a la sociedad, y las consecuencias las sufriremos todos, en todo el mundo.
P. ¿Cuál piensa que es la solución? R.Ahora es el momento de hacer que nuestros niños sean más inteligentes que la inteligencia artificial. He inaugurado el Instituto de Investigación de la Ciencia para la Educación este mes para investigar cuántos estudiantes tienen malos hábitos de lectura y escritura, y por qué, y ver cómo podemos ayudarles a modificar esos hábitos para que puedan adelantar al robot usando su poderío humano. Me gustaría que estuviéramos como en Japón de los años setenta, cuando todo el mundo era de clase media, todos nos ayudábamos y no necesitábamos más dinero del que somos capaces de gastar en nuestra vida. Todo el mundo debería estar bien educado, saber leer y escribir, pero no solo el significado literal. Todos deberíamos aprender con profundidad, leer con profundidad para poder mantener nuestro trabajo.
The long-standing dream of using Artificial Intelligence (AI) to build an artificial brain has taken a significant step forward, as a team led by Professor Newton Howard from the University of Oxford has successfully prototyped a nanoscale, AI-powered, artificial brain in the form factor of a high-bandwidth neural implant.
Professor Newton Howard (pictured above and below) holding parts of the implant device
This achievement caps over a decade of research by Professor Howard at MIT’s Synthetic Intelligence Lab and the University of Oxford, work that resulted in several issued US patents on the technologies and algorithms that power the device,
the Fundamental Code Unit of the Brain (FCU),
the Brain Code (BC) and the Biological Co-Processor (BCP)
are the latest advanced foundations for any eventual merger between biological intelligence and human intelligence. Ni2o (pronounced “Nitoo”) is the entity that Professor Howard licensed to further develop, market and promote these technologies.
The Biological Co-Processor is unique in that it uses advanced nanotechnology, optogenetics and deep machine learning to intelligently map internal events, such as neural spiking activity, to external physiological, linguistic and behavioral expression. The implant contains over a million carbon nanotubes, each of which is 10,000 times smaller than the width of a human hair. Carbon nanotubes provide a natural, high-bandwidth interface as they conduct heat, light and electricity instantaneously updating the neural laces. They adhere to neuronal constructs and even promote neural growth. Qualcomm team leader Rudy Beraha commented, ‘Although the prototype unit shown today is tethered to external power, a commercial Brain Co-Processor unit will be wireless and inductively powered, enabling it to be administered with a minimally-invasive procedures.‘
The device uses a combination of methods to write to the brain, including
various molecules that simulate or inhibit the activation of specific neuronal groups.
These can be targeted to stimulate a desired response, such as releasing chemicals in patients suffering from a neurological disorder or imbalance. The BCP is designed as a fully integrated system to use the brain’s own internal systems and chemistries to pattern and mimic healthy brain behavior, an approach that stands in stark contrast to the current state of the art, which is to simply apply mild electrocution to problematic regions of the brain.
The Biological Co-Processor promises to provide relief for millions of patients suffering from neurological, psychiatric and psychological disorders as well as degenerative diseases. Initial therapeutic uses will likely be for patients with traumatic brain injuries and neurodegenerative disorders, such as Alzheimer’s, as the BCP will strengthen the weak, shortening connections responsible for lost memories and skills. Once implanted, the device provides a closed-loop, self-learning platform able to both determine and administer the perfect balance of pharmaceutical, electroceutical, genomeceutical and optoceutical therapies.
Dr Richard Wirt, a Senior Fellow at Intel Corporation and Co-Founder of INTENT, the company’s partner of Ni2o bringing BCP to market, commented on the device, saying, ‘In the immediate timeframe, this device will have many benefits for researchers, as it could be used to replicate an entire brain image, synchronously mapping internal and external expressions of human response. Over the long term, the potential therapeutic benefits are unlimited.‘
The brain controls all organs and systems in the body, so the cure to nearly every disease resides there.- Professor Newton Howard
Rather than simply disrupting neural circuits, the machine learning systems within the BCP are designed to interpret these signals and intelligently read and write to the surrounding neurons. These capabilities could be used to reestablish any degenerative or trauma-induced damage and perhaps write these memories and skills to other, healthier areas of the brain.
One day, these capabilities could also be used in healthy patients to radically augment human ability and proactively improve health. As Professor Howard points out: ‘The brain controls all organs and systems in the body, so the cure to nearly every disease resides there.‘ Speaking more broadly, Professor Howard sees the merging of man with machine as our inevitable destiny, claiming it to be ‘the next step on the blueprint that the author of it all built into our natural architecture.‘
With the resurgence of neuroscience and AI enhancing machine learning, there has been renewed interest in brain implants. This past March, Elon Musk and Bryan Johnson independently announced that they are focusing and investing in for the brain/computer interface domain.
When asked about these new competitors, Professor Howard said he is happy to see all these new startups and established names getting into the field – he only wonders what took them so long, stating: ‘I would like to see us all working together, as we have already established a mathematical foundation and software framework to solve so many of the challenges they will be facing. We could all get there faster if we could work together – after all, the patient is the priority.‘
Developments and advances in artificial intelligence (AI) have been due in large part to technologies that mimic how the human brain works. In the world of information technology, such AI systems are called neural networks.
These contain algorithms that can be trained, among other things, to imitate how the brain recognises speech and images. However, running an Artificial Neural Network consumes a lot of time and energy.
It paves the way for intelligent systems that required less time and energy to learn, and it can learn autonomously.
In the human brain, synapses work as connections between neurons. The connections are reinforced and learning is improved the more these synapses are stimulated.
The memristor works in a similar fashion. It’s made up of a thin ferroelectric layer (which can be spontaneously polarised) that is enclosed between two electrodes.
Using voltage pulses, their resistance can be adjusted, like biological neurons. The synaptic connection will be strong when resistance is low, and vice-versa.
(a) Sketch of pre- and post-neurons connected by a synapse. The synaptic transmission is modulated by the causality (Δt) of neuron spikes. (b) Sketch of the ferroelectric memristor where a ferroelectric tunnel barrier of BiFeO3 (BFO) is sandwiched between a bottom electrode of (Ca,Ce)MnO3 (CCMO) and a top submicron pillar of Pt/Co. YAO stands for YAlO3. (c) Single-pulse hysteresis loop of the ferroelectric memristor displaying clear voltage thresholds ( and ). (d) Measurements of STDP in the ferroelectric memristor. Modulation of the device conductance (ΔG) as a function of the delay (Δt) between pre- and post-synaptic spikes. Seven data sets were collected on the same device showing the reproducibility of the effect. The total length of each pre- and post-synaptic spike is 600 ns. Source: Nature Communications
The memristor’s capacity for learning is based on this adjustable resistance.
AI systems have developed considerably in the past couple of years. Neural networks built with learning algorithms are now capable of performing tasks which synthetic systems previously could not do.
Deep-learning machines already have superhuman skills when it comes to tasks such as
video-game playing, and
even the ancient Chinese game of Go.
So it’s easy to think that humans are already outgunned.
But not so fast. Intelligent machines still lag behind humans in one crucial area of performance: the speed at which they learn. When it comes to mastering classic video games, for example, the best deep-learning machines take some 200 hours of play to reach the same skill levels that humans achieve in just two hours.
So computer scientists would dearly love to have some way to speed up the rate at which machines learn.
Today, Alexander Pritzel and pals at Google’s DeepMind subsidiary in London claim to have done just that. These guys have built a deep-learning machine that is capable of rapidly assimilating new experiences and then acting on them. The result is a machine that learns significantly faster than others and has the potential to match humans in the not too distant future.
First, some background.
Deep learning uses layers of neural networks to look for patterns in data. When a single layer spots a pattern it recognizes, it sends this information to the next layer, which looks for patterns in this signal, and so on.
So in face recognition,
one layer might look for edges in an image,
the next layer for circular patterns of edges (the kind that eyes and mouths make), and
the next for triangular patterns such as those made by two eyes and a mouth.
When all this happens, the final output is an indication that a face has been spotted.
Of course, the devil is in the details. There are various systems of feedback to allow the system to learn by adjusting various internal parameters such as the strength of connections between layers. These parameters must change slowly, since a big change in one layer can catastrophically affect learning in the subsequent layers. That’s why deep neural networks need so much training and why it takes so long.
Pritzel and co have tackled this problem with a technique they call Neural Episodic Control. “Neural episodic control demonstrates dramatic improvements on the speed of learning for a wide range of environments,” they say. “Critically, our agent is able to rapidly latch onto highly successful strategies as soon as they are experienced, instead of waiting for many steps of optimisation.”
The basic idea behind DeepMind’s approach is to copy the way humans and animals learn quickly. The general consensus is that humans can tackle situations in two different ways.
If the situation is familiar, our brains have already formed a model of it, which they use to work out how best to behave. This uses a part of the brain called the prefrontal cortex.
But when the situation is not familiar, our brains have to fall back on another strategy. This is thought to involve a much simpler test-and-remember approach involving the hippocampus. So we try something and remember the outcome of this episode. If it is successful, we try it again, and so on. But if it is not a successful episode, we try to avoid it in future.
This episodic approach suffices in the short term while our prefrontal brain learns. But it is soon outperformed by the prefrontal cortex and its model-based approach.
Pritzel and co have used this approach as their inspiration. Their new system has two approaches.
The first is a conventional deep-learning system that mimics the behaviur of the prefrontal cortex.
The second is more like the hippocampus. When the system tries something new, it remembers the outcome.
But crucially, it doesn’t try to learn what to remember. Instead, it remembers everything. “Our architecture does not try to learn when to write to memory, as this can be slow to learn and take a significant amount of time,” say Pritzel and co. “Instead, we elect to write all experiences to the memory, and allow it to grow very large compared to existing memory architectures.”
They then use a set of strategies to read from this large memory quickly. The result is that the system can latch onto successful strategies much more quickly than conventional deep-learning systems.
They go on to demonstrate how well all this works by training their machine to play classic Atari video games, such as Breakout, Pong, and Space Invaders. (This is a playground that DeepMind has used to train many deep-learning machines.)
The team, which includes DeepMind cofounder Demis Hassibis, shows that neural episodic control vastly outperforms other deep-learning approaches in the speed at which it learns. “Our experiments show that neural episodic control requires an order of magnitude fewer interactions with the environment,” they say.
That’s impressive work with significant potential. The researchers say that an obvious extension of this work is to test their new approach on more complex 3-D environments.
It’ll be interesting to see what environments the team chooses and the impact this will have on the real world. We’ll look forward to seeing how that works out.
Neuromorphic chips are being designed to specifically mimic the human brain – and they could soon replace CPUs
BRAIN ACTIVITY MAP
AI services like Apple’s Siri and others operate by sending your queries to faraway data centers, which send back responses. The reason they rely on cloud-based computing is that today’s electronics don’t come with enough computing power to run the processing-heavy algorithms needed for machine learning. The typical CPUs most smartphones use could never handle a system like Siri on the device. But Dr. Chris Eliasmith, a theoretical neuroscientist and co-CEO of Canadian AI startup Applied Brain Research, is confident that a new type of chip is about to change that.
“Many have suggested Moore’s law is ending and that means we won’t get ‘more compute’ cheaper using the same methods,” Eliasmith says. He’s betting on the proliferation of ‘neuromorphics’ — a type of computer chip that is not yet widely known but already being developed by several major chip makers.
Traditional CPUs process instructions based on “clocked time” – information is transmitted at regular intervals, as if managed by a metronome. By packing in digital equivalents of neurons, neuromorphics communicate in parallel (and without the rigidity of clocked time) using “spikes” – bursts of electric current that can be sent whenever needed. Just like our own brains, the chip’s neurons communicate by processing incoming flows of electricity – each neuron able to determine from the incoming spike whether to send current out to the next neuron.
What makes this a big deal is that these chips require far less power to process AI algorithms. For example, one neuromorphic chip made by IBM contains five times as many transistors as a standard Intel processor, yet consumes only 70 milliwatts of power. An Intel processor would use anywhere from 35 to 140 watts, or up to 2000 times more power.
Eliasmith points out that neuromorphics aren’t new and that their designs have been around since the 80s. Back then, however, the designs required specific algorithms be baked directly into the chip. That meant you’d need one chip for detecting motion, and a different one for detecting sound. None of the chips acted as a general processor in the way that our own cortex does.
This was partly because there hasn’t been any way for programmers to design algorithms that can do much with a general purpose chip. So even as these brain-like chips were being developed, building algorithms for them has remained a challenge.
Eliasmith and his team are keenly focused on building tools that would allow a community of programmers to deploy AI algorithms on these new cortical chips.
Central to these efforts is Nengo, a compiler that developers can use to build their own algorithms for AI applications that will operate on general purpose neuromorphic hardware. Compilers are a software tool that programmers use to write code, and that translate that code into the complex instructions that get hardware to actually do something. What makes Nengo useful is its use of the familiar Python programming language – known for it’s intuitive syntax – and its ability to put the algorithms on many different hardware platforms, including neuromorphic chips. Pretty soon, anyone with an understanding of Python could be building sophisticated neural nets made for neuromorphic hardware.
“Things like vision systems, speech systems, motion control, and adaptive robotic controllers have already been built with Nengo,” Peter Suma, a trained computer scientist and the other CEO of Applied Brain Research, tells me.
Perhaps the most impressive system built using the compiler is Spaun, a project that in 2012 earned international praise for being the most complex brain model ever simulated on a computer. Spaun demonstrated that computers could be made to interact fluidly with the environment, and perform human-like cognitive tasks like recognizing images and controlling a robot arm that writes down what it’s sees. The machine wasn’t perfect, but it was a stunning demonstration that computers could one day blur the line between human and machine cognition. Recently, by using neuromorphics, most of Spaun has been run 9000x faster, using less energy than it would on conventional CPUs – and by the end of 2017, all of Spaun will be running on Neuromorphic hardware.
Eliasmith won NSERC’s John C. Polyani award for that project — Canada’s highest recognition for a breakthrough scientific achievement – and once Suma came across the research, the pair joined forces to commercialize these tools.
“While Spaun shows us a way towards one day building fluidly intelligent reasoning systems, in the nearer term neuromorphics will enable many types of context aware AIs,” says Suma. Suma points out that while today’s AIs like Siri remain offline until explicitly called into action, we’ll soon have artificial agents that are ‘always on’ and ever-present in our lives.
“Imagine a SIRI that listens and sees all of your conversations and interactions. You’ll be able to ask it for things like – “Who did I have that conversation about doing the launch for our new product in Tokyo?” or “What was that idea for my wife’s birthday gift that Melissa suggested?,” he says.
When I raised concerns that some company might then have an uninterrupted window into even the most intimate parts of my life, I’m reminded that because the AI would be processed locally on the device, there’s no need for that information to touch a server owned by a big company. And for Eliasmith, this ‘always on’ component is a necessary step towards true machine cognition. “The most fundamental difference between most available AI systems of today and the biological intelligent systems we are used to, is the fact that the latter always operate in real-time. Bodies and brains are built to work with the physics of the world,” he says.
Already, major efforts across the IT industry are heating up to get their AI services into the hands of users. Companies like Apple, Facebook, Amazon, and even Samsung, are developing conversational assistants they hope will one day become digital helpers.
Guessing the location of a randomly chosen Street View image is hard, even for well-traveled humans. But Google’s latest artificial-intelligence machine manages it with relative ease. Here’s a tricky task. Pick a photograph from the Web at random. Now try to work out where it was taken using only the image itself. If the image shows a famous building or landmark, such as the Eiffel Tower or Niagara Falls, the task is straightforward. But the job becomes significantly harder when the image lacks specific location cues or is taken indoors or shows a pet or food or some other detail.Nevertheless, humans are surprisingly good at this task. To help, they bring to bear all kinds of knowledge about the world such as the type and language of signs on display, the types of vegetation, architectural styles, the direction of traffic, and so on. Humans spend a lifetime picking up these kinds of geolocation cues.So it’s easy to think that machines would struggle with this task. And indeed, they have.
Today, that changes thanks to the work of Tobias Weyand, a computer vision specialist at Google, and a couple of pals. These guys have trained a deep-learning machine to work out the location of almost any photo using only the pixels it contains.
Their new machine significantly outperforms humans and can even use a clever trick to determine the location of indoor images and pictures of specific things such as pets, food, and so on that have no location cues.
Their approach is straightforward, at least in the world of machine learning.
Weyand and co begin by dividing the world into a grid consisting of over 26,000 squares of varying size that depend on the number of images taken in that location.
So big cities, which are the subjects of many images, have a more fine-grained grid structure than more remote regions where photographs are less common. Indeed, the Google team ignored areas like oceans and the polar regions, where few photographs have been taken.
Next, the team created a database of geolocated images from the Web and used the location data to determine the grid square in which each image was taken. This data set is huge, consisting of 126 million images along with their accompanying Exif location data.
Weyand and co used 91 million of these images to teach a powerful neural network to work out the grid location using only the image itself. Their idea is to input an image into this neural net and get as the output a particular grid location or a set of likely candidates.
They then validated the neural network using the remaining 34 million images in the data set.
Finally they tested the network—which they call PlaNet—in a number of different ways to see how well it works.
The results make for interesting reading. To measure the accuracy of their machine, they fed it 2.3 million geotagged images from Flickr to see whether it could correctly determine their location. “PlaNet is able to localize 3.6 percent of the images at street-level accuracy and 10.1 percent at city-level accuracy,” say Weyand and co. What’s more, the machine determines the country of origin in a further 28.4 percent of the photos and the continent in 48.0 percent of them.
That’s pretty good. But to show just how good, Weyand and co put PlaNet through its paces in a test against 10 well-traveled humans. For the test, they used an online game that presents a player with a random view taken from Google Street View and asks him or her to pinpoint its location on a map of the world.
Anyone can play at www.geoguessr.com. Give it a try—it’s a lot of fun and more tricky than it sounds.
GeoGuesser Screen Capture Example
Needless to say, PlaNet trounced the humans. “In total, PlaNet won 28 of the 50 rounds with a median localization error of 1131.7 km, while the median human localization error was 2320.75 km,” say Weyand and co. “[This] small-scale experiment shows that PlaNet reaches superhuman performance at the task of geolocating Street View scenes.”
An interesting question is how PlaNet performs so well without being able to use the cues that humans rely on, such as vegetation, architectural style, and so on. But Weyand and co say they know why: “We think PlaNet has an advantage over humans because it has seen many more places than any human can ever visit and has learned subtle cues of different scenes that are even hard for a well-traveled human to distinguish.”
They go further and use the machine to locate images that do not have location cues, such as those taken indoors or of specific items. This is possible when images are part of albums that have all been taken at the same place. The machine simply looks through other images in the album to work out where they were taken and assumes the more specific image was taken in the same place.
That’s impressive work that shows deep neural nets flexing their muscles once again. Perhaps more impressive still is that the model uses a relatively small amount of memory unlike other approaches that use gigabytes of the stuff. “Our model uses only 377 MB, which even fits into the memory of a smartphone,” say Weyand and co.
That’s a tantalizing idea—the power of a superhuman neural network on a smartphone. It surely won’t be long now!
New software does in seconds what took staff 360,000 hours Bank seeking to streamline systems, avoid redundancies
At JPMorgan Chase & Co., a learning machine is parsing financial deals that once kept legal teams busy for thousands of hours.
The program, called COIN, for Contract Intelligence, does the mind-numbing job of interpreting commercial-loan agreements that, until the project went online in June, consumed 360,000 hours of work each year by lawyers and loan officers. The software reviews documents in seconds, is less error-prone and never asks for vacation.
Attendees discuss software on Feb. 27, the eve of JPMorgan’s Investor Day.
Photographer: Kholood Eid/Bloomberg
While the financial industry has long touted its technological innovations, a new era of automation is now in overdrive as cheap computing power converges with fears of losing customers to startups. Made possible by investments in machine learning and a new private cloud network, COIN is just the start for the biggest U.S. bank. The firm recently set up technology hubs for teams specializing in big data, robotics and cloud infrastructure to find new sources of revenue, while reducing expenses and risks.
The push to automate mundane tasks and create new tools for bankers and clients — a growing part of the firm’s $9.6 billion technology budget — is a core theme as the company hosts its annual investor day on Tuesday.
Behind the strategy, overseen by Chief Operating Operating Officer Matt Zames and Chief Information Officer Dana Deasy, is an undercurrent of anxiety: Though JPMorgan emerged from the financial crisis as one of few big winners, its dominance is at risk unless it aggressively pursues new technologies, according to interviews with a half-dozen bank executives.
That was the message Zames had for Deasy when he joined the firm from BP Plc in late 2013. The New York-based bank’s internal systems, an amalgam from decades of mergers, had too many redundant software programs that didn’t work together seamlessly.“Matt said, ‘Remember one thing above all else: We absolutely need to be the leaders in technology across financial services,’” Deasy said last week in an interview. “Everything we’ve done from that day forward stems from that meeting.”
After visiting companies including Apple Inc. and Facebook Inc. three years ago to understand how their developers worked, the bank set out to create its own computing cloud called Gaia that went online last year. Machine learning and big-data efforts now reside on the private platform, which effectively has limitless capacity to support their thirst for processing power. The system already is helping the bank automate some coding activities and making its 20,000 developers more productive, saving money, Zames said. When needed, the firm can also tap into outside cloud services from Amazon.com Inc., Microsoft Corp. and International Business Machines Corp.
Tech SpendingJPMorgan will make some of its cloud-backed technology available to institutional clients later this year, allowing firms like BlackRock Inc. to access balances, research and trading tools. The move, which lets clients bypass salespeople and support staff for routine information, is similar to one Goldman Sachs Group Inc. announced in 2015.JPMorgan’s total technology budget for this year amounts to 9 percent of its projected revenue — double the industry average, according to Morgan Stanley analyst Betsy Graseck. The dollar figure has inched higher as JPMorgan bolsters cyber defenses after a 2014 data breach, which exposed the information of 83 million customers.
“We have invested heavily in technology and marketing — and we are seeing strong returns,” JPMorgan said in a presentation Tuesday ahead of its investor day, noting that technology spending in its consumer bank totaled about $1 billion over the past two years.
One-third of the company’s budget is for new initiatives, a figure Zames wants to take to 40 percent in a few years. He expects savings from automation and retiring old technology will let him plow even more money into new innovations.
Not all of those bets, which include several projects based on a distributed ledger, like blockchain, will pay off, which JPMorgan says is OK. One example executives are fond of mentioning: The firm built an electronic platform to help trade credit-default swaps that sits unused.
‘Can’t Wait’“We’re willing to invest to stay ahead of the curve, even if in the final analysis some of that money will go to product or a service that wasn’t needed,” Marianne Lake, the lender’s finance chief, told a conference audience in June. That’s “because we can’t wait to know what the outcome, the endgame, really looks like, because the environment is moving so fast.”As for COIN, the program has helped JPMorgan cut down on loan-servicing mistakes, most of which stemmed from human error in interpreting 12,000 new wholesale contracts per year, according to its designers.
JPMorgan is scouring for more ways to deploy the technology, which learns by ingesting data to identify patterns and relationships. The bank plans to use it for other types of complex legal filings like credit-default swaps and custody agreements. Someday, the firm may use it to help interpret regulations and analyze corporate communications.
Another program called X-Connect, which went into use in January, examines e-mails to help employees find colleagues who have the closest relationships with potential prospects and can arrange introductions.
Creating Bots For simpler tasks, the bank has created bots to perform functions like granting access to software systems and responding to IT requests, such as resetting an employee’s password, Zames said. Bots are expected to handle 1.7 million access requests this year, doing the work of 140 people.
Photographer: Kholood Eid/Bloomberg
While growing numbers of people in the industry worry such advancements might someday take their jobs, many Wall Street personnel are more focused on benefits. A survey of more than 3,200 financial professionals by recruiting firm Options Group last year found a majority expect new technology will improve their careers, for example by improving workplace performance.
“Anything where you have back-office operations and humans kind of moving information from point A to point B that’s not automated is ripe for that,” Deasy said. “People always talk about this stuff as displacement. I talk about it as freeing people to work on higher-value things, which is why it’s such a terrific opportunity for the firm.”
To help spur internal disruption, the company keeps tabs on 2,000 technology ventures, using about 100 in pilot programs that will eventually join the firm’s growing ecosystem of partners. For instance, the bank’s machine-learning software was built with Cloudera Inc., a software firm that JPMorgan first encountered in 2009.
“We’re starting to see the real fruits of our labor,” Zames said. “This is not pie-in-the-sky stuff.”
In a new automotive application, we have used convolutional neural networks (CNNs) to map the raw pixels from a front-facing camera to the steering commands for a self-driving car. This powerful end-to-end approach means that with minimum training data from humans, the system learns to steer, with or without lane markings, on both local roads and highways. The system can also operate in areas with unclear visual guidance such as parking lots or unpaved roads.
Figure 1: NVIDIA’s self-driving car in action.
We designed the end-to-end learning system using an NVIDIA DevBox running Torch 7 for training. An NVIDIA DRIVETM PXself-driving car computer, also with Torch 7, was used to determine where to drive—while operating at 30 frames per second (FPS). The system is trained to automatically learn the internal representations of necessary processing steps, such as detecting useful road features, with only the human steering angle as the training signal. We never explicitly trained it to detect, for example, the outline of roads. In contrast to methods using explicit decomposition of the problem, such as lane marking detection, path planning, and control, our end-to-end system optimizes all processing steps simultaneously.
We believe that end-to-end learning leads to better performance and smaller systems. Better performance results because the internal components self-optimize to maximize overall system performance, instead of optimizing human-selected intermediate criteria, e. g., lane detection. Such criteria understandably are selected for ease of human interpretation which doesn’t automatically guarantee maximum system performance. Smaller networks are possible because the system learns to solve the problem with the minimal number of processing steps.
Convolutional Neural Networks to Process Visual Data
CNNs have revolutionized the computational pattern recognition process. Prior to the widespread adoption of CNNs, most pattern recognition tasks were performed using an initial stage of hand-crafted feature extraction followed by a classifier. The important breakthrough of CNNs is that features are now learned automatically from training examples. The CNN approach is especially powerful when applied to image recognition tasks because the convolution operation captures the 2D nature of images. By using the convolution kernels to scan an entire image, relatively few parameters need to be learned compared to the total number of operations.
While CNNs with learned features have been used commercially for over twenty years , their adoption has exploded in recent years because of two important developments.
First, large, labeled data sets such as the ImageNet Large Scale Visual Recognition Challenge (ILSVRC) are now widely available for training and validation.
Second, CNN learning algorithms are now implemented on massively parallel graphics processing units (GPUs), tremendously accelerating learning and inference ability.
The CNNs that we describe here go beyond basic pattern recognition. We developed a system that learns the entire processing pipeline needed to steer an automobile. The groundwork for this project was actually done over 10 years ago in a Defense Advanced Research Projects Agency (DARPA) seedling project known as DARPA Autonomous Vehicle (DAVE), in which a sub-scale radio control (RC) car drove through a junk-filled alley way. DAVE was trained on hours of human driving in similar, but not identical, environments. The training data included video from two cameras and the steering commands sent by a human operator.
In many ways, DAVE was inspired by the pioneering work of Pomerleau, who in 1989 built the Autonomous Land Vehicle in a Neural Network (ALVINN)system. ALVINN is a precursor to DAVE, and it provided the initial proof of concept that an end-to-end trained neural network might one day be capable of steering a car on public roads. DAVE demonstrated the potential of end-to-end learning, and indeed was used to justify starting the DARPA Learning Applied to Ground Robots (LAGR) program, but DAVE’s performance was not sufficiently reliable to provide a full alternative to the more modular approaches to off-road driving. (DAVE’s mean distance between crashes was about 20 meters in complex environments.)
About a year ago we started a new effort to improve on the original DAVE, and create a robust system for driving on public roads. The primary motivation for this work is to avoid the need to recognize specific human-designated features, such as lane markings, guard rails, or other cars, and to avoid having to create a collection of “if, then, else” rules, based on observation of these features. We are excited to share the preliminary results of this new effort, which is aptly named: DAVE–2.
The DAVE-2 System
Figure 2: High-level view of the data collection system.
Figure 2 shows a simplified block diagram of the collection system for training data of DAVE-2. Three cameras are mounted behind the windshield of the data-acquisition car, and timestamped video from the cameras is captured simultaneously with the steering angle applied by the human driver. The steering command is obtained by tapping into the vehicle’s Controller Area Network (CAN) bus. In order to make our system independent of the car geometry, we represent the steering command as 1/r, where r is the turning radius in meters. We use 1/r instead of r to prevent a singularity when driving straight (the turning radius for driving straight is infinity). 1/r smoothly transitions through zero from left turns (negative values) to right turns (positive values).
Training data contains single images sampled from the video, paired with the corresponding steering command (1/r). Training with data from only the human driver is not sufficient; the network must also learn how to recover from any mistakes, or the car will slowly drift off the road. The training data is therefore augmented with additional images that show the car in different shifts from the center of the lane and rotations from the direction of the road.
The images for two specific off-center shifts can be obtained from the left and the right cameras. Additional shifts between the cameras and all rotations are simulated through viewpoint transformation of the image from the nearest camera. Precise viewpoint transformation requires 3D scene knowledge which we don’t have, so we approximate the transformation by assuming all points below the horizon are on flat ground, and all points above the horizon are infinitely far away. This works fine for flat terrain, but for a more complete rendering it introduces distortions for objects that stick above the ground, such as cars, poles, trees, and buildings. Fortunately these distortions don’t pose a significant problem for network training. The steering label for the transformed images is quickly adjusted to one that correctly steers the vehicle back to the desired location and orientation in two seconds.
Figure 3: Training the neural network.
Figure 3 shows a block diagram of our training system. Images are fed into a CNN that then computes a proposed steering command. The proposed command is compared to the desired command for that image, and the weights of the CNN are adjusted to bring the CNN output closer to the desired output. The weight adjustment is accomplished using back propagation as implemented in the Torch 7 machine learning package.
Once trained, the network is able to generate steering commands from the video images of a single center camera. Figure 4 shows this configuration.
Figure 4: The trained network is used to generate steering commands from a single front-facing center camera.
Training data was collected by driving on a wide variety of roads and in a diverse set of lighting and weather conditions. We gathered surface street data in central New Jersey and highway data from Illinois, Michigan, Pennsylvania, and New York. Other road types include two-lane roads (with and without lane markings), residential roads with parked cars, tunnels, and unpaved roads. Data was collected in clear, cloudy, foggy, snowy, and rainy weather, both day and night. In some instances, the sun was low in the sky, resulting in glare reflecting from the road surface and scattering from the windshield.
The data was acquired using either our drive-by-wire test vehicle, which is a 2016 Lincoln MKZ, or using a 2013 Ford Focus with cameras placed in similar positions to those in the Lincoln. Our system has no dependencies on any particular vehicle make or model. Drivers were encouraged to maintain full attentiveness, but otherwise drive as they usually do. As of March 28, 2016, about 72 hours of driving data was collected.
Figure 5: CNN architecture. The network has about 27 million connections and 250 thousand parameters.
We train the weights of our network to minimize the mean-squared error between the steering command output by the network, and either the command of the human driver or the adjusted steering command for off-center and rotated images (see “Augmentation”, later). Figure 5 shows the network architecture, which consists of 9 layers, including a normalization layer, 5 convolutional layers, and 3 fully connected layers. The input image is split into YUV planes and passed to the network.
The first layer of the network performs image normalization. The normalizer is hard-coded and is not adjusted in the learning process. Performing normalization in the network allows the normalization scheme to be altered with the network architecture, and to be accelerated via GPU processing.
The convolutional layers are designed to perform feature extraction, and are chosen empirically through a series of experiments that vary layer configurations. We then use strided convolutions in the first three convolutional layers with a 2×2 stride and a 5×5 kernel, and a non-strided convolution with a 3×3 kernel size in the final two convolutional layers.
We follow the five convolutional layers with three fully connected layers, leading to a final output control value which is the inverse-turning-radius. The fully connected layers are designed to function as a controller for steering, but we noted that by training the system end-to-end, it is not possible to make a clean break between which parts of the network function primarily as feature extractor, and which serve as controller.
The first step to training a neural network is selecting the frames to use. Our collected data is labeled with road type, weather condition, and the driver’s activity (staying in a lane, switching lanes, turning, and so forth). To train a CNN to do lane following, we simply select data where the driver is staying in a lane, and discard the rest. We then sample that video at 10 FPS because a higher sampling rate would include images that are highly similar, and thus not provide much additional useful information. To remove a bias towards driving straight the training data includes a higher proportion of frames that represent road curves.
After selecting the final set of frames, we augment the data by adding artificial shifts and rotations to teach the network how to recover from a poor position or orientation. The magnitude of these perturbations is chosen randomly from a normal distribution. The distribution has zero mean, and the standard deviation is twice the standard deviation that we measured with human drivers. Artificially augmenting the data does add undesirable artifacts as the magnitude increases (as mentioned previously).
Before road-testing a trained CNN, we first evaluate the network’s performance in simulation. Figure 6 shows a simplified block diagram of the simulation system, and Figure 7 shows a screenshot of the simulator in interactive mode.
Figure 6: Block-diagram of the drive simulator.
The simulator takes prerecorded videos from a forward-facing on-board camera connected to a human-driven data-collection vehicle, and generates images that approximate what would appear if the CNN were instead steering the vehicle. These test videos are time-synchronized with the recorded steering commands generated by the human driver.
Since human drivers don’t drive in the center of the lane all the time, we must manually calibrate the lane’s center as it is associated with each frame in the video used by the simulator. We call this position the “ground truth”.
The simulator transforms the original images to account for departures from the ground truth. Note that this transformation also includes any discrepancy between the human driven path and the ground truth. The transformation is accomplished by the same methods as described previously.
The simulator accesses the recorded test video along with the synchronized steering commands that occurred when the video was captured. The simulator sends the first frame of the chosen test video, adjusted for any departures from the ground truth, to the input of the trained CNN, which then returns a steering command for that frame. The CNN steering commands as well as the recorded human-driver commands are fed into the dynamic model  of the vehicle to update the position and orientation of the simulated vehicle.
Figure 7: Screenshot of the simulator in interactive mode. See text for explanation of the performance metrics. The green area on the left is unknown because of the viewpoint transformation. The highlighted wide rectangle below the horizon is the area which is sent to the CNN.
The simulator then modifies the next frame in the test video so that the image appears as if the vehicle were at the position that resulted by following steering commands from the CNN. This new image is then fed to the CNN and the process repeats.
The simulator records the off-center distance (distance from the car to the lane center), the yaw, and the distance traveled by the virtual car. When the off-center distance exceeds one meter, a virtual human intervention is triggered, and the virtual vehicle position and orientation is reset to match the ground truth of the corresponding frame of the original test video.
We evaluate our networks in two steps: first in simulation, and then in on-road tests.
In simulation we have the networks provide steering commands in our simulator to an ensemble of prerecorded test routes that correspond to about a total of three hours and 100 miles of driving in Monmouth County, NJ. The test data was taken in diverse lighting and weather conditions and includes highways, local roads, and residential streets.
We estimate what percentage of the time the network could drive the car (autonomy) by counting the simulated human interventions that occur when the simulated vehicle departs from the center line by more than one meter. We assume that in real life an actual intervention would require a total of six seconds: this is the time required for a human to retake control of the vehicle, re-center it, and then restart the self-steering mode. We calculate the percentage autonomy by counting the number of interventions, multiplying by 6 seconds, dividing by the elapsed time of the simulated test, and then subtracting the result from 1:
Thus, if we had 10 interventions in 600 seconds, we would have an autonomy value of
After a trained network has demonstrated good performance in the simulator, the network is loaded on the DRIVE PX in our test car and taken out for a road test. For these tests we measure performance as the fraction of time during which the car performs autonomous steering. This time excludes lane changes and turns from one road to another. For a typical drive in Monmouth County NJ from our office in Holmdel to Atlantic Highlands, we are autonomous approximately 98% of the time. We also drove 10 miles on the Garden State Parkway (a multi-lane divided highway with on and off ramps) with zero intercepts.
Here is a video of our test car driving in diverse conditions.
Visualization of Internal CNN State
Figure 8: How the CNN “sees” an unpaved road. Top: subset of the camera image sent to the CNN. Bottom left: Activation of the first layer feature maps. Bottom right: Activation of the second layer feature maps. This demonstrates that the CNN learned to detect useful road features on its own, i. e., with only the human steering angle as training signal. We never explicitly trained it to detect the outlines of roads.
Figures 8 and 9 show the activations of the first two feature map layers for two different example inputs, an unpaved road and a forest. In case of the unpaved road, the feature map activations clearly show the outline of the road while in case of the forest the feature maps contain mostly noise, i. e., the CNN finds no useful information in this image.
This demonstrates that the CNN learned to detect useful road features on its own, i. e., with only the human steering angle as training signal. We never explicitly trained it to detect the outlines of roads, for example.
Figure 9: Example image with no road. The activations of the first two feature maps appear to contain mostly noise, i. e., the CNN doesn’t recognize any useful features in this image.
We have empirically demonstrated that CNNs are able to learn the entire task of lane and road following without manual decomposition into road o
Y. LeCun, B. Boser, J. S. Denker, D. Henderson, R. E. Howard, W. Hubbard, and L. D. Jackel. Backpropagation applied to handwritten zip code recognition. Neural Computation, 1(4):541–551, Winter 1989.
Alex Krizhevsky, Ilya Sutskever, and Geoffrey E. Hinton. Imagenet classification with deep convolutional neural networks.
Danwei Wang and Feng Qi. Trajectory planning for a four-wheel-steering vehicle. In Proceedings of the 2001 IEEE International Conference on Robotics & Automation, May 21–26 2001. URL: http://www.ntu.edu.sg/home/edwwang/confpapers/wdwicar01.pdf.
rlane marking detection, semantic abstraction, path planning, and control. A small amount of training data from less than a hundred hours of driving was sufficient to train the car to operate in diverse conditions, on highways, local and residential roads in sunny, cloudy, and rainy conditions.
The CNN is able to learn meaningful road features from a very sparse training signal (steering alone).
The system learns for example to detect the outline of a road without the need of explicit labels during training.
More work is needed to improve the robustness of the network, to find methods to verify the robustness, and to improve visualization of the network-internal processing steps.
Y. LeCun, B. Boser, J. S. Denker, D. Henderson, R. E. Howard, W. Hubbard, and L. D. Jackel. Backprop- agation applied to handwritten zip code recognition. Neural Computation, 1(4):541–551, Winter 1989. URL: http://yann.lecun.org/exdb/publis/pdf/lecun-89e.pdf.
Alex Krizhevsky, Ilya Sutskever, and Geoffrey E. Hinton. Imagenet classification with deep convolutional neural networks. In F. Pereira, C. J. C. Burges, L. Bottou, and K. Q. Weinberger, editors, Advances in Neural Information Processing Systems 25, pages 1097–1105. Curran Associates, Inc., 2012. URL: http://papers.nips.cc/paper/ 4824-imagenet-classification-with-deep-convolutional-neural-networks. pdf.
L. D. Jackel, D. Sharman, Stenard C. E., Strom B. I., , and D Zuckert. Optical character recognition for self-service banking. AT&T Technical Journal, 74(1):16–24, 1995.
Large scale visual recognition challenge (ILSVRC). URL: http://www.image-net.org/ challenges/LSVRC/.
Net-Scale Technologies, Inc. Autonomous off-road vehicle control using end-to-end learning, July 2004. Final technical report. URL: http://net-scale.com/doc/net-scale-dave-report.pdf.
Dean A. Pomerleau. ALVINN, an autonomous land vehicle in a neural network. Technical report, Carnegie Mellon University, 1989. URL: http://repository.cmu.edu/cgi/viewcontent. cgi?article=2874&context=compsci.
Danwei Wang and Feng Qi. Trajectory planning for a four-wheel-steering vehicle. In Proceedings of the 2001 IEEE International Conference on Robotics & Automation, May 21–26 2001. URL: http: //www.ntu.edu.sg/home/edwwang/confpapers/wdwicar01.pdf.
Gamalon has developed a technique that lets machines learn to recognize concepts in images or text much more efficiently.
An app developed by Gamalon recognizes objects after seeing a few examples. A learning program recognizes simpler concepts such as lines and rectangles.
Machine learning is becoming extremely powerful, but it requires extreme amounts of data.
You can, for instance, train a deep-learning algorithm to recognize a cat with a cat-fancier’s level of expertise, but you’ll need to feed it tens or even hundreds of thousands of images of felines, capturing a huge amount of variation in size, shape, texture, lighting, and orientation. It would be lot more efficient if, a bit like a person, an algorithm could develop an idea about what makes a cat a cat from fewer examples.
A Boston-based startup called Gamalon has developed technology that lets computers do this in some situations, and it is releasing two products Tuesday based on the approach.
If the underlying technique can be applied to many other tasks, then it could have a big impact. The ability to learn from less data could let robots explore and understand new environments very quickly, or allow computers to learn about your preferences without sharing your data.
Gamalon uses a technique that it calls Bayesian program synthesis to build algorithms capable of learning from fewer examples. Bayesian probability, named after the 18th century mathematician Thomas Bayes, provides a mathematical framework for refining predictions about the world based on experience. Gamalon’s system uses probabilistic programming—or code that deals in probabilities rather than specific variables—to build a predictive model that explains a particular data set. From just a few examples, a probabilistic program can determine, for instance, that it’s highly probable that cats have ears, whiskers, and tails. As further examples are provided, the code behind the model is rewritten, and the probabilities tweaked. This provides an efficient way to learn the salient knowledge from the data.
Probabilistic programming techniques have been around for a while. In 2015, for example, a team from MIT and NYU used probabilistic methods to have computers learn to recognize written characters and objects after seeing just one example (see “This AI Algorithm Learns Simple Tasks as Fast as We Do”). But the approach has mostly been an academic curiosity.
There are difficult computational challenges to overcome, because the program has to consider many different possible explanations, says Brenden Lake, a research fellow at NYU who led the 2015 work.
Still, in theory, Lake says, the approach has significant potential because it can automate aspects of developing a machine-learning model. “Probabilistic programming will make machine learning much easier for researchers and practitioners,” Lake says. “It has the potential to take care of the difficult [programming] parts automatically.”
There are certainly significant incentives to develop easier-to-use and less data-hungry machine-learning approaches. Machine learning currently involves acquiring a large raw data set, and often then labeling it manually. The learning is then done inside large data centers, using many computer processors churning away in parallel for hours or days. “There are only a few really large companies that can really afford to do this,” says Ben Vigoda, cofounder and CEO of Gamalon.
When Machines Have Ideas | Ben Vigoda | TEDxBoston
Our CEO, Ben Vigoda, gave a talk at TEDx Boston 2016 called “When Machines Have Ideas” that describes why building “stories” (i.e. Bayesian generative models) into machine intelligence systems can be very powerful.
In theory, Gamalon’s approach could make it a lot easier for someone to build and refine a machine-learning model, too. Perfecting a deep-learning algorithm requires a great deal of mathematical and machine-learning expertise. “There’s a black art to setting these systems up,” Vigoda says. With Gamalon’s approach, a programmer could train a model by feeding in significant examples.
Vigoda showed MIT Technology Review a demo with a drawing app that uses the technique. It is similar to the one released last year by Google, which uses deep learning to recognize the object a person is trying to sketch (see “Want to Understand AI? Try Sketching a Duck for a Neural Network”). But whereas Google’s app needs to see a sketch that matches the ones it has seen previously, Gamalon’s version uses a probabilistic program to recognize the key features of an object. For instance, one program understands that a triangle sitting atop a square is most likely a house. This means even if your sketch is very different from what it has seen before, providing it has those features, it will guess correctly.
The technique could have significant near-term commercial applications, too. The company’s first products use Bayesian program synthesis to recognize concepts in text.
One product, called Gamalon Structure, can extract concepts from raw text more efficiently than is normally possible. For example, it can take a manufacturer’s description of a television and determine what product is being described, the brand, the product name, the resolution, the size, and other features.
Another product, Gamalon Match, is used to categorize the products and price in a store’s inventory. In each case, even when different acronyms or abbreviations are used for a product or feature, the system can quickly be trained to recognize them.
Vigoda believes the ability to learn will have other practical benefits.
A computer could learn about a user’s interests without requiring an impractical amount of data or hours of training.
Personal data might not need to be shared with large companies, either, if machine learning can be done efficiently on a user’s smartphone or laptop.
And a robot or a self-driving car could learn about a new obstacle without needing to see hundreds of thousands of examples.