AI Software Juggles Probabilities to Learn from Less Data

By Hugo Angel,

Gamalon has developed a technique that lets machines learn to recognize concepts in images or text much more efficiently.
An app developed by Gamalon recognizes objects after seeing a few examples. A learning program recognizes simpler concepts such as lines and rectangles.
Machine learning is becoming extremely powerful, but it requires extreme amounts of data.
You can, for instance, train a deep-learning algorithm to recognize a cat with a cat-fancier’s level of expertise, but you’ll need to feed it tens or even hundreds of thousands of images of felines, capturing a huge amount of variation in size, shape, texture, lighting, and orientation. It would be a lot more efficient if, a bit like a person, an algorithm could develop an idea about what makes a cat a cat from fewer examples.
A Boston-based startup called Gamalon has developed technology that lets computers do this in some situations, and it is releasing two products Tuesday based on the approach.
If the underlying technique can be applied to many other tasks, then it could have a big impact. The ability to learn from less data could let robots explore and understand new environments very quickly, or allow computers to learn about your preferences without sharing your data.
Gamalon uses a technique that it calls Bayesian program synthesis to build algorithms capable of learning from fewer examples. Bayesian probability, named after the 18th century mathematician Thomas Bayes, provides a mathematical framework for refining predictions about the world based on experience. Gamalon’s system uses probabilistic programming—or code that deals in probabilities rather than specific variables—to build a predictive model that explains a particular data set. From just a few examples, a probabilistic program can determine, for instance, that it’s highly probable that cats have ears, whiskers, and tails. As further examples are provided, the code behind the model is rewritten, and the probabilities tweaked. This provides an efficient way to learn the salient knowledge from the data.
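The update loop described above can be illustrated with a minimal Beta-Bernoulli sketch: a prior belief about a yes/no feature (say, "cats have whiskers") is sharpened as each labeled example arrives. This is a generic textbook construction for illustration, not Gamalon's actual system.

```python
def beta_bernoulli_update(prior_a, prior_b, observations):
    """Update a Beta(prior_a, prior_b) belief about a binary feature
    (e.g. "cats have whiskers") from a list of 0/1 observations."""
    successes = sum(observations)
    failures = len(observations) - successes
    a, b = prior_a + successes, prior_b + failures
    # Posterior mean: the probability the feature holds for the next example.
    return a / (a + b)

# Start from an uninformative Beta(1, 1) prior. After only three cats,
# all with whiskers, the belief already leans strongly toward
# "cats have whiskers" -- no thousands of images required.
p = beta_bernoulli_update(1, 1, [1, 1, 1])
print(round(p, 2))  # 0.8
```

This is what "the probabilities tweaked" amounts to in the simplest case: each new example moves the posterior, and a handful of consistent examples is enough to make a feature highly probable.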
Probabilistic programming techniques have been around for a while. In 2015, for example, a team from MIT and NYU used probabilistic methods to have computers learn to recognize written characters and objects after seeing just one example (see “This AI Algorithm Learns Simple Tasks as Fast as We Do”). But the approach has mostly been an academic curiosity.
There are difficult computational challenges to overcome, because the program has to consider many different possible explanations, says Brenden Lake, a research fellow at NYU who led the 2015 work.
Still, in theory, Lake says, the approach has significant potential because it can automate aspects of developing a machine-learning model. “Probabilistic programming will make machine learning much easier for researchers and practitioners,” Lake says. “It has the potential to take care of the difficult [programming] parts automatically.”
There are certainly significant incentives to develop easier-to-use and less data-hungry machine-learning approaches. Machine learning currently involves acquiring a large raw data set, and often then labeling it manually. The learning is then done inside large data centers, using many computer processors churning away in parallel for hours or days. “There are only a few really large companies that can really afford to do this,” says Ben Vigoda, cofounder and CEO of Gamalon.
When Machines Have Ideas | Ben Vigoda | TEDxBoston
Our CEO, Ben Vigoda, gave a talk at TEDx Boston 2016 called “When Machines Have Ideas” that describes why building “stories” (i.e. Bayesian generative models) into machine intelligence systems can be very powerful.
In theory, Gamalon’s approach could make it a lot easier for someone to build and refine a machine-learning model, too. Perfecting a deep-learning algorithm requires a great deal of mathematical and machine-learning expertise. “There’s a black art to setting these systems up,” Vigoda says. With Gamalon’s approach, a programmer could train a model by feeding in significant examples.
Vigoda showed MIT Technology Review a demo with a drawing app that uses the technique. It is similar to the one released last year by Google, which uses deep learning to recognize the object a person is trying to sketch (see “Want to Understand AI? Try Sketching a Duck for a Neural Network”). But whereas Google’s app needs to see a sketch that matches the ones it has seen previously, Gamalon’s version uses a probabilistic program to recognize the key features of an object. For instance, one program understands that a triangle sitting atop a square is most likely a house. This means even if your sketch is very different from what it has seen before, provided it has those features, it will guess correctly.
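The feature-based recognition idea can be sketched as a toy naive-Bayes-style scorer. The concept tables, feature names, and probabilities below are illustrative assumptions, not Gamalon's model:

```python
# Hypothetical concept models: each concept lists feature predicates with
# the probability that the feature appears in a sketch of that concept.
CONCEPTS = {
    "house": {"triangle": 0.9, "square": 0.9, "triangle_above_square": 0.95},
    "flag":  {"triangle": 0.6, "line": 0.9},
}

def score(concept_features, sketch_features):
    """Multiply P(feature | concept) for features present in the sketch,
    and (1 - P) for expected-but-missing ones (naive-Bayes style)."""
    s = 1.0
    for feature, p in concept_features.items():
        s *= p if feature in sketch_features else (1.0 - p)
    return s

def classify(sketch_features):
    return max(CONCEPTS, key=lambda c: score(CONCEPTS[c], sketch_features))

# A crude sketch: the exact strokes differ from anything stored, but the
# key features match, so the guess is still "house".
print(classify({"triangle", "square", "triangle_above_square"}))  # house
```

The point is that classification depends on abstract features and their relations, not on pixel-level similarity to previously seen sketches.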
The technique could have significant near-term commercial applications, too. The company’s first products use Bayesian program synthesis to recognize concepts in text.
  • One product, called Gamalon Structure, can extract concepts from raw text more efficiently than is normally possible. For example, it can take a manufacturer’s description of a television and determine what product is being described, the brand, the product name, the resolution, the size, and other features.
  • Another product, Gamalon Match, is used to categorize the products and prices in a store’s inventory. In each case, even when different acronyms or abbreviations are used for a product or feature, the system can quickly be trained to recognize them.
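As a rough illustration of what structuring raw product text involves, here is a toy attribute extractor. The regular expressions and field names are assumptions for the sake of example, not Gamalon Structure's actual method (which learns its model rather than using hand-written patterns):

```python
import re

def extract(description):
    """Pull a few structured attributes out of a raw product description."""
    attrs = {}
    # Screen size, e.g. 55" or 55 in.
    m = re.search(r'(\d+)\s*(?:"|in\.|inch)', description)
    if m:
        attrs["size_inches"] = int(m.group(1))
    # Resolution keywords, tolerating case differences.
    m = re.search(r'\b(4K|1080p|720p|UHD)\b', description, re.I)
    if m:
        attrs["resolution"] = m.group(1).upper()
    # Assume the leading token is the brand.
    m = re.match(r'(\w+)', description)
    if m:
        attrs["brand"] = m.group(1)
    return attrs

print(extract('Acme 55" 4K Smart TV'))
# {'size_inches': 55, 'resolution': '4K', 'brand': 'Acme'}
```

Hand-written rules like these break as soon as vendors use new abbreviations; the appeal of the probabilistic approach is that the model can be retrained to absorb such variants from a few examples.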
Vigoda believes the ability to learn will have other practical benefits.

  • A computer could learn about a user’s interests without requiring an impractical amount of data or hours of training.
  • Personal data might not need to be shared with large companies, either, if machine learning can be done efficiently on a user’s smartphone or laptop.
  • And a robot or a self-driving car could learn about a new obstacle without needing to see hundreds of thousands of examples.
February 14, 2017

An international team of scientists has come up with a blueprint for a large-scale quantum computer

By Hugo Angel,

‘It is the Holy Grail of science … we will be able to do certain things we could never even dream of before’
Courtesy Professor Winfried Hensinger
Quantum computing breakthrough could help ‘change life completely’, say scientists
Scientists claim to have produced the first-ever blueprint for a large-scale quantum computer in a development that could bring about a technological revolution on a par with the invention of computing itself.
Until now quantum computers have had just a fraction of the processing power they are theoretically capable of producing.
But an international team of researchers believe they have finally overcome the main technical problems that have prevented the construction of more powerful machines.
They are currently building a prototype and a full-scale quantum computer – many millions of times faster than the best currently available – could be built in about a decade.
Such devices work by utilising the almost magical properties found in the world of the very small, where an atom can apparently exist in two different places at the same time.
Professor Winfried Hensinger, head of the Ion Quantum Technology Group at Sussex University, who has been leading this research, told The Independent: “It is the Holy Grail of science, really, to build a quantum computer.
“And we are now publishing the actual nuts-and-bolts construction plan for a large-scale quantum computer.”
It is thought the astonishing processing power unleashed by quantum mechanics will lead to new, life-saving medicines, help solve the most intractable scientific problems, and probe the mysteries of the universe.
“Life will change completely. We will be able to do certain things we could never even dream of before,” Professor Hensinger said.
“You can imagine that suddenly the sky is the limit.
“This is really, really exciting … it’s probably one of the most exciting times to be in this field.”
He said small quantum computers had been built in the past, but only to test theories.
“This is not an academic study any more, it really is all the engineering required to build such a device,” he said.
“Nobody has really gone ahead and drafted a full engineering plan of how you build one.
“Many people questioned whether it could even be built, because it is so hard to make it happen.
“We show that not only can it be built, but we provide a whole detailed plan on how to make it happen.”
The problem is that existing quantum computers require lasers focused precisely on individual atoms. The larger the computer, the more lasers are required and the greater the chance of something going wrong.
But Professor Hensinger and colleagues used a different technique to monitor the atoms involving a microwave field and electricity in an ‘ion-trap’ device.

“What we have is a solution that we can scale to arbitrary [computing] power,” he said.

Fig. 2. Gradient wires placed underneath each gate zone and embedded silicon photodetector.
(A) Illustration showing an isometric view of the two main gradient wires placed underneath each gate zone. Short wires are placed locally underneath each gate zone to form coils, which compensate for slowly varying magnetic fields and allow for individual addressing. The wire configuration in each zone can be seen in more detail in the inset.
(B) Silicon photodetector (marked green) embedded in the silicon substrate, transparent center segmented electrodes, and the possible detection angle are shown. VIA structures are used to prevent optical cross-talk from neighboring readout zones.
Source: Science Journals — AAAS. Blueprint for a microwave trapped ion quantum computer. Lekitsch et al. Sci. Adv. 2017;3: e1601540 1 February 2017
Fig. 4. Scalable module illustration. One module consisting of 36 × 36 junctions placed on the supporting steel frame structure: Nine wafers containing the required DACs and control electronics are placed between the wafer holding 36 × 36 junctions and the microchannel cooler (red layer) providing the cooling. X-Y-Z piezo actuators are placed in the four corners on top of the steel frame, allowing for accurate alignment of the module. Flexible electric wires supply voltages, currents, and control signals to the DACs and control electronics, such as field-programmable gate arrays (FPGAs). Coolant is supplied to the microchannel cooler layer via two flexible steel tubes placed in the center of the modules.
Fig. 5. Illustration of vacuum chambers. Schematic of octagonal UHV chambers connected together; each chamber is 4.5 × 4.5 m2 large and can hold >2.2 million individual X-junctions placed on steel frames.

“We are already building it now. Within two years we think we will have completed a prototype which incorporates all the technology we state in this blueprint.

“At the same time we are now looking for industry partners so we can really build a large-scale device that fills a building, basically.
“It’s extraordinarily expensive so we need industry partners … this will be in the tens of millions, up to £100m.”
Commenting on the research, described in a paper in the journal Science Advances, other academics praised the quality of the work but expressed caution about how quickly it could be developed.
Dr Toby Cubitt, a Royal Society research fellow in quantum information theory at University College London, said: “Many different technologies are competing to build the first large-scale quantum computer. Ion traps were one of the earliest realistic proposals.
“This work is an important step towards scaling up ion-trap quantum computing.
“Though there’s still a long way to go before you’ll be making spreadsheets on your quantum computer.”
And Professor Alan Woodward, of Surrey University, hailed the “tremendous step in the right direction”.
“It is great work,” he said. “They have made some significant strides forward.”

But he added it was “too soon to say” whether it would lead to the hoped-for technological revolution.

ORIGINAL: The Independent
Ian Johnston Science Correspondent
Thursday 2 February 2017

Bringing Deep Learning AI to the Devices at the Edge of the Network

By Hugo Angel,


Photo  – The Team


Today we announced our funding of a new company. We are excited to be working with Ali Farhadi, Mohammad Rastegari, and their team on this new company. We are also looking forward to working with Paul Allen’s team at the Allen Institute for AI, and in particular our good friend and CEO of AI2, Dr. Oren Etzioni, who is joining the board. Machine learning and AI have been a key investment theme for us for the past several years, and bringing deep learning capabilities such as image and speech recognition to small devices is a huge challenge.

Mohammad and Ali and their team have developed a platform that enables low-resource devices to perform tasks that usually require large farms of GPUs in cloud environments. This, we believe, has the opportunity to change how we think about certain types of deep learning use cases as they get extended from the core to the edge. Image and voice recognition are great examples. These use cases are broadly deployed in the world, usually on a mobile device, but right now they require the device to be connected to the internet so that those large farms of GPUs can process all the information your device is capturing and sending, and so the core can transmit back the answer. If you could do that on your phone (while preserving battery life), it would open up a new world of options.

It is just these kinds of inventions that put the greater Seattle area at the center of the revolution in machine learning and AI that is upon us. The company came out of the outstanding work the team was doing at the Allen Institute for Artificial Intelligence (AI2), and Ali is a professor at the University of Washington. Between Microsoft, Amazon, the University of Washington, and research institutes such as AI2, our region is leading the way as new types of intelligent applications take shape. Madrona is energized to play our role as company builder and support for these amazing inventors and founders.

By Matt McIlwain

AI acceleration startup collects $2.6M in funding

I was excited by the promise of the startup and its technique that drastically reduces the computing power necessary to perform complex operations like computer vision. Seems I wasn’t the only one: the company, just officially spun off from the Allen Institute for AI (AI2), has attracted $2.6 million in seed funding from its parent company and Madrona Venture Group.

The specifics of the product and process you can learn about in detail in my previous post, but the gist is this: machine learning models for things like object and speech recognition are notoriously computation-heavy, making them difficult to implement on smaller, less powerful devices. The company’s researchers use a bit of mathematical trickery to reduce that computing load by an order of magnitude or two — something it’s easy to see the benefit of.
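The post does not spell out the math, but one published approach to this kind of reduction from work by Rastegari, Farhadi, and colleagues is XNOR-Net-style weight binarization: a real-valued weight vector is approximated by a sign pattern plus a single scaling factor, so a dot product costs one floating-point multiply instead of one per weight. A minimal sketch of that idea (not the company's actual implementation):

```python
def binarize(weights):
    """Approximate a real-valued weight vector W by alpha * sign(W),
    where alpha is the mean absolute value (XNOR-Net-style)."""
    alpha = sum(abs(w) for w in weights) / len(weights)
    signs = [1 if w >= 0 else -1 for w in weights]
    return alpha, signs

def binary_dot(alpha, signs, x):
    # alpha * sum(sign_i * x_i): the per-element work is now just sign
    # flips and adds, which map to cheap bitwise ops in a real kernel.
    return alpha * sum(s * xi for s, xi in zip(signs, x))

w = [0.9, -1.1, 0.5, -0.5]
x = [1.0, 2.0, 3.0, 4.0]
alpha, signs = binarize(w)
exact = sum(wi * xi for wi, xi in zip(w, x))
approx = binary_dot(alpha, signs, x)
print(round(exact, 2), round(approx, 2))  # -1.8 -1.5
```

The approximation introduces some error, but it shrinks weight storage roughly 32-fold and replaces most multiplies with bit operations, which is what makes on-device inference plausible.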


McIlwain will join AI2 CEO Oren Etzioni on the board of the new company; Ali Farhadi, who led the original project, will be the company’s CEO, and Mohammad Rastegari is CTO.
The new company aims to facilitate commercial applications of its technology (it isn’t quite plug and play yet), but the research that led up to it is, like other AI2 work, open source.


AI2 Repository:

ORIGINAL: TechCrunch

Why Apple Joined Rivals Amazon, Google, Microsoft In AI Partnership

By Hugo Angel,

Apple CEO Tim Cook (Photo credit: David Paul Morris/Bloomberg)

Apple is pushing past its famous secrecy for the sake of artificial intelligence.

In December, the Cupertino tech giant quietly published its first AI research paper. Now, it’s joining the Partnership on AI to Benefit People and Society, an industry nonprofit group founded by some of its biggest rivals, including Microsoft, Google and Amazon.

On Friday, the partnership announced that Apple’s head of advanced development for Siri, Tom Gruber, is joining its board. Gruber has been at Apple since 2010, when the iPhone maker bought Siri, the company he cofounded and where he served as CTO.

“We’re glad to see the industry engaging on some of the larger opportunities and concerns created with the advance of machine learning and AI,” wrote Gruber in a statement on the nonprofit’s website. “We believe it’s beneficial to Apple, our customers, and the industry to play an active role in its development and look forward to collaborating with the group to help drive discussion on how to advance AI while protecting the privacy and security of consumers.”

Other members of the board include

  • Greg Corrado from Google’s DeepMind,
  • Ralf Herbrich from Amazon,
  • Eric Horvitz from Microsoft,
  • Yann Lecun from Facebook, and
  • Francesca Rossi from IBM.

Outside of large companies, the group announced it’s also adding members from the

  • American Civil Liberties Union,
  • OpenAI,
  • MacArthur Foundation,
  • Peterson Institute of International Economics,
  • Arizona State University and the
  • University of California, Berkeley.

The group was formally announced in September.

Board member Horvitz, who is director of Microsoft Research, said the members of the group started meeting with each other at various AI conferences. They were already close colleagues in the field and they thought they could start working together to discuss emerging challenges and opportunities in AI.

 “We believed there were a lot of things companies could do together on issues and challenges in the realm of AI and society,” Horvitz said in an interview. “We don’t see these as areas for competition but for rich cooperation.”

The organization will work together to develop best practices and educate the public around AI. Horvitz said the group will tackle, for example, critical areas like health care and transportation. The group will look at the potential for biases in AI, after some experiments have shown that the way researchers train AI algorithms can lead to biases in gender and race. The nonprofit will also try to develop standards around human-machine collaboration, for example, to deal with questions like when a self-driving car should hand off control to the driver.

“I think there’s a realization that AI will touch society quite deeply in the coming years in powerful and nuanced ways,” Horvitz said. “We think it’s really important to involve the public as well as experts. Some of these directions have no simple answer. It can’t come from a company. We need to have multiple constituents checking in.”

The AI community has been critical of Apple’s secrecy for several years, and that secrecy has hurt the company’s recruiting efforts for AI talent. The company has been falling behind in some of the major advancements in AI, especially as intelligent voice assistants from Amazon and Google have started taking off with consumers.

Horvitz said the group had been in discussions with Apple since before its launch in September. But Apple wasn’t ready to formally join the group until now. “My own sense is that Apple was in the middle of their iOS 10 and iPhone 7 launches” and wasn’t ready to announce, he said. “We’ve always treated Apple as a founding member of the group.”

“I think Apple had a realization that to do the best AI research and to have access to the top minds in the field, there is an expectation of engaging openly with academic research communities,” Horvitz said. “Other companies like Microsoft have discovered this over the years. We can be quite competitive and be open to sharing ideas when it comes to the core foundational science.”

“It’s my hope that this partnership with Apple shows that the company has a rich engagement with people, society and stakeholders,” he said.

Jan 27, 2017

A biomimetic robotic platform to study flight specializations of bats

By Hugo Angel,

Some Batty Ideas Take Flight. Bat Bot (shown) is able to imitate several flight maneuvers of bats, such as bank turns and diving flight. Such agile flight is made possible by highly malleable bones and skin in bat wings. Ramezani et al. identified and implemented the most important bat wing joints by means of a series of mechanical constraints. They then designed feedback control for their flapping wing platform and covered the structure in a flexible silicone membrane. This biomimetic robot may also shed light on the role of bat legs in modulating flight pitch. [CREDIT: ALIREZA RAMEZANI]
Forget drones. Think bat-bots. Engineers have created a new autonomous flying machine that looks and maneuvers just like a bat. Weighing only 93 grams, the robot owes its agility to its complex wings, made of lightweight silicone-based membranes stretched over carbon-fiber bones, the researchers report today in Science Robotics. In addition to nine joints in each wing, it sports adjustable legs, which help it steer by deforming the membrane of its tail. Complex algorithms coordinate these components, letting the bot make batlike moves including banking turns and dives. But don’t bring out the bat-signal just yet. Remaining challenges include improving battery life and developing stronger electronic components so the device can survive minor crashes. Ultimately, though, the engineers hope this highly maneuverable alternative to quadrotor drones could serve as a helpful new sidekick—lending a wing in anything from dodging through beams as a construction surveyor to aiding in disaster relief by scouting dangerous sites. The next lesson researchers hope to teach the bat-bot? Perching upside-down.
DOI: 10.1126/science.aal0685

A biomimetic robotic platform to study flight specializations of bats

Alireza Ramezani, Soon-Jo Chung,* and Seth Hutchinson
*Corresponding author.
Science Robotics, 01 Feb 2017: Vol. 2, Issue 3
DOI: 10.1126/scirobotics.aal2505
Bats have long captured the imaginations of scientists and engineers with their unrivaled agility and maneuvering characteristics, achieved by functionally versatile dynamic wing conformations as well as more than 40 active and passive joints on the wings. Wing flexibility and complex wing kinematics not only bring a unique perspective to research in biology and aerial robotics but also pose substantial technological challenges for robot modeling, design, and control. We have created a fully self-contained, autonomous flying robot that weighs 93 grams, called Bat Bot (B2), to mimic such morphological properties of bat wings. Instead of using a large number of distributed control actuators, we implement highly stretchable silicone-based membrane wings that are controlled at a reduced number of dominant wing joints to best match the morphological characteristics of bat flight. First, the dominant degrees of freedom (DOFs) in the bat flight mechanism are identified and incorporated in B2’s design by means of a series of mechanical constraints. These biologically meaningful DOFs include asynchronous and mediolateral movements of the armwings and dorsoventral movements of the legs. Second, the continuous surface and elastic properties of bat skin under wing morphing are realized by an ultrathin (56 micrometers) membranous skin that covers the skeleton of the morphing wings. We have successfully achieved autonomous flight of B2 using a series of virtual constraints to control the articulated, morphing wings.
Biologically inspired flying robots showcase impressive flight characteristics [e.g., robot fly (1) and bird-like robots (2, 3)]. In recent years, biomimicry of bat flight has led to the development of robots that are capable of mimicking bat morphing characteristics on either a stationary (4) or a rotational pendular platform (5). However, these attempts are limited because of the inherent complexities of bat wing morphologies and lightweight form factors.
Arguably, bats have the most sophisticated powered flight mechanism among animals, as evidenced by the morphing properties of their wings. Their flight mechanism has several types of joints (e.g., ball-and-socket and revolute joints), which interlock the bones and muscles to one another and create a metamorphic musculoskeletal system that has more than 40 degrees of freedom (DOFs), both passive and active (see Fig. 1) (6). For insects, the wing structure is not as sophisticated as that of bats, because it is a single, unjointed structural unit. Like bat wings, bird wings have several joints that can be moved actively and independently.
Fig. 1 Functional groups in bat (photo courtesy of A. D. Rummel and S. Swartz, the Aeromechanics and Evolutionary Morphology Laboratory, Brown University).
Enumerated bat joint angles and functional groups are depicted; using these groups makes it possible to categorize the sophisticated movements of the limbs during flight and to extract dominant DOFs and incorporate them in the flight kinematics of B2. The selected DOFs are coupled by a series of mechanical and virtual constraints.
Robotics research inspired by avian flight has successfully conceptualized bird wings as a rigid structure, which is nearly planar and translates—as a whole or in two to three parts—through space; however, the wing articulation involved in bat wingbeats is very pronounced. In the mechanism of bat flight, one wingbeat cycle consists of two movements: (i) a downstroke phase, which is initiated by both left and right forelimbs expanding backward and sideways while sweeping downward and forward relative to the body, and (ii) an upstroke phase, which brings the forelimbs upward and backward and is followed by the flexion of the elbows and wrists to fold the wings. There are more aspects of flapping flight that uniquely distinguish bats. Bat wings have (i) bones that deform adaptively during each wingbeat cycle, (ii) anisotropic wing membrane skin with adjustable stiffness across the wing, and (iii) a distributed network of skin sensory organs believed to provide continuous information regarding flows over the wing surfaces (7).
The motivation for our research into bat-inspired aerial robots is twofold. First, the study of these robots will provide insight into flapping aerial robotics, and the development of these soft-winged robots will have a practical impact on robotics applications where humans and robots share a common environment. From an engineering perspective, understanding bat flight is a rich and interesting problem. Unlike birds or insects, bats exclusively use structural flexibility to generate the controlled force distribution on each membrane wing. Wing flexibility and complex wing kinematics are crucial to the unrivaled agility of bat flight (8, 9). This aspect of bat flight brings a unique perspective to research in winged aerial robotics, because most previous work on bioinspired flight is focused on insect flight (10–15) or hummingbird flight (16), using robots with relatively stiff wings (17, 18).
Bat-inspired aerial robots have a number of practical advantages over current aerial robots, such as quadrotors. In the case of humans and robots co-inhabiting shared spaces, the safety of bat-inspired robots with soft wings is the most important advantage. Although quadrotor platforms can demonstrate agile maneuvers in complex environments (19, 20), quadrotors and other rotorcraft are inherently unsafe for humans; demands of aerodynamic efficiency prohibit the use of rotor blades or propellers made of flexible material, and high noise levels pose a potential hazard for humans. In contrast, the compliant wings of a bat-like flapping robot flapping at lower frequencies (7 to 10 Hz versus 100 to 300 Hz of quadrotors) are inherently safe, because their wings comprise primarily flexible materials and are able to collide with one another, or with obstacles in their environment, with little or no damage.
Versatile wing conformation
The articulated mechanism of bats has speed-dependent morphing properties (21, 22) that respond differently to various flight maneuvers. For instance, consider a half-roll (180° roll) maneuver performed by insectivorous bats (23). Flexing a wing and consequently reducing the wing area would increase wing loading on the flexed wing, thereby reducing the lift force. In addition, pronation (pitch-down) of one wing and supination (pitch-up) of the other wing result in negative and positive angles of attack, respectively, thereby producing negative and positive lift forces on the wings, causing the bat to roll sharply. Bats use this maneuver to hunt insects because at 180° roll, they can use the natural camber on their wings to maximize descending acceleration. Insectivorous bats require a high level of agility because their insect preys are also capable of swooping during pursuit. With such formidable defense strategies used by their airborne prey, these bats require sharp changes in flight direction.
In mimicking bats’ functionally versatile dynamic wing conformations, two extreme paradigms are possible. On the one hand, many active joints can be incorporated in the design. This school of thought can lead to the design and development of robots with many degrees of actuation that simply cannot fly. Apart from performance issues that may appear from overactuating a dynamic system, these approaches are not practical for bat-inspired micro aerial vehicles (MAVs) because there are technical restrictions for sensing and actuating many joints in robots with tight weight (less than 100 g) and dimension restrictions. On the other hand, oversimplifying the morphing wing kinematics to oscillatory flat surfaces, which is similar to conventional ornithopters, underestimates the complexities of the bat flight mechanism. Such simplified ornithopters with simple wing kinematics may not help answer how bats achieve their impressive agile flight.
Body dimensional complexity
A better understanding of key DOFs in bat flight kinematics may help to design a simpler flying robot with substantially fewer joints that is yet capable of mimicking its biological counterparts. A similar paradigm has led to successful replications of the human terrestrial locomotion (walking and running) by using bipedal robots that have point feet (24), suggesting that feet are a redundant element of the human locomotion system. Assigning importance to the kinematic parameters can yield a simpler mechanism with fewer kinematic parameters if those parameters with higher kinematic contribution and significance are chosen. Such kinematic characterization methods have been applied to study various biological mechanisms (6, 9, 25–28).
Among these studies, Riskin et al. (6) enhance our understanding of bat aerial locomotion in particular by using the method of principal components analysis (PCA) to project bat joint movements to the subspace of eigenmodes, isolating the various components of the wing conformation. By using only the first eigenmode, 34% of biological bat flight kinematics are reproducible. By superimposing the first and second eigenmodes, more than 57% of bat flight kinematics can be replicated. These findings, which emphasize the existence of synergies (29) in bat flight kinematics to describe the sophisticated movements of the limbs during flight, suggest the possibility of mimicking bat kinematics with only a few DOFs (30).
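The eigenmode analysis used by Riskin et al. can be sketched with ordinary PCA: joint-angle recordings form a frames-by-joints matrix, and the eigenvalues of its covariance give the fraction of motion each mode explains. The data below is synthetic, standing in for real motion capture, so the exact percentages are illustrative rather than the 34%/57% figures from the paper:

```python
import numpy as np

def explained_variance(joint_angles):
    """PCA on a (frames x joints) matrix of joint angles: cumulative
    fraction of total variance captured by the leading eigenmodes."""
    X = joint_angles - joint_angles.mean(axis=0)
    # Eigenvalues of the covariance matrix, largest first.
    eigvals = np.linalg.eigvalsh(np.cov(X, rowvar=False))[::-1]
    return np.cumsum(eigvals) / eigvals.sum()

# Synthetic stand-in for motion capture: 20 joints driven by 2 shared
# latent rhythms plus noise, mimicking synergies in bat wingbeats.
rng = np.random.default_rng(0)
t = np.linspace(0, 2 * np.pi, 200)
latents = np.stack([np.sin(3 * t), np.cos(3 * t)])   # 2 shared modes
mixing = rng.normal(size=(2, 20))                    # how joints follow them
X = latents.T @ mixing + 0.1 * rng.normal(size=(200, 20))

cum = explained_variance(X)
# Because the motion is driven by few latent modes, a couple of
# eigenmodes already explain most of the variance.
print(cum[1] > 0.9)  # True
```

This is the quantitative sense in which a mechanism with only a few well-chosen DOFs can reproduce most of bat flight kinematics.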
According to these PCAs, three functional groups, shown in Fig. 1, synthesize the wing morphing: (i) when wings spread, fingers bend; (ii) when wrists pronate, elbows bend; and (iii) the medial part of the wings is morphed in collaboration with the shoulders, hips, and knees (6). These dimensional complexity analyses reveal that the flapping motion of the wings, the mediolateral motion of the forelimbs, the flexion-extension of the fingers, the pronation-supination of the carpi, and the dorsoventral movement of the legs are the major DOFs. In developing our robotic platform Bat Bot (B2) (Fig. 2A), we selected these biologically meaningful DOFs and incorporated them in the design of B2 by means of a series of mechanical constraints.
Fig. 2 Bat Bot.
(A) B2 is self-sustained and self-contained; it has an onboard computer and several sensors for performing autonomous navigation in its environment. The computing, sensing, and power electronics, which are accommodated within B2, are custom-made and yield a fully self-sustained system despite weight and size restrictions. The computing unit, or main control board (MCB), hosts a microprocessor. While the navigation-and-control algorithm runs on the MCB in real time, a data acquisition unit acquires sensor data and commands the micro actuators. The sensing electronics, which are circuit boards custom-designed to achieve the smallest size possible, interface with the sensors and the MCB by collecting two kinds of measurements. First, an inertial measurement unit (IMU), which is fixed to the ribcage in such a way that the x axis points forward and the z axis points upward, measures the attitude of the robot with respect to the inertial frame. Second, five magnetic encoders, located at the elbows, hips, and flapping joint, measure the angles of the limbs relative to the body. (B) Dynamic modulus analysis. Samples of membrane were mounted vertically in the dynamic modulus analyzer using tension clamps with ribbed grips to ensure that there was no slipping of the sample. Data were collected using controlled force analysis at a ramp rate of 0.05 N/min over the range 0.001 to 1.000 N. The temperature was held at 24.56°C. The estimated average modulus, ultimate tensile strength (UTS), and elongation are 0.0028 MPa, 0.81 MPa, and 439.27%, respectively. The average modulus and UTS along the fiber direction are 11.33 and 17.35 MPa, respectively. (C) The custom-made silicone-based membrane and embedded carbon fibers.
System design
B2’s flight mechanism (shown in Fig. 3, A to C) consists of the left and right wings, each comprising a forelimb and a hindlimb mechanism. The left and right wings are coupled with a mechanical oscillator. A motor spins a crankshaft mechanism, which moves both wings synchronously and dorsoventrally, while each wing can move asynchronously and mediolaterally. The hindlimbs, which form the trailing edge of the wings, can move asynchronously and dorsoventrally. Were it not for the mechanical couplings and constraints, the morphing mechanism of B2 would have nine DOFs. With the physical constraints in place, four DOFs are coupled, yielding a five-DOF mechanism.
Fig. 3 B2’s flight mechanism and electronics.
(A) B2’s flight mechanism and its DOFs. We introduced mechanical couplings in the armwing to synthesize a mechanism with a few DOFs. (B) The armwing retains only one actuated movement, which is a push-pull movement produced by a spindle mechanism hosted in the shoulder. (C) The leg mechanism. (D) B2’s electronics architecture. At the center, the microprocessor from STMicroelectronics communicates with several components, including an IMU from VectorNav Technologies, an SD card reader, five AS5048 Hall effect encoders, and two dual-port dc motor drivers. Two wireless communication devices, an eight-channel micro RC receiver (DSM2) and a Bluetooth device, make it possible to communicate with the host (Panel). The microprocessor has several peripherals, such as universal synchronous/asynchronous receiver/transmitter (USART), serial peripheral interface (SPI), pulse-width modulation (PWM), and secure digital input/output (SDIO). To test and deploy the controller on the platform, we used hardware-in-the-loop (HIL) simulation. In this method, a real-time computer is used as a virtual plant (model), and the flight controller, which is embedded on the physical microprocessor, responds to the state variables of the virtual model. In this way, the functionality of the controller is validated and debugged before being deployed on the vehicle.
The forelimbs (see Fig. 3B), which provide membranal mechanical support and morphing leverage, consist of nine links: the humeral (p0-p1), humeral support (p1-p2), radial (p1-p3), radial support (p4-p5), carpal (p3-p4), carpal support (p1-p5), and three digital links. Fully mobilizing this structure would require rotation of the humerus, pronating rotation of the wrists, and abduction-adduction and flexion-extension of the digits; all of these call for active actuation of the shoulders, wrists, and finger knuckles, respectively.
A few attempts have been made to incorporate similar DOFs in an MAV. Researchers at Brown University used string-and-pulley-based actuating mechanisms to articulate a robotic membranous wing (4). In their design, the wing is mounted on a support so that no actuators need to be installed on the wing itself. Within this support, a bundle of several strings is routed through the wing’s links and connected to several motors housed in the support. This form of actuation makes it possible to realize several active joints in the robotic wing. However, such a method is not practical for a flying MAV because it requires heavy actuators to be installed in the ribcage. Unlike the robotic wing of (4), we introduced physical constraints (see Fig. 3, A to C) in B2 to synthesize a flight mechanism with only a few actuated joints. These mechanical constraints are described below.
Morphing wing flight apparatus
A three-link mechanism, where each link is connected to the next one with a revolute joint while one link is pivoted to a fixed support, is uniquely defined mathematically using three angles or configuration variables. Regulating the position and orientation of the end effector in the three-link mechanism implies direct control of the three revolute joints. Constraining the mechanism with three rigid links results in a one-DOF mechanism requiring only one actuator.
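The DOF counting behind this argument follows the standard Grübler/Kutzbach mobility formula for planar linkages. A short sketch (the four-bar linkage is used here as the canonical example of how closing a chain removes mobility):

```python
def planar_mobility(n_links, n_revolute):
    """Grübler/Kutzbach mobility count for a planar linkage.

    n_links counts all links including the fixed ground link;
    n_revolute counts one-DOF revolute (pin) joints.
    """
    return 3 * (n_links - 1) - 2 * n_revolute

# Open three-link chain (3 moving links + ground, 3 pins): 3 DOFs,
# so three actuators would be needed to control its end effector.
assert planar_mobility(4, 3) == 3

# Closing the chain into a four-bar linkage (4 links, 4 pins) leaves a
# single DOF, so one actuator suffices: the idea behind constraining
# B2's forelimb into a one-actuator morphing mechanism.
assert planar_mobility(4, 4) == 1
```

The exact link and joint counts of B2's constrained forelimb differ from this minimal example, but the principle (adding rigid constraint links until one actuated DOF remains) is the same.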
Each of the forelimbs is similar to this three-link mechanism, and their links are hinged to one another with rigid one-DOF revolute joints. The rotational movement of the humeral link around the fixed shoulder joint p0 is driven by linear movements of the point p2 relative to the shoulder joint. A linear motion of the humeral support link at the shoulder moves the radial link relative to the humeral link and results in elbow flexion-extension. As the humeral and radial links move with respect to each other, this elbow flexion-extension is projected through the radial support link onto the carpal plate, producing a relative motion of the outer digital link with respect to the radial link (see Fig. 3B).
The ball-and-socket universal joints at the two ends of the radial support link facilitate passive movements of the carpal plate in a pronating direction. In contrast to biological bats, which actively rotate their wrists, B2 has passive carpal rotations with respect to the radius.
Digital links I, II, and III are cantilevered to the carpal plate (p6, p7, and p8); they are flexible slender carbon fiber tubes that can passively flex and extend with respect to the carpal plate, meaning that they introduce passive DOFs in the flight mechanism. In addition to these passive flexion-extension movements, the digital links can passively abduct and adduct with respect to each other. The fingers have no knuckles, and their relative angle with respect to one another is predefined.
As a result, each of B2’s forelimbs has one actuated DOF that transforms the linear motion of its spindle mechanism into three active and biologically meaningful movements: (i) active humeral retraction-protraction (shoulder angle), (ii) active elbow flexion-extension (elbow angle), and (iii) active carpal abduction-adduction (wrist angle). The passive DOFs include carpal pronation, digital abduction-adduction, and flexion-extension.
In the case of the hindlimbs (legs), it is challenging to accurately quantify the aerodynamic consequences of leg absence or presence in bats and determine their influence on the produced aerodynamic lift and drag forces. This is because the movements of hindlimbs affect the membrane locally at the trailing edge of the wings, whereas at distal positions, wings are mostly influenced by forelimbs. However, legs can enhance the agility of flight by providing additional control of the left and right sides of the trailing edge of the membrane wing (31). Adjusting the vertical position of the legs with respect to the body has two major effects: (i) leg-induced wing camber and (ii) increasing the angle of attack locally at the tail. In other words, increasing the leg angle increases lift, drag, and pitching moment (31). In addition, there is another benefit to carefully controlled tail actuation: Drag notably decreases because tails prevent flow detachments and delay the onset of flow separation (32).
Benefiting from these aerodynamic effects, bats possess unique mechanistic bases; the anatomical evolution of their hindlimbs enables these mammals to use their hindlimbs actively during flight (33). In contrast to terrestrial mammals, the ball-and-socket joint that connects the femoral bone to the body is rotated in such a way that knee flexion moves the ankle dorsoventrally. This arrangement permits pronounced ventral knee flexion.
From a kinematics standpoint, the sophisticated movements of ankles in bats include dorsoventral and mediolateral movements. Ankles move ventrally during the downstroke, and they start moving dorsally during the upstroke (33). Motivated by the roles of legs in bat flight, we implemented two asynchronously active legs for controlling the trailing edge of the membrane wing in the design of B2. We hinged each leg to the body by one-DOF revolute joints such that the produced dorsoventral movement happens in a plane that is tilted at an angle relative to the parasagittal plane (see Fig. 3C). Contrary to biological bats, B2’s legs have no mediolateral movements; Riskin et al. (6) suggest that such movements are less pronounced in biological bats. To map the linear movements of our actuation system to the dorsoventral movements of the legs, we used a three-bar linkage mechanism (34).
Anisotropic membranous wing
The articulated body of B2 yields a structure that cannot accommodate conventional fabric covering materials, such as unstretchable nylon films, because unstretchable materials resist the forelimb and leg movements. As a result, we covered the skeleton of our robot with a custom-made, ultrathin (56 μm), silicone-based membrane that is designed to match the elastic properties of biological bats’ membranes. In general, bat skin spans the body such that it is anchored to the forelimbs, digital bones, and hindlimbs. This yields a morphing mechanism with soft wings that is driven by the movements of the limbs. These compliant and anisotropic structures, with internal tensile forces in the dorsoventral and mediolateral directions, contain elastin fiber bundles, which provide extensibility and a self-folding (self-packing) property to the wing membrane (35).
Reverse engineering all of these characteristics is not feasible from an engineering fabrication standpoint; therefore, we focused our attention on a few properties of the membrane wing. In producing such a membranous wing, we studied the anatomical properties of bats’ biological skin and found the key features to be (i) weight per unit of area (area density), (ii) tensile modulus, and (iii) stretchability (see Fig. 2, B and C). The area density is important because high-density membranes distributed across the robot’s skeleton increase the wing’s moment of inertia along the flapping axis and the overall payload of B2. In addition, internal tensile forces introduced by the membrane to the system are important because the micro motors used in the robot have limited torque outputs. When the pretension forces become large, the stall condition emerges in the actuators. This can damage the motor as well as the power electronics. The stretchability of the membrane defines the capacity of the wing to fold and unfold mediolaterally within the range of movement of actuators so that undesirable skin wrinkles or ruptures are avoided.
To produce an ultrathin and stretchable skin, we used two ultraflat metal sheets with a 10-μm flatness precision to sandwich our silicone materials. This ensures an even and consistent pressure distribution profile on the material. We synthesized a polymer in which two components—one containing a catalyst and the other containing polyorganosiloxanes with hydride functional groups—began vulcanization in the laboratory environment. The first component is a mixture of 65 to 75% by weight polyorganosiloxanes and 20 to 25% amorphous silica, and the second component is a mixture of 75 to 85% polyorganosiloxanes, 20 to 25% amorphous silica, and less than 0.1% platinum-siloxane complex. Platinum-siloxane is a catalyst for polymer chain growth. The Si–O bond length is about 1.68 Å with a bond angle of 130°, whereas the C–C bond found in most conventional polymers is about 1.54 Å with a 112° bond angle. Because of these geometric factors, silicone polymers exhibit a greater percentage of elongation and flexibility than carbon backbone polymers. However, silica is heavier than carbon, which could potentially make the wing too heavy and too rigid for flight. To solve this problem, we added hexamethyldisiloxane, which reduces the thickness and viscosity of the silicone, in an experimentally determined ratio.
Virtual constraints and feedback control
A crucial but unseen component of B2 is its flight control supported by its onboard sensors, high-performance micromotors with encoder feedback, and a microprocessor (see Fig. 3D). B2 and conventional flying robots such as fixed-wing and rotary-wing robots are analogous in that they all rely on oscillatory modulations of the magnitude and direction of aerodynamic forces. However, their flight control schemes are different. Conventional fixed-wing MAVs are often controlled by thrust and conventional control surfaces such as elevators, ailerons, and rudders. In contrast, B2 has nine active oscillatory joints (five of which are independent) in comparison to six DOFs (attitude and position) that are actively controlled. In other words, the control design requires suitable allocation of the control efforts to the joints.
In addition, challenges in flight control synthesis for B2 have roots in the nonlinear nature of the forces that act on it. B2, which is similar to fruit bats in size and mass (wingspan, 30 to 40 cm; mass, 50 to 150 g), flaps at a frequency that is lower than or comparable to the natural frequency of its body dynamics; as a result, it is often affected by nonlinear inertial and aerodynamic effects. Such forces often appear as nonlinear, nonaffine-in-control terms in the equations of motion (36). Therefore, conventional approximation methods that assume the flapping frequency to be much faster than the body’s dynamic response, such as the celebrated method of averaging commonly applied to insect-scale flapping flight (10, 11), fail to make accurate predictions of the system’s behavior.
The approach taken in this paper is to asymptotically impose virtual constraints (holonomic constraints) on B2’s dynamic system through closed-loop feedback. This concept has a long history, but its application in nonlinear control theory is primarily due to the work of Isidori et al. (37, 38). The advantage of imposing these constraints through closed-loop feedback (software) rather than physically (hardware) is that B2’s wing configurations can be adjusted and modified during the flight. We have tested this concept on B2 to generate cruise flights, bank turning, and sharp diving maneuvers, and we anticipate that this can potentially help reconstruct the adaptive properties of bat flight for other maneuvers. For instance, bats use tip reversal at low flight speeds (hovering) to produce thrust and weight support, and the stroke plane becomes perpendicular to the body at higher flight speeds (39).
We parameterized the morphing structure of B2 by several configuration variables. The configuration variable vector qmorph, which defines the morphology of the forelimb and hindlimb as they evolve through the action of actuated coordinates, embodies nine biologically meaningful DOFs

qmorph = (qRP^R, qRP^L, qFE^R, qFE^L, qAA^R, qAA^L, qFL, qDV^R, qDV^L)     (1)

where qRP^i describes the retraction-protraction angle, qFE^i is the radial flexion-extension angle, qAA^i is the abduction-adduction angle of the carpus, qFL is the flapping angle, and qDV^i is the dorsoventral movement of the hindlimb (see Fig. 3, B and C). Here, the superscript i denotes the right (R) or left (L) joint angles. The mechanical constraints described earlier yield a nonlinear map from the actuated joint angles

qact = (qsp^R, qsp^L, qFL, qDV^R, qDV^L)

to the morphology configuration variable vector qmorph. The spindle action shown in Fig. 3B is denoted by qsp^i. The nonlinear map is explained mathematically in (40); it reflects the two kinematic loops (p0-p1-p2) and (p1-p3-p4-p5) shown in Fig. 3B. We used these configuration variables to develop B2’s nonlinear dynamic model and predefined actuator trajectories; see Materials and Methods and (40).
Now, the virtual constraints are given by

N(t, β) = qact − rdes(t, β)

where qact is the vector of actuated coordinates, rdes is the time-varying desired trajectory associated with the actuated coordinates, t is time, and β is the vector of the wing kinematic parameters explained in Materials and Methods. Once the virtual constraints are enforced (N = 0), the posture of B2 varies because the actuated portion of the system implicitly follows the time-varying trajectory rdes. To design rdes, we precomputed the time evolution of B2’s joint trajectories for N = 0. We applied numerically stable approaches to guarantee that these trajectory evolutions take place on a constraint manifold (see Materials and Methods). Then, we used a finite-state nonlinear optimizer to shape these constraints subject to a series of predefined conditions (40).
The stability of the designed periodic solutions can be checked by inspecting the eigenvalues of the monodromy matrix [Eq. 22 in (40)] after defining a Poincaré map P and a Poincaré section (40). We computed the monodromy matrix by using a central difference scheme. We perturbed our system states around the equilibrium point at the beginning of the flapping cycle and then integrated the system dynamics given in Eqs. 10 and 16 throughout one flapping cycle.
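The perturb-and-integrate procedure for the monodromy matrix can be illustrated on a toy system. The sketch below uses a two-state damped linear oscillator in place of B2's dynamics; for a linear system the exact monodromy is expm(A·T), which lets us check the central-difference estimate:

```python
import numpy as np
from scipy.integrate import solve_ivp
from scipy.linalg import expm

A = np.array([[0.0, 1.0], [-1.0, -0.2]])  # damped oscillator (toy stand-in)
T = 2 * np.pi                             # one "flapping" period
eps = 1e-6                                # perturbation size

def flow(x0):
    """Integrate the dynamics over one full cycle starting from x0."""
    sol = solve_ivp(lambda t, x: A @ x, (0.0, T), x0, rtol=1e-10, atol=1e-12)
    return sol.y[:, -1]

# Central-difference monodromy: perturb each state at the start of the
# cycle, integrate through one cycle, and difference the results.
M = np.zeros((2, 2))
for i in range(2):
    e = np.zeros(2)
    e[i] = eps
    M[:, i] = (flow(e) - flow(-e)) / (2 * eps)

# Check against the exact monodromy for a linear system.
assert np.allclose(M, expm(A * T), atol=1e-5)
# Stability of the periodic solution: eigenvalues inside the unit circle.
assert np.all(np.abs(np.linalg.eigvals(M)) < 1.0)
```

For B2 the flow map is the full nonlinear dynamics (Eqs. 10 and 16) integrated over one flapping cycle, and the perturbations are applied to the states at the Poincaré section.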
To stabilize the designed periodic solution, we augmented the desired trajectory rdes with a correction term δβ,

rdes = rdes(t, β + δβ)

where δβ is computed by Eq. 7. The Poincaré return map P takes the robot states qk and q̇k (the Euler angles roll, pitch, and yaw and their rates) at the beginning of the kth flapping cycle and leads to the states at the beginning of the next flapping cycle,

(qk+1, q̇k+1) = P(qk, q̇k, β + δβk)
We linearized the map P at the equilibrium (q*, q̇*), resulting in a dynamic system that describes the periodic behavior of the system at the beginning of each flapping cycle,

δxk+1 = A δxk + B δβk

where (*) denotes the equilibrium points and δx = (qk − q*, q̇k − q̇*) denotes deviations from the equilibrium points. The changes in the kinematic parameters are denoted by δβ. Here, the stability analysis of the periodic trajectories of the bat robot is relaxed to the stability analysis of the equilibrium of the linearized Poincaré return map on the Poincaré section [see (40)]. As a result, classical feedback design tools can be applied to stabilize the system. We computed a constant state feedback gain matrix K such that the closed-loop linearized map

δxk+1 = (A − BK) δxk

is exponentially stable, that is, all eigenvalues of (A − BK) lie strictly inside the unit circle. We used this state feedback policy at the beginning of each flapping cycle to update the kinematic parameters as follows:

δβk = −K δxk
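A constant gain of this kind can be computed with standard discrete-time tools. The sketch below stabilizes a hypothetical two-state linearized return map (the matrices are illustrative, not B2's identified model) with a discrete LQR gain obtained from the Riccati equation:

```python
import numpy as np
from scipy.linalg import solve_discrete_are

# Illustrative linearized Poincare map: unstable open loop (|eig| > 1).
A = np.array([[1.1, 0.2],
              [0.0, 0.9]])
B = np.array([[0.0],
              [1.0]])
Q = np.eye(2)   # state weighting
R = np.eye(1)   # parameter-correction weighting

# Discrete-time LQR: solve the Riccati equation, then form the gain.
P = solve_discrete_are(A, B, Q, R)
K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)

Acl = A - B @ K  # closed-loop map under delta_beta_k = -K delta_x_k
assert max(abs(np.linalg.eigvals(A))) > 1.0    # open loop unstable
assert max(abs(np.linalg.eigvals(Acl))) < 1.0  # closed loop stable
```

Any design method that places the closed-loop eigenvalues inside the unit circle (pole placement, LQR, etc.) would serve here; LQR is shown because it yields a constant gain directly.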
In Fig. 4C, the controller architecture is shown. The controller consists of two parts: (i) the discrete controller that updates the kinematic parameters β at ≈10 Hz and (ii) the morphing controller that enforces the predefined trajectories rdes and loops at 100 Hz.
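The two-rate structure can be sketched as a simple scheduler: the morphing controller runs at 100 Hz, and the discrete kinematic-parameter update fires every tenth inner tick (rates from the text; the bookkeeping below is purely illustrative):

```python
def run_controller(seconds, inner_hz=100, outer_div=10):
    """Count iterations of the two controller loops over a flight segment.

    The inner (morphing) loop tracks rdes at inner_hz; the outer
    (discrete) loop updates the kinematic parameters beta once every
    outer_div inner ticks, i.e. at inner_hz / outer_div.
    """
    beta_updates, morph_steps = 0, 0
    for tick in range(int(seconds * inner_hz)):
        if tick % outer_div == 0:
            beta_updates += 1   # discrete controller: update beta
        morph_steps += 1        # morphing controller: enforce rdes
    return beta_updates, morph_steps

# One second of flight: ~10 beta updates, 100 morphing-control steps.
assert run_controller(1.0) == (10, 100)
```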
Fig. 4 Untethered flights and controller architecture.
(A) Snapshots of a zero-path straight flight. (B) Snapshots of a diving maneuver. (C) The main controller consists of the discrete (C1) and morphing controllers (C2). The discrete and morphing controllers are updated through sensor measurements H1 and H2 at 10 and 100 Hz, respectively. The subsystems S1, S2, and S3 are the underactuated, actuated, and aerodynamic parts [see Materials and Methods and (40)].
Next, we used the actuated joint movements of the armwings and legs to flex (extend) the armwings or to raise (lower) the legs, and we reconstructed two flight maneuvers: (i) a banking turn and (ii) a swoop maneuver. These joint motions were realized by modifying the term bi in the actuator-desired trajectories (Eq. 12 in Materials and Methods).
Banking turn maneuver
We performed extensive untethered flight experiments in a large indoor space (Stock Pavilion at the University of Illinois at Urbana-Champaign), where we used a net (30 m by 30 m) to protect the sensitive electronics of B2 at the moment of landing. The flight arena was not equipped with a motion capture system. Although the vehicle’s landing position was adjusted by an operator to secure landings within the area covered by the net, the vehicle landed outside the net many times. The launching task was performed by a human operator, adding to the inconsistency of the launches.
In all of these experiments, the system reached its maximum flapping speed (≈10 Hz) at the launch moment. In Fig. 5A, the time evolution of the roll angle qx, sampled at 100 Hz, is shown. The hand launch introduced initial perturbations, which considerably affected the first 10 wingbeats. Despite these perturbations, the vehicle stabilized the roll angle within 20 wingbeats. This time envelope is denoted by Δtstab and is shown by the red region. Then, the operator sent a turn command, which is shown by the blue region. Immediately after the command was sent, the roll angle increased, indicating a turn toward the right wing. The first flight test, which is shown as a solid black line and highlighted in green, does not follow this increasing trend because, for comparison purposes, no turn command was applied in that experiment.
Fig. 5 The time evolution of the Euler angles roll qx, pitch qy, and yaw qz for eight flight tests is shown.
(A and B) The roll and pitch angles converge to a bounded neighborhood of 0° despite perturbations at the launch moment. The red region represents the time envelope required for vehicle stabilization and is denoted by Δtstab. For all of the flight experiments except the first [denoted by S.F. (straight flight) and highlighted by the green region], a bank turn command was sent at a time within the blue range. Then, the roll and pitch angles start to increase, indicating the beginning of the bank turn. (C) The behavior of the yaw angle. In the red region, vehicle heading is stabilized (except flight tests 1 and 4). In the blue region, the vehicle starts to turn toward the right armwing (negative heading rate). This behavior is not seen in the straight flight.
In Figs. 6 and 7, the morphing joint angles of the armwings and legs for these flight tests are reported. These joint angles were recorded by the onboard Hall effect sensors and were sampled at 100 Hz. As Fig. 6 (A to D) suggests, the controller achieves a positive roll angle in the blue region by flexing the right armwing and extending the left armwing.

Fig. 6 Armwing joint angle time evolution.


Left armwing angles (A and B) and right armwing angles (C and D) are shown for eight flight tests. (A and C) Close-up views for the stabilization time envelope. The red region represents the joint movement during the stabilization time envelope. (B and D) After the stabilization time envelope, for all of the flight experiments except the first (highlighted with green), a bank turn command was sent at a time within the blue range.
Fig. 7 Leg joint angle time evolution.
Left leg angles (A and B) and right leg angles (C and D) are shown for eight flight tests. (A and C) Close-up views for the stabilization time envelope. (B and D) After the stabilization time envelope, the dorsal movement of the legs is applied to secure a successful belly landing. This dorsal movement can cause pitch-up artifacts, which are strongly nonlinear.

In Fig. 5 (B and C), the time evolutions of the Euler angles qy and qz are shown. Like the roll angle, the pitch angle settled within a bounded neighborhood of 0° in the red region. At the moment of the banking turn (blue region), pitch-up artifacts appeared because of strong nonlinear coupling between the roll and pitch dynamics. In addition, these pitch-ups are, to some extent, the result of the dorsal movement of the legs, which is applied to secure a successful belly landing (see Fig. 7, A to D). The pitch angle in the straight flight behaved differently, with no sharp rise in the blue region. In Fig. 5C, it can be seen that for all of the flight tests except the straight flight, the rate of change of the heading angle increased after the turn command was applied, indicating the onset of the bank turn.

Diving maneuver
Next, a sharp diving maneuver, which is performed by bats when pursuing their prey, was reconstructed. Insectivorous echolocating bats face a sophisticated array of defenses used by their airborne prey. One such insect defense is the ultrasound-triggered dive, which is a sudden, rapid drop in altitude, sometimes all the way to the ground.
We reconstructed this maneuver by triggering a sharp pitch-down motion in mid-flight. After launching the robot, the operator sent the command, which resulted in a sharp ventral movement of the legs (shown in Fig. 8C). Meanwhile, the armwings were stretched (shown in Fig. 8B). In Fig. 8A, a sharp rise of the pitch angle is noticeable. The vehicle swooped and reached a peak velocity of about 14 m/s. This extremely agile maneuver testifies to the degree of attitude instability that B2’s closed-loop controller can handle.

Fig. 8 Joint angle evolution during swooping down.
(A) The time evolution of the Euler angles during the diving maneuver. (B) Armwing joint angles. (C) Leg joint angles. The red region indicates the stabilization time envelope; the light blue region indicates the dive time span.

Flight characteristics

B2’s flight characteristics are compared with Rousettus aegyptiacus flight information from (41). The R. aegyptiacus data correspond to flight speeds U in the range of 3 to 5 m/s. B2’s morphological details, which are presented in table S1, are used to compute its flight characteristics. According to Rosén et al. (28), the arc length traveled by the wingtip, stip, is given by stip = 2ψbs, where ψ and bs are the flapping stroke angle and wingspan, respectively (stip,B2 = 0.48 m and stip,Rous. = 0.36 m). A motion capture system (shown in fig. S3) was used to register the position coordinates px and py for four untethered flight tests (see fig. S2). The flight speed was calculated by taking the time derivative of px and py. We considered B2’s average flight speed, U = 5.6 m/s, in the succeeding calculations.
The measure K (28), which is similar to the reduced frequency and is computed on the basis of the wingtip speed, is given by K = stip/(tf U), where tf is the time span of a single wingbeat (KB2 = 0.86 and KRous. = 0.81). Subsequently, the advance ratio J is the inverse of K (JB2 = 1.16 and JRous. = 1.22). The wing loading Qs is given by Qs = Mbg/Smax (41), where Mb is the total body mass, g is the gravitational acceleration, and Smax is the maximum wing area (Qs,B2 = 13 N/m2 and Qs,Rous. = 11 N/m2).
The Strouhal number St is given by St = Δztip/(tf U) (41), where Δztip is the vertical displacement of the wingtip with respect to the shoulder (28) (StB2 = 0.43 and StRous. = 0.4 to 0.6). Last, the nominal coefficient of lift Cl is computed. Following (41), it is given by Cl = 2Fvert/(ρair Vc2 Smax), where ρair is the density of dry air, Vc is the velocity of the carpus (see Fig. 3B), and Fvert is the magnitude of the vertical lift force (see fig. S4). We measured Fvert by installing the robot on top of a miniature load cell inside a wind tunnel programmed to sustain an air velocity of 4 to 6 m/s (Cl,B2 = 0.8 and Cl,Rous. = 1.0).
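As a consistency check, the reported K and J can be recomputed from the numbers given in the text. The wingbeat time tf is an assumption here, taken from the ≈10 Hz maximum flapping speed reported for the flight experiments:

```python
# Values from the text; t_f is assumed from the ~10 Hz flapping speed.
s_tip = 0.48   # wingtip arc length per wingbeat, m
U = 5.6        # average flight speed of B2, m/s
t_f = 0.1      # wingbeat period, s (assumed: 1 / 10 Hz)

K = s_tip / (t_f * U)  # reduced-frequency-like measure, K = stip/(tf U)
J = 1.0 / K            # advance ratio is the inverse of K

# Both agree with the reported values K_B2 = 0.86 and J_B2 = 1.16.
assert abs(K - 0.86) < 0.01
assert abs(J - 1.16) < 0.01
```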
Bats are known to demonstrate exceptionally agile maneuvers thanks to many joints that are embedded in their flight mechanism, which synthesize sophisticated and functionally versatile dynamic wing conformations. Bats represent a unique solution to the challenges of maneuverable flapping flight and provide inspiration for vehicle design at bat-length scales.
The difficulties associated with reconstructing bat-inspired flight are exacerbated by the inherent complexities associated with the design of such bat robots. Consequently, we have identified and implemented the most important wing joints by means of a series of mechanical constraints and a feedback control design to control the six-DOF flight motion of the bat robot called B2.
The main results of this study are fourfold.

  • First, for robotics, this work demonstrates the synergistic design and flight control of an aerial robot with dynamic wing conformations similar to those of biological bats. Conventional flapping wing platforms have wings with few joints, which can be conceptualized as rigid bodies. These platforms often use conventional fixed-wing airplane control surfaces (e.g., rudders and ailerons); therefore, such robots are not suitable for examining the flight mechanisms of biological counterparts with nontrivial morphologies. This work has demonstrated several autonomous flight maneuvers (zero-path flight, banking turn, and diving) of a self-contained robotic platform whose control arrays are fundamentally different from those of existing flapping robots. B2 uses a morphing skeleton array wherein a silicone-based skin enables the robot to morph its articulated structure in midair without losing an effective and smooth aerodynamic surface. This morphing property cannot be achieved with the conventional fabrics (e.g., nylon and Mylar) that are primarily used in flapping wing research.
  • Next, for dynamics and control, this work applies the notion of stable periodic orbits to study the aerial locomotion of B2, whose unstable flight dynamics are aggravated by the flexibility of the wings. The technique used in the paper simplifies stability analysis by establishing an equivalence between the stability of a periodic orbit and that of the corresponding fixed point of the linearized Poincaré map.
  • Third, this work introduces a design scheme (as shown in Fig. 1) to mimic the key flight mechanisms of biological counterparts. There is no well-established methodology for reverse engineering the sophisticated locomotion of biological counterparts. These animals have several active and passive joints that make it impractical to incorporate all of them in the design. The framework that is introduced in this study accommodates the key DOFs of bat wings and legs in a 93-g flying robot with tight payload and size restrictions. These DOFs include the retraction-protraction of the shoulders, flexion-extension of the elbows, abduction-adduction of the wrists, and dorsoventral movement of the legs. The design framework is staged in two steps: introducing mechanical constraints motivated by PCA of bat flight kinematics and designing virtual constraints motivated by holonomically constrained mechanical systems.
  • Last but not least, this research contributes to biological studies on bat flight. The existing methods in biology rely on vision-based motion capture systems that use high-speed imaging sensors to record the trajectory of joints and limbs during bat flight. Although these approaches can effectively analyze the joint kinematics of bat wings in flight, they cannot, by themselves, reveal how specific DOFs or specific wing movement patterns contribute to a particular flight maneuver of a bat. B2 can be used to reconstruct flight maneuvers of bats by applying wing movement patterns observed in bat flight, thereby helping us understand the role of the dominant DOFs of bats. In this work, we have demonstrated the effectiveness of using this robot to reproduce flight maneuvers such as straight flight, banking turn, and diving flight. Motivated by previous biological studies such as that by Gardiner et al. (42), which examined the role of the legs in modulating the pitch movement of bat flight, we have successfully implemented dorsoventral movement control of the legs of B2 to produce a sharp diving maneuver or to maintain a straight path. Furthermore, bank turn maneuvers of bats (23) have been successfully reconstructed by controlling asymmetric folding of the two main wings. The self-sufficiency of an autonomous robotic platform in sensing, actuation, and computation permits extensive analysis of dynamic system responses. In other words, thorough and effective inspection of the key DOFs in bat flight is possible by selectively perturbing these joint angles on the robot and analyzing the response; it is the presence of several simultaneously varying parameters in bat flight kinematics that hinders such a systematic analysis in animals. Consequently, we envision our robotic platform as an important tool for studying bat flight in the context of robotics-inspired biology.
Nonlinear dynamics
The mathematical dynamic model of B2 is developed using the Lagrange method (36) after computing the kinetic and potential energies. Rotary and translational kinetic energies are evaluated after defining the position and attitude of the body with respect to the inertial frame. Euler angles define the attitude of the robot with respect to the inertial frame, whereas coordinate frames attached to the wings define the wing movements with respect to the body frame.
Modeling assumptions
The following assumptions are made during the nonlinear dynamic modeling:
  1. Wing inertial forces are considered because the wings are not massless.
  2. There is no spanwise or chordwise flexibility in the wings; that is, it is a rigid flapping wing aircraft. Therefore, there is no flexibility-induced phase difference between flapping and feathering motions, and no degrees of underactuation are introduced as a result of a passive phase difference between the flapping and feathering (pitch) motions.
  3. Strip theory (43) is used for computing aerodynamic forces and moments.
  4. The aerodynamic center is assumed to be located at the quarter-chord point (31), and the aerodynamic forces, which act at the aerodynamic center, include the lift and drag forces.
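As a rough illustration of assumptions 3 and 4, strip theory computes aerodynamic loads by dividing the wing into spanwise strips and summing the lift and drag acting at each strip's quarter-chord. The sketch below is ours, not the paper's model: the thin-airfoil lift slope, the parabolic drag polar, the uniform angle of attack, and the steady rectangular wing are all illustrative assumptions chosen to keep the example short.

```python
import numpy as np

def strip_theory_forces(half_span, chord, alpha, v, rho=1.225, n_strips=20):
    """Total lift and drag on one rigid wing via strip theory.

    half_span : wing half-span [m]
    chord     : function y -> local chord length c(y) [m]
    alpha     : angle of attack [rad] (uniform across the span here)
    v         : airspeed seen by the wing [m/s]
    """
    ys = np.linspace(0.0, half_span, n_strips + 1)
    y_mid = 0.5 * (ys[:-1] + ys[1:])            # strip midpoints
    dy = np.diff(ys)                            # strip widths
    c = np.array([chord(y) for y in y_mid])     # local chord per strip
    q_dyn = 0.5 * rho * v**2                    # dynamic pressure
    cl = 2.0 * np.pi * alpha                    # thin-airfoil lift slope (illustrative)
    cd = 0.02 + cl**2 / (np.pi * 0.8 * 6.0)     # parabolic drag polar (illustrative)
    # Each strip's force is assumed to act at its quarter-chord point; the
    # location matters for pitching moments, not for these force totals.
    lift = np.sum(q_dyn * c * dy * cl)
    drag = np.sum(q_dyn * c * dy * cd)
    return lift, drag

# Rectangular wing: 0.2 m half-span, 0.1 m chord, 5 deg angle of attack, 5 m/s.
L, D = strip_theory_forces(0.2, lambda y: 0.1, np.deg2rad(5.0), 5.0)
```

For a flapping wing, the same summation would be re-evaluated at every instant using the local velocity induced by the wing motion.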
Method of Lagrange
During free-fall ballistic motions, B2 with its links and joints represents an open kinematic chain that evolves under the influence of gravitational and external aerodynamic forces. We used the method of Lagrange to describe these dynamics mathematically. This open kinematic chain is uniquely determined by the fuselage Euler angles roll, pitch, and yaw (q_x, q_y, q_z); the fuselage center of mass (CoM) positions (p_x, p_y, p_z); and the morphing joint angles q_morph defined in Eq. 1. Therefore, the robot's configuration variable vector is

q = (q_x, q_y, q_z, p_x, p_y, p_z, q_morph) ∈ Q,

where Q is the robot's configuration variable space. We derived the Lagrange equations after computing the total energy of the free open kinematic chain as the difference between the total kinetic energy and the total potential energy. Following Hamilton's principle of least action, the equations of motion for the open kinematic chain with ballistic motions are given by

M(q) q̈ + C(q, q̇) q̇ + G(q) = Q_gen,     (9)

where M(q), C(q, q̇), and G(q) denote the inertial matrix, the Coriolis matrix, and the gravity vector, respectively. The generalized forces Q_gen, which reflect the role of aerodynamic forces as well as the action of the several morphing motors in B2, are described in (40).
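The equations of motion above can be integrated numerically by solving the inertial system for the accelerations at each time step. A minimal sketch, assuming generic callables for the inertial matrix, Coriolis matrix, and gravity vector; the 1-DOF free-fall check at the end is a toy stand-in, not B2's model:

```python
import numpy as np

def forward_dynamics_step(q, qdot, M, C, G, Q, dt):
    """One semi-implicit Euler step of M(q) qddot + C(q, qdot) qdot + G(q) = Q.

    M, C, G are callables returning the inertia matrix, Coriolis matrix, and
    gravity vector at the current state; Q is the generalized force vector.
    """
    qddot = np.linalg.solve(M(q), Q - C(q, qdot) @ qdot - G(q))
    qdot_next = qdot + dt * qddot
    q_next = q + dt * qdot_next
    return q_next, qdot_next

# Toy check: a 1-DOF point mass (M = m, C = 0, G = m*g) with zero generalized
# force reproduces ballistic free fall.
m, g = 1.0, 9.81
M = lambda q: np.array([[m]])
C = lambda q, qd: np.zeros((1, 1))
G = lambda q: np.array([m * g])
q, qd = np.array([0.0]), np.array([0.0])
for _ in range(1000):
    q, qd = forward_dynamics_step(q, qd, M, C, G, np.zeros(1), 1e-3)
# After 1 s of free fall: q ≈ -g/2 ≈ -4.91, qd ≈ -g = -9.81
```

For B2, Q would additionally carry the strip-theory aerodynamic loads and the morphing-motor torques described in (40).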

Virtual constraints and offline actuator trajectory design
For wing articulations, we use a framework based on defining a set of parameterized and time-varying holonomic constraints (37, 38). This method permits shaping of the overall system dynamics through such constraints. These holonomic constraints control the posture of the articulated flight mechanism by driving the actuated portion of the system and take place through the action of the servo actuators that are embedded in the robot.
We partitioned the configuration variable vector q into the actuated coordinates q_act and the remaining coordinates q_body, which include the Euler angles and the body CoM positions. The dynamics (Eq. 9) are rewritten as

M11 q̈_body + M12 q̈_act + C11 q̇_body + C12 q̇_act + G1 = Q1
M21 q̈_body + M22 q̈_act + C21 q̇_body + C22 q̇_act + G2 = Q2     (10)

In the equations above, M11, M12, M21, and M22; C11, C12, C21, and C22; and G1 and G2 are the block components of the inertial matrix, the Coriolis matrix, and the gravity vector, and Q1 and Q2 are the two components of the generalized forces (40). The nonlinear system in Eq. 10 shows that the actuated and unactuated dynamics are coupled by the inertial, Coriolis, gravity, and aerodynamic terms.
The actuated dynamics represent the servo actuators in the robot. The action of these actuators is described by introducing parameterized and time-varying holonomic constraints into the dynamic system. To shape the actuated coordinates, we defined a constraint manifold, and we used numerically stable approaches to enforce the evolution of the trajectories on this manifold. Thereafter, a finite-state nonlinear optimizer shapes these constraints.
The servo actuators move the links to the desired positions. This is similar to the behavior of a holonomically constrained mechanical system and, mathematically speaking, is equivalent to the time evolution of the system dynamics given by Eq. 10 over the manifold defined by the constraint equation

N(t, β, q_act) = q_act − r_des(t, β) = 0,     (11)

where r_des is the vector of the desired trajectories for the actuated coordinates q_act (Eq. 12), t denotes time, and β = {ω, φ_i, a_i, b_i} parameterizes the periodic actuator trajectories that define the wing motion. These parameters are the control input to the system. Imposing the constraint equations on the system dynamics (Eq. 10) at only the acceleration level will lead to numeric problems owing to the difficulties of obtaining accurate initial position and velocity values (44). In addition, numeric discretization errors will be present during the process of integration, and the constraints will not be satisfied. Therefore, the constraints at the position and velocity levels are also considered (45):

N̈ + κ1 Ṅ + κ2 N = 0,     (13)

where κ1,2 are two constant matrices and

Ṅ = q̇_act − ṙ_des(t, β),     (14)

N̈ = q̈_act − r̈_des(t, β).     (15)

Substituting Eqs. 14 and 15 into Eq. 13 gives q̈_act = r̈_des − κ1 Ṅ − κ2 N. Now, interlocking Eq. 10 with this stabilized constraint forms a system of ordinary differential equations on a parameterized manifold (Eq. 16).
Now, the numeric integration of the above differential-algebraic equation (DAE) is possible, and consequently, it is possible to design predefined periodic trajectories for the actuators. In (40), we have used finite-state optimization and shooting methods to design periodic solutions for the DAE.
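The role of the position- and velocity-level stabilization can be seen in a small numerical experiment. The sketch below is hypothetical, not B2's controller: a single actuated coordinate tracks a periodic reference of the assumed form r_des(t) = a·cos(ωt + φ) + b, and scalar gains κ1, κ2 drive the constraint N = q_act − r_des to zero even from an initial condition off the constraint manifold, mirroring the stabilized form of Eq. 13.

```python
import numpy as np

def rdes(t, omega, phi, a, b):
    """Periodic actuator reference parameterized by beta = {omega, phi, a, b}."""
    return a * np.cos(omega * t + phi) + b

def rdes_dot(t, omega, phi, a, b):
    return -a * omega * np.sin(omega * t + phi)

def rdes_ddot(t, omega, phi, a, b):
    return -a * omega**2 * np.cos(omega * t + phi)

# Stabilizing gains: enforce Ndd + k1*Nd + k2*N = 0, so the constraint
# N = q_act - r_des decays even from inconsistent initial conditions.
k1, k2 = 20.0, 100.0
beta = (2 * np.pi * 10.0, 0.0, 0.3, 0.0)    # omega, phi, a, b (illustrative)

dt = 1e-4
q, qd = 0.5, 0.0                            # deliberately off the constraint
for i in range(10000):                      # integrate 1 s of motion
    t = i * dt
    N = q - rdes(t, *beta)
    Nd = qd - rdes_dot(t, *beta)
    # Constraint-consistent acceleration for the actuated coordinate:
    qdd = rdes_ddot(t, *beta) - k1 * Nd - k2 * N
    qd += dt * qdd
    q += dt * qd
# After 1 s the constraint violation |q - r_des| has decayed to near zero.
```

With κ1 = κ2 = 0 the constraint would be imposed only at the acceleration level: the initial offset would persist and discretization errors would accumulate, which is exactly the numeric problem the position- and velocity-level terms prevent.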
To verify the accuracy of the proposed nonlinear dynamic model in predicting the behavior of the vehicle, we compared the trajectories from eight different flight experiments with the model-predicted trajectories. In fig. S1, the time evolution of the pitch angle q_y and the pitch rate q̇_y is shown.
Supplementary Text
Fig. S1. Nonlinear model verification.
Fig. S2. Flight speed measurements.
Fig. S3. Motion capture system.
Fig. S4. Wind tunnel measurements.
Table S1. B2’s morphological details.
Movie S1. Membrane.
Movie S4. Swoop maneuver.
References (46–59)
  1. K. Y. Ma, P. Chirarattananon, S. B. Fuller, R. J. Wood, Controlled flight of a biologically inspired, insect-scale robot. Science 340, 603–607 (2013).
  2. A. Paranjape, S.-J. Chung, J. Kim, Novel dihedral-based control of flapping-wing aircraft with application to perching. IEEE Trans. Robot. 29, 1071–1084 (2013).
  3. J. W. Gerdes, S. K. Gupta, S. A. Wilkerson, A review of bird-inspired flapping wing miniature air vehicle designs. J. Mech. Robot. 4, 021003 (2012).
  4. J. W. Bahlman, S. M. Swartz, K. S. Breuer, Design and characterization of a multi-articulated robotic bat wing. Bioinspir. Biomim. 8, 016009 (2013).
  5. S.-J. Chung, M. Dorothy, Neurobiologically inspired control of engineered flapping flight. J. Guid. Control Dyn. 33, 440–453 (2010).
  6. D. K. Riskin, D. J. Willis, J. Iriarte-Díaz, T. L. Hedrick, M. Kostandov, J. Chen, D. H. Laidlaw, K. S. Breuer, S. M. Swartz, Quantifying the complexity of bat wing kinematics. J. Theor. Biol. 254, 604–615 (2008).
  7. S. M. Swartz, J. Iriarte-Diaz, D. K. Riskin, A. Song, X. Tian, D. J. Willis, K. S. Breuer, Wing structure and the aerodynamic basis of flight in bats, paper presented at the 45th AIAA Aerospace Sciences Meeting and Exhibit, 8 to 11 January 2007, Reno, NV (2007).
  8. A. Azuma, The Biokinetics of Flying and Swimming (Springer Science & Business Media, 2012).
  9. X. Tian, J. Iriarte-Diaz, K. Middleton, R. Galvao, E. Israeli, A. Roemer, A. Sullivan, A. Song, S. Swartz, K. Breuer, Direct measurements of the kinematics and dynamics of bat flight. Bioinspir. Biomim. 1, S10–S18 (2006).
  10. X. Deng, L. Schenato, W. C. Wu, S. S. Sastry, Flapping flight for biomimetic robotic insects: Part I: System modeling. IEEE Trans. Robot. 22, 776–788 (2006).
  11. X. Deng, L. Schenato, S. S. Sastry, Flapping flight for biomimetic robotic insects: Part II: Flight control design. IEEE Trans. Robot. 22, 789–803 (2006).
  12. R. J. Wood, S. Avadhanula, E. Steltz, M. Seeman, J. Entwistle, A. Bachrach, G. Barrows, S. Sanders, R. S. Fearing, Enabling technologies and subsystem integration for an autonomous palm-sized glider. IEEE Robot. Autom. Mag. 14, 82–91 (2007).
  13. R. J. Wood, The first takeoff of a biologically inspired at-scale robotic insect. IEEE Trans. Robot. 24, 341–347 (2008).
  14. D. B. Doman, C. Tang, S. Regisford, Modeling interactions between flexible flapping-wing spars, mechanisms, and drive motors. J. Guid. Control Dyn. 34, 1457–1473 (2011).
  15. I. Faruque, J. Sean Humbert, Dipteran insect flight dynamics. Part 1: Longitudinal motion about hover. J. Theor. Biol. 264, 538–552 (2010).
  16. J. Dietsch, Air and sea robots add new perspectives to the global knowledge base. IEEE Robot. Autom. Mag. 18, 8–9 (2011).
  17. S. A. Combes, T. L. Daniel, Shape, flapping and flexion: Wing and fin design for forward flight. J. Exp. Biol. 204, 2073 (2001).
  18. S. A. Combes, T. L. Daniel, Into thin air: Contributions of aerodynamic and inertial-elastic forces to wing bending in the hawkmoth Manduca sexta. J. Exp. Biol. 206, 2999–3006 (2003).
  19. S. Lupashin, A. Schöllig, M. Sherback, R. D'Andrea, A simple learning strategy for high-speed quadrocopter multi-flips, in Proceedings of the IEEE International Conference on Robotics and Automation (ICRA) (IEEE, 2010), pp. 1642–1648.
  20. N. Michael, D. Mellinger, Q. Lindsey, V. Kumar, The GRASP multiple micro-UAV testbed. IEEE Robot. Autom. Mag. 17, 56–65 (2010).
  21. H. D. Aldridge, Kinematics and aerodynamics of the greater horseshoe bat, Rhinolophus ferrumequinum, in horizontal flight at various flight speeds. J. Exp. Biol. 126, 479–497 (1986).
  22. H. D. Aldridge, Body accelerations during the wingbeat in six bat species: The function of the upstroke in thrust generation. J. Exp. Biol. 130, 275–293 (1987).
  23. U. M. Norberg, Some advanced flight manoeuvres of bats. J. Exp. Biol. 64, 489–495 (1976).
  24. C. Chevallereau, G. Abba, Y. Aoustin, F. Plestan, E. R. Westervelt, C. Canudas-de-Wit, J. W. Grizzle, RABBIT: A testbed for advanced control theory. IEEE Control Syst. Mag. 23, 57–79 (2003).
  25. Y. P. Ivanenko, A. d'Avella, R. E. Poppele, F. Lacquaniti, On the origin of planar covariation of elevation angles during human locomotion. J. Neurophysiol. 99, 1890–1898 (2008).
  26. T. Chau, A review of analytical techniques for gait data. Part 1: Fuzzy, statistical and fractal methods. Gait Posture 13, 49–66 (2001).
  27. G. Cappellini, Y. P. Ivanenko, R. E. Poppele, F. Lacquaniti, Motor patterns in human walking and running. J. Neurophysiol. 95, 3426–3437 (2006).
  28. M. Rosén, G. Spedding, A. Hedenström, The relationship between wingbeat kinematics and vortex wake of a thrush nightingale. J. Exp. Biol. 207, 4255–4268 (2004).
  29. N. A. Bernstein, The Coordination and Regulation of Movements (Pergamon Press, 1967).
  30. J. Hoff, A. Ramezani, S.-J. Chung, S. Hutchinson, Synergistic design of a bio-inspired micro aerial vehicle with articulated wings. Proc. Rob. Sci. Syst., 10.15607/RSS.2016.XII.009 (2016).
  31. A. L. Thomas, G. K. Taylor, Animal flight dynamics I. Stability in gliding flight. J. Theor. Biol. 212, 399–424 (2001).
  32. W. Maybury, J. Rayner, L. B. Couldrick, Lift generation by the avian tail. Proc. Biol. Sci. 268, 1443–1448 (2001).
  33. J. A. Cheney, D. Ton, N. Konow, D. K. Riskin, K. S. Breuer, S. M. Swartz, Hindlimb motion during steady flight of the lesser dog-faced fruit bat, Cynopterus brachyotis. PLOS ONE 9, e98093 (2014).
  34. A. Ramezani, X. Shi, S.-J. Chung, S. Hutchinson, Bat Bot (B2), a biologically inspired flying machine, in Proceedings of the IEEE International Conference on Robotics and Automation (ICRA) (IEEE, 2016), pp. 3219–3226.
  35. H. Tanaka, H. Okada, Y. Shimasue, H. Liu, Flexible flapping wings with self-organized microwrinkles. Bioinspir. Biomim. 10, 046005 (2015).
  36. A. Ramezani, X. Shi, S.-J. Chung, S. Hutchinson, Lagrangian modeling and flight control of articulated-winged bat robot, in Proceedings of the 2015 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) (IEEE, 2015), pp. 2867–2874.
  37. C. I. Byrnes, A. Isidori, A frequency domain philosophy for nonlinear systems, in Proceedings of the IEEE Conference on Decision and Control (IEEE, 1984), pp. 1569–1573.
  38. A. Isidori, C. Moog, On the nonlinear equivalent of the notion of transmission zeros, in Modelling and Adaptive Control, C. I. Byrnes, A. Kurzhanski, Eds. (Springer, 1988), pp. 146–158.
  39. M. Wolf, L. C. Johansson, R. von Busse, Y. Winter, A. Hedenström, Kinematics of flight and the relationship to the vortex wake of a Pallas' long-tongued bat (Glossophaga soricina). J. Exp. Biol. 213, 2142–2153 (2010).
  40. Materials and methods are available as supplementary materials at the Science website.
  41. D. K. Riskin, J. Iriarte-Díaz, K. M. Middleton, K. S. Breuer, S. M. Swartz, The effect of body size on the wing movements of pteropodid bats, with insights into thrust and lift production. J. Exp. Biol. 213, 4110–4122 (2010).
  42. J. D. Gardiner, G. Dimitriadis, J. R. Codd, R. L. Nudds, A potential role for bat tail membranes in flight control. PLOS ONE 6, e18214 (2011).
  43. J. D. DeLaurier, An aerodynamic model for flapping-wing flight. Aeronaut. J. 97, 125–130 (1993).
  44. U. M. Ascher, H. Chin, L. R. Petzold, S. Reich, Stabilization of constrained mechanical systems with DAEs and invariant manifolds. J. Struct. Mech. 23, 135–157 (1995).
  45. C. Führer, B. J. Leimkuhler, Numerical solution of differential-algebraic equations for constrained mechanical motion. Numer. Math. 59, 55–69 (1991).
  46. A. H. Nayfeh, Perturbation methods in nonlinear dynamics, in Nonlinear Dynamics Aspects of Accelerators, Lecture Notes in Physics, J. M. Jowett, M. Month, S. Turner, Eds. (Springer, 1986), pp. 238–314.
  47. M. Goman, A. Khrabrov, State-space representation of aerodynamic characteristics of an aircraft at high angles of attack. J. Aircr. 31, 1109–1115 (1994).
  48. A. A. Paranjape, S.-J. Chung, H. H. Hilton, A. Chakravarthy, Dynamics and performance of tailless micro aerial vehicle with flexible articulated wings. AIAA J. 50, 1177–1188 (2012).
  49. H. K. Khalil, J. Grizzle, Nonlinear Systems (Prentice Hall, 1996).
  50. G. Meurant, An Introduction to Differentiable Manifolds and Riemannian Geometry, vol. 120 (Academic Press, 1986).
  51. R. R. Burridge, A. A. Rizzi, D. E. Koditschek, Sequential composition of dynamically dexterous robot behaviors. Int. J. Rob. Res. 18, 534–555 (1999).
  52. J. B. Dingwell, J. P. Cusumano, Nonlinear time series analysis of normal and pathological human walking. Chaos 10, 848–863 (2000).
  53. M. S. Garcia, "Stability, scaling, and chaos in passive-dynamic gait models," thesis, Cornell University, Ithaca, NY (1999).
  54. J. Guckenheimer, S. Johnson, in International Hybrid Systems Workshop (Springer, 1994), pp. 202–225.
  55. Y. Hurmuzlu, C. Basdogan, J. J. Carollo, Presenting joint kinematics of human locomotion using phase plane portraits and Poincaré maps. J. Biomech. 27, 1495–1499 (1994).
  56. S. G. Nersesov, V. Chellaboina, W. M. Haddad, A generalization of Poincaré's theorem to hybrid and impulsive dynamical systems, in Proceedings of the American Control Conference (IEEE, 2002), pp. 1240–1245.
  57. T. S. Parker, L. Chua, Practical Numerical Algorithms for Chaotic Systems (Springer Science & Business Media, 2012).
  58. B. Thuilot, A. Goswami, B. Espiau, Bifurcation and chaos in a simple passive bipedal gait, in Proceedings of the IEEE International Conference on Robotics and Automation (IEEE, 1997), pp. 792–798.
  59. E. R. Westervelt, J. W. Grizzle, C. Chevallereau, J. H. Choi, B. Morris, Feedback Control of Dynamic Bipedal Robot Locomotion (CRC Press, 2007).


We thank the team of graduate and undergraduate students from the aerospace, electrical, computer, and mechanical engineering departments for their contribution in constructing the prototype of B2 at the University of Illinois at Urbana-Champaign. In particular, we are indebted to Ph.D. students X. Shi (for hardware developments), J. Hoff (for wing kinematic analysis), and S. U. Ahmed (for helping with flight experiments). We extend our appreciation to our collaborators S. Swartz, K. S. Breuer, and H. Vejdani at Brown University for helping us to better understand the key mechanisms of bat flight.

Funding: 
This work was supported by NSF (grant 1427111).

Author contributions: 
A.R., S.-J.C., and S.H. designed B2. A.R., S.-J.C., and S.H. designed control experiments, analyzed, and interpreted the data. A.R. constructed B2 and designed its controller with critical feedback from S.-J.C., and S.H. A.R. performed flight experiments. All authors prepared the manuscript.

Competing interests: 
The authors declare that they have no competing interests.

Data and materials availability: 
Please contact S.-J.C. for data and other materials.

Copyright © 2017, American Association for the Advancement of Science

Top 10 Hot Artificial Intelligence (AI) Technologies

By Hugo Angel,

The market for artificial intelligence (AI) technologies is flourishing. Beyond the hype and the heightened media attention, the numerous startups and the internet giants racing to acquire them, there is a significant increase in investment and adoption by enterprises. A Narrative Science survey found last year that 38% of enterprises are already using AI, growing to 62% by 2018. Forrester Research predicted a greater than 300% increase in investment in artificial intelligence in 2017 compared with 2016. IDC estimated that the AI market will grow from $8 billion in 2016 to more than $47 billion in 2020.

Coined in 1955 to describe a new computer science sub-discipline, “Artificial Intelligence” today includes a variety of technologies and tools, some time-tested, others relatively new. To help make sense of what’s hot and what’s not, Forrester just published a TechRadar report on Artificial Intelligence (for application development professionals), a detailed analysis of 13 technologies enterprises should consider adopting to support human decision-making.

Based on Forrester’s analysis, here’s my list of the 10 hottest AI technologies:

  1. Natural Language Generation: Producing text from computer data. Currently used in customer service, report generation, and summarizing business intelligence insights. Sample vendors:
    • Attivio,
    • Automated Insights,
    • Cambridge Semantics,
    • Digital Reasoning,
    • Lucidworks,
    • Narrative Science,
    • SAS,
    • Yseop.
  2. Speech Recognition: Transcribing and transforming human speech into a format useful for computer applications. Currently used in interactive voice response systems and mobile applications. Sample vendors:
    • NICE,
    • Nuance Communications,
    • OpenText,
    • Verint Systems.
  3. Virtual Agents: “The current darling of the media,” says Forrester (I believe they refer to my evolving relationships with Alexa), from simple chatbots to advanced systems that can network with humans. Currently used in customer service and support and as a smart home manager. Sample vendors:
    • Amazon,
    • Apple,
    • Artificial Solutions,
    • Assist AI,
    • Creative Virtual,
    • Google,
    • IBM,
    • IPsoft,
    • Microsoft,
    • Satisfi.
  4. Machine Learning Platforms: Providing algorithms, APIs, development and training toolkits, data, as well as computing power to design, train, and deploy models into applications, processes, and other machines. Currently used in a wide range of enterprise applications, mostly involving prediction or classification. Sample vendors:
    • Amazon,
    • Fractal Analytics,
    • Google,
    • Microsoft,
    • SAS,
    • Skytree.
  5. AI-optimized Hardware: Graphics processing units (GPU) and appliances specifically designed and architected to efficiently run AI-oriented computational jobs. Currently primarily making a difference in deep learning applications. Sample vendors:
    • Alluviate,
    • Cray,
    • Google,
    • IBM,
    • Intel,
    • Nvidia.
  6. Decision Management: Engines that insert rules and logic into AI systems and are used for initial setup/training and ongoing maintenance and tuning. A mature technology, it is used in a wide variety of enterprise applications, assisting in or performing automated decision-making. Sample vendors:
    • Advanced Systems Concepts,
    • Informatica,
    • Maana,
    • Pegasystems,
    • UiPath.
  7. Deep Learning Platforms: A special type of machine learning consisting of artificial neural networks with multiple abstraction layers. Currently primarily used in pattern recognition and classification applications supported by very large data sets. Sample vendors:
    • Deep Instinct,
    • Ersatz Labs,
    • Fluid AI,
    • MathWorks,
    • Peltarion,
    • Saffron Technology,
    • Sentient Technologies.
  8. Biometrics: Enabling more natural interactions between humans and machines, including but not limited to image and touch recognition, speech, and body language. Currently used primarily in market research. Sample vendors:
    • 3VR,
    • Affectiva,
    • Agnitio,
    • FaceFirst,
    • Sensory,
    • Synqera,
    • Tahzoo.
  9. Robotic Process Automation: Using scripts and other methods to automate human action to support efficient business processes. Currently used where it’s too expensive or inefficient for humans to execute a task or a process. Sample vendors:
    • Advanced Systems Concepts,
    • Automation Anywhere,
    • Blue Prism,
    • UiPath,
    • WorkFusion.
  10. Text Analytics and NLP: Natural language processing (NLP) uses and supports text analytics by facilitating the understanding of sentence structure and meaning, sentiment, and intent through statistical and machine learning methods. Currently used in fraud detection and security, a wide range of automated assistants, and applications for mining unstructured data. Sample vendors:
    • Basis Technology,
    • Coveo,
    • Expert System,
    • Indico,
    • Knime,
    • Lexalytics,
    • Linguamatics,
    • Mindbreeze,
    • Sinequa,
    • Stratifyd,
    • Synapsify.

There are certainly many business benefits gained from AI technologies today, but according to a survey Forrester conducted last year, there are also obstacles to AI adoption as expressed by companies with no plans of investing in AI:

  • There is no defined business case: 42%
  • Not clear what AI can be used for: 39%
  • Don't have the required skills: 33%
  • Need first to invest in modernizing the data management platform: 29%
  • Don't have the budget: 23%
  • Not certain what is needed for implementing an AI system: 19%
  • AI systems are not proven: 14%
  • Do not have the right processes or governance: 13%
  • AI is a lot of hype with little substance: 11%
  • Don't own or have access to the required data: 8%
  • Not sure what AI means: 3%
Once enterprises overcome these obstacles, Forrester concludes, they stand to gain from AI driving accelerated transformation in customer-facing applications and developing an interconnected web of enterprise intelligence.

Follow me on Twitter @GilPress or Facebook or Google+

Artificial Intelligence in the UK: Landscape and learnings from 226 startups

By Hugo Angel,

Figure 1 (above): The landscape of early stage UK AI companies.
With every paradigm shift in technology, waves of innovation follow as companies improve and then reimagine processes. Today we are in the early stages of the global artificial intelligence (AI) revolution. Machine learning algorithms, whose results improve with experience, enable us to find patterns in large data sets and make predictions more effectively — about people, equipment, systems and processes. (For an accessible introduction to AI, read our Primer.) But what are the dynamics of AI entrepreneurship in the UK?
We’ve mapped 226 independent, early stage AI software companies based in the UK and met with over 40 of these companies in recent weeks. Below, we share six powerful dynamics we see that are shaping the UK AI market— from changing activity levels and areas of focus, to trends in monetisation and the size and staging of investment.
The UK AI landscape: 226 companies and counting
Over time, we expect the distinction between ‘AI’ companies and other software providers to blur and then disappear, as machine learning is employed to tackle a wide variety of business processes and sectors. Today, however, it is possible to point to a sub-set of early stage software companies defined by their focus on AI.
We’ve researched 226 early stage AI companies in the UK and met with 40 of them. We’ve developed a map (Figure 1, above) to place the 226 according to their:
  • Purpose: Is the company focused on improving a business function (for example, marketing or human resources) or a sector (healthcare, education, agriculture)? Or does the company develop an AI technology with cross-domain application?
  • Customer Type: Does the company predominantly sell to other businesses (‘B2B’) or to consumers (‘B2C’)?
  • Funding: How much funding has the company received to date? We bracket this from ‘angel’ investment (under $500,000) through to ‘growth’ capital ($8m to ~$100m).
We’ll update our map regularly. We apologise if we’ve omitted or mis-classified your company; we’re aware that many early stage companies may be using, but not presenting, extensive AI. Please get in touch with additions or corrections.
After analysing the market and meeting with 40 companies in recent weeks, we highlight the following six dynamics in the market:
1. A focus on AI for business functions
Most early stage UK AI companies — five in every six — are applying machine learning to challenges in specific business functions or sectors (Figure 2, below). Reflecting the nascent stage of the field, however, one in six is developing an AI technology — a capability, platform or set of algorithms — applicable across multiple domains. These companies’ activities range from the development of computer vision solutions to the creation of algorithms for autonomous decision-making.
To whom are AI companies selling? Nine out of 10 AI companies are predominantly ‘B2B’, developing and selling solutions to other businesses (Figure 3, below). Just one in 10 sells directly to consumers (‘B2C’).
A ‘cold start’ challenge around data is inhibiting the number of new B2C AI companies. Training machine learning algorithms usually requires large volumes of data. While B2B companies can analyse the varied and extensive data sets of the businesses they serve, in the absence of public or permissioned (e.g. Facebook profile) data, customer-facing companies usually begin without large volumes of consumer data to analyse. Typically, therefore, they deploy machine learning over time as their user bases and data sets grow. Gousto, for example, is an MMC portfolio company that delivers recipes and associated ingredients to consumers for them to cook at home. Today, Gousto’s team of machine learning PhDs, data analysts and engineers leverage AI for warehouse automation and menu design. Since its inception Gousto has had a vision for the use of AI, but the Company has achieved its vision over time.
Given the ‘cold start’ challenge, the reality is that most consumers will first experience machine learning via the world’s most popular consumer applications — Facebook, Google, Amazon, Netflix, Pinterest and others — that leverage vast data sets and machine learning teams to deliver facial recognition, search and entertainment recommendations, translation capabilities and more.
2. AI entrepreneurship is unevenly spread
A heat map highlights areas of early stage activity, as measured by the number of companies in each segment (Figure 4, below).
Figure 4: A heat map of early stage AI companies in the UK
Activity is greatest within:
  • The Marketing & Advertising,
  • Information Technology, and
  • Business Intelligence & Analytics functions; and
  • The Finance sector.

Activity is extensive within:

  • The Human Resources function; and
  • The Infrastructure,
  • Healthcare and
  • Retail sectors.
The sectors above are well suited to the application of AI, explaining the concentration of AI activity within them. The opportunity for value creation within each segment is demonstrable as well as significant. In marketing and in finance, for example, improvements in campaign conversion and financial performance against a benchmark are readily quantifiable. All offer numerous prediction and optimisation challenges well suited to the application of machine learning. All offer large data sets for training and deployment. A path to better-than-human performance is technically achievable. The alternatives are impractical or expensive. And all are specialised verticals, distant from the competitive threat posed by the consumer-focused, horizontal AI platform providers Google, Amazon, Microsoft and IBM — with the exception of healthcare, where Google and IBM both enable and challenge.

As attractive market fundamentals catalyse activity, the strongest AI companies can develop a competitive moat by:

  • bringing deep domain expertise to bear in a complex domain;
  • developing proprietary algorithms;
  • creating a network effect around data by leveraging non-public data sets; and
  • securing adequate capital to build a high-quality machine learning team and go-to-market resources.

Activity in Marketing & Advertising dominates; one in five early stage UK AI companies targets this function. The fundamentals of modern marketing and advertising represent a sweet spot for AI. Consumers have billions of touch points with websites and apps, providing a rich seam of available, but complex, data. Further, almost every stage of the marketing and advertising value chain is ripe for optimisation and automation — including

  • content processing,
  • consumer segmentation,
  • consumer targeting,
  • programmatic advertising optimisation,
  • purchase discovery for consumers and
  • analysis of consumer sentiment.
Areas of lower activity

In a number of areas, activity appears modest relative to market opportunities. In the Manufacturing sector, for example, there are few startups to address a substantial need.

  • Machine learning has the potential to unlock 20% more production capacity through predictive, optimised maintenance of machines.
  • Raw material costs and re-working can be reduced through improved analysis of product quality data.
  • Further, ‘buffering’ — storing raw materials and part-developed products to compensate for unforeseen inefficiencies during production — can be reduced by up to 30% given more predictable production capacity.
  • The proliferation of sensors in the manufacturing industry, including sensor data from the production line, machine tool parameters and environmental data, has also increased significantly the data available for machine learning.

Within the Compliance & Fraud function, there appear few startups capitalising on banks’ ballooning expenditure on compliance.

  • 30,000 people at Citi — 12% of the bank’s workforce — now work in compliance. In its 1Q15 conference call, Citi highlighted that over 50% of the $3.4B it saved through efficiency initiatives was being consumed by additional investments in regulation and compliance. The dynamics are similar among Citi’s peers.
  • JP Morgan increased compliance spend 50% between 2011 and 2015, to $9B, while
  • Goldman Sachs highlighted that its 11% increase in headcount in the last four years has largely been to meet regulatory compliance needs.

Our discussions with banks highlight particular focus on ‘Know Your Customer’ (KYC) and Anti-Money Laundering (AML) initiatives. Beyond presenting an extensive need, the sector offers large data sets for training, expensive human alternatives and, in some areas at least, an evident ability for machine learning to deliver better-than-human performance given the pragmatic impossibility of humans monitoring the data deluge. There may be few UK compliance companies given in-house efforts by the banks, concern regarding potential client concentration, or competition from US startups — but opportunities appear considerable.

3. AI entrepreneurship has doubled
The number of AI companies founded annually in the UK (Figure 5, below) has doubled in recent years (2014–2016) compared with the prior period (2011–2013). Over 60% of all UK AI companies were founded in the last 36 months. During this period, a new AI company has been founded in the UK on almost a weekly basis.
Entrepreneurship in AI is being fuelled by the broader coming of age of AI as well as factors specific to early stage entrepreneurship.

Regarding AI activity generally,

  • seeds planted during the last 20 years of AI research are bearing fruit today.
  • New algorithms, particularly convolutional and recurrent neural networks, are delivering more effective results.
  • An exponential increase in the availability of training data has made it possible to tune machine learning algorithms to deliver accurate predictions.
  • Development of graphics processing units (GPUs) has decreased the time required to train a neural network by 5x-10x. And
  • a six-fold increase in public awareness of AI during the last five years has increased buyers’ interest in the technology.

Additional factors are fuelling an increase in new AI startups.

  • Venture capital funding of AI companies has increased seven-fold in five years as investors see promise in the sector.
  • The provision of AI infrastructure and services from industry cloud providers (Google, Amazon, Microsoft and IBM) is reducing the difficulty and cost of deploying machine learning solutions. And
  • the growth of open source AI software — particularly TensorFlow, a library of components for machine learning — has reduced barriers to involvement.

Subject to continued venture capital funding, we expect high levels of AI entrepreneurship in the UK to continue.

Where are new AI companies focusing? 
The HR business function and Finance sector have the highest proportion of new AI companies (Figure 6, below). Two thirds of AI HR and Finance companies are less than two years old.
Recent activity in HR stems from a paradigm shift taking place within the function. HR is evolving from an administrative system of record to a predictive driver of growth and efficiency. Business owners are seeking to leverage previously under-utilised data sets to drive utility — ranging from competency-based recruitment to predictive modelling of employee churn.
It is unsurprising that in the business intelligence, security and compliance functions, and in the retail and infrastructure sectors, the proportion of new AI companies is lower. With large data sets ripe for machine learning, these sectors were the first to attract AI entrepreneurs.
4. A nascent sector relative to global peers
The UK AI sector is at a nascent stage in its development relative to global peers, presenting both opportunities and challenges.
Today, three quarters of UK AI companies are at the earliest stages of their journey, with ‘seed’ or ‘angel’ funding, compared with half of US peers (Figure 7, below). At the other end of the spectrum, just one in 10 UK AI companies is in the late, ‘growth capital’ stage compared with one in five in the US. In 2015, the last full year for which data are available, almost all capital infusions into UK AI companies were at the angel, seed or Series A stage — while among global AI peers a third received later-stage funding (Figure 8, below).

This dynamic presents both opportunities and risks. A vibrant startup scene presents unrivalled opportunities for entrepreneurs, employees and investors in early stage companies. At the same time, more developed and richly funded overseas competitors may increase competitive pressures on UK companies. This effect may be exacerbated by the high proportion of AI companies that sell to enterprises, many of which source providers globally. The UK maintains valuable assets for AI research, including a quarter of the world’s top 25 universities and a growing ecosystem of AI executives and investors following the acquisitions of

5. The journey to monetisation can be longer

Over 40% of the AI companies we meet are yet to generate revenue (Figure 9, below). This is not an artefact of us meeting ‘early stage’ companies; the median profile of a company we meet is one 

  • founded 2–3 years ago,
  • having raised £1.3m,
  • with a team of nine,
  • spending £76,000 per month.
The idea that most AI companies — applied AI companies, at least — plan to be acquired pre-revenue instead of selling software and services is a myth. All the companies we met were implementing or developing monetisation plans. Why, then, are some AI companies taking longer to monetise or scale than is usual for early stage companies? We see four reasons:
  • The bar to a minimum viable product (MVP) in this technically challenging field can be higher, requiring longer development periods. 90% of AI companies are B2B. The long sales cycles typical in B2B sales are exacerbated by many AI companies’ focus on sectors, such as finance, with sprawling and sensitive data sets.
  • Deployment periods can be lengthy given
    * extensive per-client data integration,
    * data cleansing and
    * customisation requirements.
    Half the AI companies we meet have a pure software-as-a-service model; as many monetise significant client integration and customisation work in the form of project revenue (Figure 10, above).
  • The limited number of personnel available for implementation in early stage companies is inhibiting many AI companies’ growth. In a sentiment echoed by several companies, one told us “we couldn’t implement more sales even if we had them.” In many companies, a third of the team is engaged in deployment support.
  • A longer path to monetisation can pose a challenge to AI companies, particularly given cash burn rates inflated by the high cost of machine learning talent.

We recommend that AI companies raise sufficient capital to last them through this period of risk, to go to market and beyond.

6. Investments are larger and staging is atypical

Globally at least, investments into AI firms are typically 20% to 60% larger than average capital infusions (Figure 11, below, shows 2015 data). This reflects company fundamentals and dynamics in the supply and demand of capital. AI companies’ capital requirements can be higher given

  • longer development periods prior to product viability,
  • the high cost of machine learning talent and
  • the larger teams required for complex deployments.

Beyond these fundamentals, however, capital infusions are being inflated by

  • extensive supply (many venture capitalists seek opportunities to invest in artificial intelligence companies) and
  • limited demand (there are relatively few AI companies in which to invest).

Venture capital investment in early stage AI companies has increased seven-fold in five years, while the number of investable prospects remains limited.

Further, in the UK a sizeable minority of companies jump from a seed round to a much larger raise than is typical for a subsequent round (Figure 12, below). One in three UK AI companies that raised more than $8m in a funding round raised less than $1m previously. As above, this dynamic is driven in part by AI companies’ capital requirements, but as much by the limited number of attractive investment opportunities in AI. Companies’ valuation expectations, meanwhile, are being supported by ‘acquihire’ offers for nascent teams.
Conclusion: An inflection point in UK AI
The last 36 months have marked an inflection point in early stage UK AI. Entrepreneurship has doubled, as AI technology comes of age and investment has increased. Yet, companies are early in their development relative to global peers, offering entrepreneurs and employees unprecedented opportunity and challenge. Three quarters of UK AI companies are at the earliest stages of their journey and activity remains uneven. Startups have concentrated on readily addressable business functions, where data sets are plentiful and optimisation challenges are pronounced. Today, business processes are being optimised. In the future, they will be re-imagined. Within the last 24 months, additional functions and sectors are starting to be tackled by AI entrepreneurs. The path to monetisation for today’s AI companies can be longer, but effective entrepreneurs are taking advantage of attractive capital dynamics to raise sufficient sums of money earlier in their journey.
As the AI revolution continues, the distinction between ‘AI companies’ and other software providers will further blur. Today, however, we are pleased to highlight the dynamics of a group of companies delivering remarkable benefits. Together, they are shaping the ‘fourth industrial revolution’.
By David Kelnar. Investment Director & Head of Research at MMC Ventures. 2x CEO/CFO. Love tech, venture capital, trends and triathlon.
Dec 21, 2016

The Great A.I. Awakening

By Hugo Angel,

How Google used artificial intelligence to transform Google Translate, one of its more popular services — and how machine learning is poised to reinvent computing itself.
Credit: Illustration by Pablo Delcan

Prologue: You Are What You Have Read

Late one Friday night in early November, Jun Rekimoto, a distinguished professor of human-computer interaction at the University of Tokyo, was online preparing for a lecture when he began to notice some peculiar posts rolling in on social media. Apparently Google Translate, the company’s popular machine-translation service, had suddenly and almost immeasurably improved. Rekimoto visited Translate himself and began to experiment with it. He was astonished. He had to go to sleep, but Translate refused to relax its grip on his imagination.

Rekimoto wrote up his initial findings in a blog post.
First, he compared a few sentences from two published versions of “The Great Gatsby,” Takashi Nozaki’s 1957 translation and Haruki Murakami’s more recent iteration, with what this new Google Translate was able to produce. Murakami’s translation is written “in very polished Japanese,” Rekimoto explained to me later via email, but the prose is distinctively “Murakami-style.” By contrast, Google’s translation — despite some “small unnaturalness” — reads to him as “more transparent.”
The second half of Rekimoto’s post examined the service in the other direction, from Japanese to English. He dashed off his own Japanese interpretation of the opening to Hemingway’s “The Snows of Kilimanjaro,” then ran that passage back through Google into English. He published this version alongside Hemingway’s original, and proceeded to invite his readers to guess which was the work of a machine.
NO. 1:
Kilimanjaro is a snow-covered mountain 19,710 feet high, and is said to be the highest mountain in Africa. Its western summit is called the Masai “Ngaje Ngai,” the House of God. Close to the western summit there is the dried and frozen carcass of a leopard. No one has explained what the leopard was seeking at that altitude.
NO. 2:
Kilimanjaro is a mountain of 19,710 feet covered with snow and is said to be the highest mountain in Africa. The summit of the west is called “Ngaje Ngai” in Masai, the house of God. Near the top of the west there is a dry and frozen dead body of leopard. No one has ever explained what leopard wanted at that altitude.
Even to a native English speaker, the missing article on the leopard is the only real giveaway that No. 2 was the output of an automaton. Their closeness was a source of wonder to Rekimoto, who was well acquainted with the capabilities of the previous service. Only 24 hours earlier, Google would have translated the same Japanese passage as follows:
Kilimanjaro is 19,710 feet of the mountain covered with snow, and it is said that the highest mountain in Africa. Top of the west, “Ngaje Ngai” in the Maasai language, has been referred to as the house of God. The top close to the west, there is a dry, frozen carcass of a leopard. Whether the leopard had what the demand at that altitude, there is no that nobody explained.
Rekimoto promoted his discovery to his hundred thousand or so followers on Twitter, and over the next few hours thousands of people broadcast their own experiments with the machine-translation service. Some were successful, others meant mostly for comic effect. As dawn broke over Tokyo, Google Translate was the No. 1 trend on Japanese Twitter, just above some cult anime series and the long-awaited new single from a girl-idol supergroup. Everybody wondered: How had Google Translate become so uncannily artful?
Four days later, a couple of hundred journalists, entrepreneurs and advertisers from all over the world gathered in Google’s London engineering office for a special announcement. Guests were greeted with Translate-branded fortune cookies. Their paper slips had a foreign phrase on one side — mine was in Norwegian — and on the other, an invitation to download the Translate app. Tables were set with trays of doughnuts and smoothies, each labeled with a placard that advertised its flavor in German (zitrone), Portuguese (baunilha) or Spanish (manzana). After a while, everyone was ushered into a plush, dark theater.

Sundar Pichai, chief executive of Google, outside his office in Mountain View, Calif. Credit: Brian Finke for The New York Times
Sadiq Khan, the mayor of London, stood to make a few opening remarks. A friend, he began, had recently told him he reminded him of Google. “Why, because I know all the answers?” the mayor asked. “No,” the friend replied, “because you’re always trying to finish my sentences.” The crowd tittered politely. Khan concluded by introducing Google’s chief executive, Sundar Pichai, who took the stage.
Pichai was in London in part to inaugurate Google’s new building there, the cornerstone of a new “knowledge quarter” under construction at King’s Cross, and in part to unveil the completion of the initial phase of a company transformation he announced last year. The Google of the future, Pichai had said on several occasions, was going to be “A.I. first.” What that meant in theory was complicated and had invited much speculation. What it meant in practice, with any luck, was that soon the company’s products would no longer represent the fruits of traditional computer programming, exactly, but “machine learning.”
A rarefied department within the company, Google Brain, was founded five years ago on this very principle: that artificial “neural networks” that acquaint themselves with the world via trial and error, as toddlers do, might in turn develop something like human flexibility. This notion is not new — a version of it dates to the earliest stages of modern computing, in the 1940s — but for much of its history most computer scientists saw it as vaguely disreputable, even mystical. Since 2011, though, Google Brain has demonstrated that this approach to artificial intelligence could solve many problems that confounded decades of conventional efforts. Speech recognition didn’t work very well until Brain undertook an effort to revamp it; the application of machine learning made its performance on Google’s mobile platform, Android, almost as good as human transcription. The same was true of image recognition. Less than a year ago, Brain for the first time undertook the gut renovation of an entire consumer product, and its momentous results were being celebrated tonight.
Translate made its debut in 2006 and since then has become one of Google’s most reliable and popular assets; it serves more than 500 million monthly users in need of 140 billion words per day in a different language. It exists not only as its own stand-alone app but also as an integrated feature within Gmail, Chrome and many other Google offerings, where we take it as a push-button given — a frictionless, natural part of our digital commerce. It was only with the refugee crisis, Pichai explained from the lectern, that the company came to reckon with Translate’s geopolitical importance: On the screen behind him appeared a graph whose steep curve indicated a recent fivefold increase in translations between Arabic and German. (It was also close to Pichai’s own heart. He grew up in India, a land divided by dozens of languages.) The team had been steadily adding new languages and features, but gains in quality over the last four years had slowed considerably.
Until today. As of the previous weekend, Translate had been converted to an A.I.-based system for much of its traffic, not just in the United States but in Europe and Asia as well: The rollout included translations between English and Spanish, French, Portuguese, German, Chinese, Japanese, Korean and Turkish. The rest of Translate’s hundred-odd languages were to come, with the aim of eight per month, by the end of next year. The new incarnation, to the pleasant surprise of Google’s own engineers, had been completed in only nine months. The A.I. system had demonstrated overnight improvements roughly equal to the total gains the old one had accrued over its entire lifetime.
Pichai has an affection for the obscure literary reference; he told me a month earlier, in his office in Mountain View, Calif., that Translate in part exists because not everyone can be like the physicist Robert Oppenheimer, who learned Sanskrit to read the Bhagavad Gita in the original. In London, the slide on the monitors behind him flicked to a Borges quote: “Uno no es lo que es por lo que escribe, sino por lo que ha leído.”
Grinning, Pichai read aloud an awkward English version of the sentence that had been rendered by the old Translate system: “One is not what is for what he writes, but for what he has read.”
To the right of that was a new A.I.-rendered version: “You are not what you write, but what you have read.”
It was a fitting remark: The new Google Translate was run on the first machines that had, in a sense, ever learned to read anything at all.
Google’s decision to reorganize itself around A.I. was the first major manifestation of what has become an industrywide machine-learning delirium. Over the past four years, six companies in particular — Google, Facebook, Apple, Amazon, Microsoft and the Chinese firm Baidu — have touched off an arms race for A.I. talent, particularly within universities. Corporate promises of resources and freedom have thinned out top academic departments. It has become widely known in Silicon Valley that Mark Zuckerberg, chief executive of Facebook, personally oversees, with phone calls and video-chat blandishments, his company’s overtures to the most desirable graduate students. Starting salaries of seven figures are not unheard-of. Attendance at the field’s most important academic conference has nearly quadrupled. What is at stake is not just one more piecemeal innovation but control over what very well could represent an entirely new computational platform: pervasive, ambient artificial intelligence.
The phrase “artificial intelligence” is invoked as if its meaning were self-evident, but it has always been a source of confusion and controversy. Imagine if you went back to the 1970s, stopped someone on the street, pulled out a smartphone and showed her Google Maps. Once you managed to convince her you weren’t some oddly dressed wizard, and that what you withdrew from your pocket wasn’t a black-arts amulet but merely a tiny computer more powerful than the one that guided the Apollo missions, Google Maps would almost certainly seem to her a persuasive example of “artificial intelligence.” In a very real sense, it is. It can do things any map-literate human can manage, like get you from your hotel to the airport — though it can do so much more quickly and reliably. It can also do things that humans simply and obviously cannot: It can evaluate the traffic, plan the best route and reorient itself when you take the wrong exit.
Practically nobody today, however, would bestow upon Google Maps the honorific “A.I.,” so sentimental and sparing are we in our use of the word “intelligence.” Artificial intelligence, we believe, must be something that distinguishes HAL from whatever it is a loom or wheelbarrow can do. The minute we can automate a task, we downgrade the relevant skill involved to one of mere mechanism. Today Google Maps seems, in the pejorative sense of the term, robotic: It simply accepts an explicit demand (the need to get from one place to another) and tries to satisfy that demand as efficiently as possible. The goal posts for “artificial intelligence” are thus constantly receding.
When he has an opportunity to make careful distinctions, Pichai differentiates between the current applications of A.I. and the ultimate goal of “artificial general intelligence.” Artificial general intelligence will not involve dutiful adherence to explicit instructions, but instead will demonstrate a facility with the implicit, the interpretive. It will be a general tool, designed for general purposes in a general context. Pichai believes his company’s future depends on something like this. Imagine if you could tell Google Maps, “I’d like to go to the airport, but I need to stop off on the way to buy a present for my nephew.” A more generally intelligent version of that service — a ubiquitous assistant, of the sort that Scarlett Johansson memorably disembodied three years ago in the Spike Jonze film “Her”— would know all sorts of things that, say, a close friend or an earnest intern might know: your nephew’s age, and how much you ordinarily like to spend on gifts for children, and where to find an open store. But a truly intelligent Maps could also conceivably know all sorts of things a close friend wouldn’t, like what has only recently come into fashion among preschoolers in your nephew’s school — or more important, what its users actually want. If an intelligent machine were able to discern some intricate if murky regularity in data about what we have done in the past, it might be able to extrapolate about our subsequent desires, even if we don’t entirely know them ourselves.
The new wave of A.I.-enhanced assistants — Apple’s Siri, Facebook’s M, Amazon’s Echo — are all creatures of machine learning, built with similar intentions. The corporate dreams for machine learning, however, aren’t exhausted by the goal of consumer clairvoyance. A medical-imaging subsidiary of Samsung announced this year that its new ultrasound devices could detect breast cancer. Management consultants are falling all over themselves to prep executives for the widening industrial applications of computers that program themselves. DeepMind, a 2014 Google acquisition, defeated the reigning human grandmaster of the ancient board game Go, despite predictions that such an achievement would take another 10 years.
In a famous 1950 essay, Alan Turing proposed a test for an artificial general intelligence: a computer that could, over the course of five minutes of text exchange, successfully deceive a real human interlocutor. Once a machine can translate fluently between two natural languages, the foundation has been laid for a machine that might one day “understand” human language well enough to engage in plausible conversation. Google Brain’s members, who pushed and helped oversee the Translate project, believe that such a machine would be on its way to serving as a generally intelligent all-encompassing personal digital assistant.
What follows here is the story of how a team of Google researchers and engineers — at first one or two, then three or four, and finally more than a hundred — made considerable progress in that direction. It’s an uncommon story in many ways, not least of all because it defies many of the Silicon Valley stereotypes we’ve grown accustomed to. It does not feature people who think that everything will be unrecognizably different tomorrow or the next day because of some restless tinkerer in his garage. It is neither a story about people who think technology will solve all our problems nor one about people who think technology is ineluctably bound to create apocalyptic new ones. It is not about disruption, at least not in the way that word tends to be used.
It is, in fact, three overlapping stories that converge in Google Translate’s successful metamorphosis to A.I. — a technical story, an institutional story and a story about the evolution of ideas. The technical story is about one team on one product at one company, and the process by which they refined, tested and introduced a brand-new version of an old product in only about a quarter of the time anyone, themselves included, might reasonably have expected. The institutional story is about the employees of a small but influential artificial-intelligence group within that company, and the process by which their intuitive faith in some old, unproven and broadly unpalatable notions about computing upended every other company within a large radius. The story of ideas is about the cognitive scientists, psychologists and wayward engineers who long toiled in obscurity, and the process by which their ostensibly irrational convictions ultimately inspired a paradigm shift in our understanding not only of technology but also, in theory, of consciousness itself.
The first story, the story of Google Translate, takes place in Mountain View over nine months, and it explains the transformation of machine translation. The second story, the story of Google Brain and its many competitors, takes place in Silicon Valley over five years, and it explains the transformation of that entire community. The third story, the story of deep learning, takes place in a variety of far-flung laboratories — in Scotland, Switzerland, Japan and most of all Canada — over seven decades, and it might very well contribute to the revision of our self-image as first and foremost beings who think.
All three are stories about artificial intelligence. The seven-decade story is about what we might conceivably expect or want from it. The five-year story is about what it might do in the near future. The nine-month story is about what it can do right this minute. These three stories are themselves just proof of concept. All of this is only the beginning.

Part I: Learning Machine
1. The Birth of Brain
Jeff Dean, though his title is senior fellow, is the de facto head of Google Brain. Dean is a sinewy, energy-efficient man with a long, narrow face, deep-set eyes and an earnest, soapbox-derby sort of enthusiasm. The son of a medical anthropologist and a public-health epidemiologist, Dean grew up all over the world — Minnesota, Hawaii, Boston, Arkansas, Geneva, Uganda, Somalia, Atlanta — and, while in high school and college, wrote software used by the World Health Organization. He has been with Google since 1999, as employee 25ish, and has had a hand in the core software systems beneath nearly every significant undertaking since then. A beloved artifact of company culture is Jeff Dean Facts, written in the style of the Chuck Norris Facts meme: “Jeff Dean’s PIN is the last four digits of pi.” “When Alexander Graham Bell invented the telephone, he saw a missed call from Jeff Dean.” “Jeff Dean got promoted to Level 11 in a system where the maximum level is 10.” (This last one is, in fact, true.)
The Google engineer and Google Brain leader Jeff Dean. Credit: Brian Finke for The New York Times
One day in early 2011, Dean walked into one of the Google campus’s “microkitchens” — the “Googley” word for the shared break spaces on most floors of the Mountain View complex’s buildings — and ran into Andrew Ng, a young Stanford computer-science professor who was working for the company as a consultant. Ng told him about Project Marvin, an internal effort (named after the celebrated A.I. pioneer Marvin Minsky) he had recently helped establish to experiment with “neural networks,” pliant digital lattices based loosely on the architecture of the brain. Dean himself had worked on a primitive version of the technology as an undergraduate at the University of Minnesota in 1990, during one of the method’s brief windows of mainstream acceptability. Now, over the previous five years, the number of academics working on neural networks had begun to grow again, from a handful to a few dozen. Ng told Dean that Project Marvin, which was being underwritten by Google’s secretive X lab, had already achieved some promising results.
Dean was intrigued enough to lend his “20 percent” — the portion of work hours every Google employee is expected to contribute to programs outside his or her core job — to the project. Pretty soon, he suggested to Ng that they bring in another colleague with a neuroscience background, Greg Corrado. (In graduate school, Corrado was taught briefly about the technology, but strictly as a historical curiosity. “It was good I was paying attention in class that day,” he joked to me.) In late spring they brought in one of Ng’s best graduate students, Quoc Le, as the project’s first intern. By then, a number of the Google engineers had taken to referring to Project Marvin by another name: Google Brain.
Since the term “artificial intelligence” was first coined, at a kind of constitutional convention of the mind at Dartmouth in the summer of 1956, a majority of researchers have long thought the best approach to creating A.I. would be to write a very big, comprehensive program that laid out both the rules of logical reasoning and sufficient knowledge of the world. If you wanted to translate from English to Japanese, for example, you would program into the computer all of the grammatical rules of English, and then the entirety of definitions contained in the Oxford English Dictionary, and then all of the grammatical rules of Japanese, as well as all of the words in the Japanese dictionary, and only after all of that feed it a sentence in a source language and ask it to tabulate a corresponding sentence in the target language. You would give the machine a language map that was, as Borges would have had it, the size of the territory. This perspective is usually called “symbolic A.I.” — because its definition of cognition is based on symbolic logic — or, disparagingly, “good old-fashioned A.I.”
There are two main problems with the old-fashioned approach. The first is that it’s awfully time-consuming on the human end. The second is that it only really works in domains where rules and definitions are very clear: in mathematics, for example, or chess. Translation, however, is an example of a field where this approach fails horribly, because words cannot be reduced to their dictionary definitions, and because languages tend to have as many exceptions as they have rules. More often than not, a system like this is liable to translate “minister of agriculture” as “priest of farming.” Still, for math and chess it worked great, and the proponents of symbolic A.I. took it for granted that no activities signaled “general intelligence” better than math and chess.
An excerpt of a 1961 documentary emphasizing the longstanding premise of artificial-intelligence research: If you could program a computer to mimic higher-order cognitive tasks like math or chess, you were on a path that would eventually lead to something akin to consciousness. Video posted on YouTube by Roberto Pieraccini
There were, however, limits to what this system could do. In the 1980s, a robotics researcher at Carnegie Mellon pointed out that it was easy to get computers to do adult things but nearly impossible to get them to do things a 1-year-old could do, like hold a ball or identify a cat. By the 1990s, despite punishing advancements in computer chess, we still weren’t remotely close to artificial general intelligence.
There has always been another vision for A.I. — a dissenting view — in which the computers would learn from the ground up (from data) rather than from the top down (from rules). This notion dates to the early 1940s, when it occurred to researchers that the best model for flexible automated intelligence was the brain itself. A brain, after all, is just a bunch of widgets, called neurons, that either pass along an electrical charge to their neighbors or don’t. What’s important are less the individual neurons themselves than the manifold connections among them. This structure, in its simplicity, has afforded the brain a wealth of adaptive advantages. The brain can operate in circumstances in which information is poor or missing; it can withstand significant damage without total loss of control; it can store a huge amount of knowledge in a very efficient way; it can isolate distinct patterns but retain the messiness necessary to handle ambiguity.
There was no reason you couldn’t try to mimic this structure in electronic form, and in 1943 it was shown that arrangements of simple artificial neurons could carry out basic logical functions. They could also, at least in theory, learn the way we do. With life experience, depending on a particular person’s trials and errors, the synaptic connections among pairs of neurons get stronger or weaker. An artificial neural network could do something similar, by gradually altering, on a guided trial-and-error basis, the numerical relationships among artificial neurons. It wouldn’t need to be preprogrammed with fixed rules. It would, instead, rewire itself to reflect patterns in the data it absorbed.
This attitude toward artificial intelligence was evolutionary rather than creationist. If you wanted a flexible mechanism, you wanted one that could adapt to its environment. If you wanted something that could adapt, you didn’t want to begin with the indoctrination of the rules of chess. You wanted to begin with very basic abilities — sensory perception and motor control — in the hope that advanced skills would emerge organically. Humans don’t learn to understand language by memorizing dictionaries and grammar books, so why should we possibly expect our computers to do so?
Google Brain was the first major commercial institution to invest in the possibilities embodied by this way of thinking about A.I. Dean, Corrado and Ng began their work as a part-time, collaborative experiment, but they made immediate progress. They took architectural inspiration for their models from recent theoretical outlines — as well as ideas that had been on the shelf since the 1980s and 1990s — and drew upon both the company’s peerless reserves of data and its massive computing infrastructure. They instructed the networks on enormous banks of “labeled” data — speech files with correct transcriptions, for example — and the computers improved their responses to better match reality.
“The portion of evolution in which animals developed eyes was a big development,” Dean told me one day, with customary understatement. We were sitting, as usual, in a whiteboarded meeting room, on which he had drawn a crowded, snaking timeline of Google Brain and its relation to inflection points in the recent history of neural networks. “Now computers have eyes. We can build them around the capabilities that now exist to understand photos. Robots will be drastically transformed. They’ll be able to operate in an unknown environment, on much different problems.” These capacities they were building may have seemed primitive, but their implications were profound.

Geoffrey Hinton, whose ideas helped lay the foundation for the neural-network approach to Google Translate, at Google’s offices in Toronto. Credit: Brian Finke for The New York Times
2. The Unlikely Intern
In its first year or so of existence, Brain’s experiments in the development of a machine with the talents of a 1-year-old had, as Dean said, worked to great effect. Its speech-recognition team swapped out part of their old system for a neural network and encountered, in pretty much one fell swoop, the best quality improvements anyone had seen in 20 years. Their system’s object-recognition abilities improved by an order of magnitude. This was not because Brain’s personnel had generated a sheaf of outrageous new ideas in just a year. It was because Google had finally devoted the resources — in computers and, increasingly, personnel — to fill in outlines that had been around for a long time.
A great preponderance of these extant and neglected notions had been proposed or refined by a peripatetic English polymath named Geoffrey Hinton. In the second year of Brain’s existence, Hinton was recruited as Andrew Ng left. (Ng now leads the 1,300-person A.I. team at Baidu.) Hinton wanted to leave his post at the University of Toronto for only three months, so for arcane contractual reasons he had to be hired as an intern. At intern training, the orientation leader would say something like, “Type in your LDAP” — a user login — and he would flag a helper to ask, “What’s an LDAP?” All the smart 25-year-olds in attendance, who had only ever known deep learning as the sine qua non of artificial intelligence, snickered: “Who is that old guy? Why doesn’t he get it?”
“At lunchtime,” Hinton said, “someone in the queue yelled: ‘Professor Hinton! I took your course! What are you doing here?’ After that, it was all right.”
A few months later, Hinton and two of his students demonstrated truly astonishing gains in a big image-recognition contest, run by an open-source collective called ImageNet, that asks computers not only to identify a monkey but also to distinguish between spider monkeys and howler monkeys, and among God knows how many different breeds of cat. Google soon approached Hinton and his students with an offer. They accepted. “I thought they were interested in our I.P.,” he said. “Turns out they were interested in us.”
Hinton comes from one of those old British families emblazoned like the Darwins at eccentric angles across the intellectual landscape, where regardless of titular preoccupation a person is expected to make sideline contributions to minor problems in astronomy or fluid dynamics. His great-great-grandfather was George Boole, whose foundational work in symbolic logic underpins the computer; another great-great-grandfather was a celebrated surgeon, his father a venturesome entomologist, his father’s cousin a Los Alamos researcher; the list goes on. He trained at Cambridge and Edinburgh, then taught at Carnegie Mellon before he ended up at Toronto, where he still spends half his time. (His work has long been supported by the largess of the Canadian government.) I visited him in his office at Google there. He has tousled yellowed-pewter hair combed forward in a mature Noel Gallagher style and wore a baggy striped dress shirt that persisted in coming untucked, and oval eyeglasses that slid down to the tip of a prominent nose. He speaks with a driving if shambolic wit, and says things like, “Computers will understand sarcasm before Americans do.”
Hinton had been working on neural networks since his undergraduate days at Cambridge in the late 1960s, and he is seen as the intellectual primogenitor of the contemporary field. For most of that time, whenever he spoke about machine learning, people looked at him as though he were talking about the Ptolemaic spheres or bloodletting by leeches. Neural networks were taken as a disproven folly, largely on the basis of one overhyped project: the Perceptron, an artificial neural network that Frank Rosenblatt, a Cornell psychologist, developed in the late 1950s. The New York Times reported that the machine’s sponsor, the United States Navy, expected it would “be able to walk, talk, see, write, reproduce itself and be conscious of its existence.” It went on to do approximately none of those things. Marvin Minsky, the dean of artificial intelligence in America, had worked on neural networks for his 1954 Princeton thesis, but he’d since grown tired of the inflated claims that Rosenblatt — who was a contemporary at Bronx Science — made for the neural paradigm. (He was also competing for Defense Department funding.) Along with an M.I.T. colleague, Minsky published a book that proved that there were painfully simple problems the Perceptron could never solve.
Minsky’s criticism of the Perceptron extended only to networks of one “layer,” i.e., one layer of artificial neurons between what’s fed to the machine and what you expect from it — and later in life, he expounded ideas very similar to contemporary deep learning. But Hinton already knew at the time that complex tasks could be carried out if you had recourse to multiple layers. The simplest description of a neural network is that it’s a machine that makes classifications or predictions based on its ability to discover patterns in data. With one layer, you could find only simple patterns; with more than one, you could look for patterns of patterns. Take the case of image recognition, which tends to rely on a contraption called a “convolutional neural net.” (These were elaborated in a seminal 1998 paper whose lead author, a Frenchman named Yann LeCun, did his postdoctoral research in Toronto under Hinton and now directs a huge A.I. endeavor at Facebook.) The first layer of the network learns to identify the very basic visual trope of an “edge,” meaning a nothing (an off-pixel) followed by a something (an on-pixel) or vice versa. Each successive layer of the network looks for a pattern in the previous layer. A pattern of edges might be a circle or a rectangle. A pattern of circles or rectangles might be a face. And so on. This more or less parallels the way information is put together in increasingly abstract ways as it travels from the photoreceptors in the retina back and up through the visual cortex. At each conceptual step, detail that isn’t immediately relevant is thrown away. If several edges and circles come together to make a face, you don’t care exactly where the face is found in the visual field; you just care that it’s a face.
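The first-layer “edge” detector described above can be reduced to a few lines of arithmetic. The sketch below is a toy in one dimension, not a real convolutional network: it slides a two-number filter along a row of pixels and fires wherever a nothing is followed by a something, or vice versa. All names and values here are illustrative.

```python
# A first convolutional layer, reduced to one dimension: slide a tiny
# "edge" filter across a row of pixels and record its response at each
# position. Illustrative toy, not drawn from any actual system.

def convolve(pixels, kernel):
    """Slide the kernel across the pixel row and record each response."""
    k = len(kernel)
    return [
        sum(kernel[j] * pixels[i + j] for j in range(k))
        for i in range(len(pixels) - k + 1)
    ]

row = [0, 0, 1, 1, 1, 0, 0]   # dark, then bright, then dark again
edge_filter = [-1, 1]         # responds to an off-pixel followed by an on-pixel

responses = convolve(row, edge_filter)
print(responses)              # [0, 1, 0, 0, -1, 0]: +1 at the rising edge, -1 at the falling one
```

A deeper layer would then run the same kind of sweep over these responses, looking for patterns of edges rather than patterns of pixels.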
A demonstration from 1993 showing an early version of the researcher Yann LeCun’s convolutional neural network, which by the late 1990s was processing 10 to 20 percent of all checks in the United States. A similar technology now drives most state-of-the-art image-recognition systems. Video posted on YouTube by Yann LeCun
The issue with multilayered, “deep” neural networks was that the trial-and-error part got extraordinarily complicated. In a single layer, it’s easy. Imagine that you’re playing with a child. You tell the child, “Pick up the green ball and put it into Box A.” The child picks up a green ball and puts it into Box B. You say, “Try again to put the green ball in Box A.” The child tries Box A. Bravo.
Now imagine you tell the child, “Pick up a green ball, go through the door marked 3 and put the green ball into Box A.” The child takes a red ball, goes through the door marked 2 and puts the red ball into Box B. How do you begin to correct the child? You cannot just repeat your initial instructions, because the child does not know at which point he went wrong. In real life, you might start by holding up the red ball and the green ball and saying, “Red ball, green ball.” The whole point of machine learning, however, is to avoid that kind of explicit mentoring. Hinton and a few others went on to invent a solution (or rather, reinvent an older one) to this layered-error problem, over the halting course of the late 1970s and 1980s, and interest among computer scientists in neural networks was briefly revived. “People got very excited about it,” he said. “But we oversold it.” Computer scientists quickly went back to thinking that people like Hinton were weirdos and mystics.
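The solution Hinton helped reinvent, backpropagation, routes the error observed at the output backward through the chain rule, so that each layer learns how much it contributed to the mistake. A minimal sketch, assuming a network stripped down to one weight per layer (nothing like a production system):

```python
# The layered-error problem in miniature: a two-layer network with one
# weight per layer, trained by backpropagation. The output error is
# passed backward via the chain rule so each layer gets its own share
# of the blame. Toy example with invented numbers.

def train(x, target, w1=0.5, w2=0.5, lr=0.01, steps=2000):
    for _ in range(steps):
        hidden = w1 * x          # first layer's activity
        output = w2 * hidden     # second layer's activity
        error = output - target
        # Chain rule: route blame back through each layer in turn.
        grad_w2 = 2 * error * hidden
        grad_w1 = 2 * error * w2 * x
        w2 -= lr * grad_w2
        w1 -= lr * grad_w1
    return w1, w2

w1, w2 = train(x=2.0, target=6.0)
print(round(w1 * 2.0 * w2, 3))   # 6.0: the network has learned to map 2 to 6
```

Neither weight was ever told what it should be; each was only nudged in proportion to its responsibility for the output error, which is exactly the correction the child in the example never receives.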
These ideas remained popular, however, among philosophers and psychologists, who called the approach “connectionism” or “parallel distributed processing.” “This idea,” Hinton told me, “of a few people keeping a torch burning, it’s a nice myth. It was true within artificial intelligence. But within psychology lots of people believed in the approach but just couldn’t do it.” Neither could Hinton, despite the generosity of the Canadian government. “There just wasn’t enough computer power or enough data. People on our side kept saying, ‘Yeah, but if I had a really big one, it would work.’ It wasn’t a very persuasive argument.”
3. A Deep Explanation of Deep Learning
When Pichai said that Google would henceforth be “A.I. first,” he was not just making a claim about his company’s business strategy; he was throwing in his company’s lot with this long-unworkable idea. Pichai’s allocation of resources ensured that people like Dean could ensure that people like Hinton would have, at long last, enough computers and enough data to make a persuasive argument. An average brain has something on the order of 100 billion neurons. Each neuron is connected to up to 10,000 other neurons, which means that the number of synapses is between 100 trillion and 1,000 trillion. For a simple artificial neural network of the sort proposed in the 1940s, even attempting to replicate this was unimaginable. We’re still far from the construction of a network of that size, but Google Brain’s investment allowed for the creation of artificial neural networks comparable to the brains of mice.
To understand why scale is so important, however, you have to start to understand some of the more technical details of what, exactly, machine intelligences are doing with the data they consume. A lot of our ambient fears about A.I. rest on the idea that these machines are just vacuuming up knowledge like a sociopathic prodigy in a library, and that an artificial intelligence constructed to make paper clips might someday decide to treat humans like ants or lettuce. This just isn’t how they work. All they’re doing is shuffling information around in search of commonalities — basic patterns, at first, and then more complex ones — and for the moment, at least, the greatest danger is that the information we’re feeding them is biased in the first place.
If that brief explanation seems sufficiently reassuring, the reassured nontechnical reader is invited to skip forward to the next section, which is about cats. If not, then read on. (This section is also, luckily, about cats.)
Imagine you want to program a cat-recognizer on the old symbolic-A.I. model. You stay up for days preloading the machine with an exhaustive, explicit definition of “cat.” You tell it that a cat has four legs and pointy ears and whiskers and a tail, and so on. All this information is stored in a special place in memory called Cat. Now you show it a picture. First, the machine has to separate out the various distinct elements of the image. Then it has to take these elements and apply the rules stored in its memory. If(legs=4) and if(ears=pointy) and if(whiskers=yes) and if(tail=yes) and if(expression=supercilious), then(cat=yes). But what if you showed this cat-recognizer a Scottish Fold, a heart-rending breed with a prized genetic defect that leads to droopy doubled-over ears? Our symbolic A.I. gets to (ears=pointy) and shakes its head solemnly, “Not cat.” It is hyperliteral, or “brittle.” Even the thickest toddler shows much greater inferential acuity.
Now imagine that instead of hard-wiring the machine with a set of rules for classification stored in one location of the computer’s memory, you try the same thing on a neural network. There is no special place that can hold the definition of “cat.” There is just a giant blob of interconnected switches, like forks in a path. On one side of the blob, you present the inputs (the pictures); on the other side, you present the corresponding outputs (the labels). Then you just tell it to work out for itself, via the individual calibration of all of these interconnected switches, whatever path the data should take so that the inputs are mapped to the correct outputs. Training is the process by which a labyrinthine series of elaborate tunnels is excavated through the blob, tunnels that connect any given input to its proper output. The more training data you have, the greater the number and intricacy of the tunnels that can be dug. Once the training is complete, the middle of the blob has enough tunnels that it can make reliable predictions about how to handle data it has never seen before. This is called “supervised learning.”
The reason that the network requires so many neurons and so much data is that it functions, in a way, like a sort of giant machine democracy. Imagine you want to train a computer to differentiate among five different items. Your network is made up of millions and millions of neuronal “voters,” each of whom has been given five different cards: one for cat, one for dog, one for spider monkey, one for spoon and one for defibrillator. You show your electorate a photo and ask, “Is this a cat, a dog, a spider monkey, a spoon or a defibrillator?” All the neurons that voted the same way collect in groups, and the network foreman peers down from above and identifies the majority classification: “A dog?”
You say: “No, maestro, it’s a cat. Try again.”
Now the network foreman goes back to identify which voters threw their weight behind “cat” and which didn’t. The ones that got “cat” right get their votes counted double next time — at least when they’re voting for “cat.” They have to prove independently whether they’re also good at picking out dogs and defibrillators, but one thing that makes a neural network so flexible is that each individual unit can contribute differently to different desired outcomes. What’s important is not the individual vote, exactly, but the pattern of votes. If Joe, Frank and Mary all vote together, it’s a dog; but if Joe, Kate and Jessica vote together, it’s a cat; and if Kate, Jessica and Frank vote together, it’s a defibrillator. The neural network just needs to register enough of a regularly discernible signal somewhere to say, “Odds are, this particular arrangement of pixels represents something these humans keep calling ‘cats.’ ” The more “voters” you have, and the more times you make them vote, the more keenly the network can register even very weak signals. If you have only Joe, Frank and Mary, you can maybe use them only to differentiate among a cat, a dog and a defibrillator. If you have millions of different voters that can associate in billions of different ways, you can learn to classify data with incredible granularity. Your trained voter assembly will be able to look at an unlabeled picture and identify it more or less accurately.
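The doubling-of-votes scheme described here resembles the classic weighted-majority idea from learning theory: each voter keeps a weight, the answer is the weighted vote, and voters on the correct side see their influence grow. Below is a toy version with three hypothetical voters (Joe, Frank and Mary, as in the text); the functions and data are invented for illustration and are not Google’s actual training rule.

```python
# A "machine democracy" in miniature: tally weighted votes, then double
# the weight of every voter that picked the correct label.

def weighted_vote(voters, weights, image):
    """Return the label with the largest weighted tally."""
    tally = {}
    for voter, w in zip(voters, weights):
        label = voter(image)
        tally[label] = tally.get(label, 0.0) + w
    return max(tally, key=tally.get)

def reward_correct(voters, weights, image, true_label):
    """Voters that got it right have their votes counted double next time."""
    return [
        w * 2 if voter(image) == true_label else w
        for voter, w in zip(voters, weights)
    ]

# Three hypothetical voters: Joe always says cat, Frank always says dog,
# Mary says cat only for "fluffy" images.
joe = lambda img: "cat"
frank = lambda img: "dog"
mary = lambda img: "cat" if img == "fluffy" else "dog"

voters, weights = [joe, frank, mary], [1.0, 1.0, 1.0]
print(weighted_vote(voters, weights, "fluffy"))   # "cat": Joe and Mary outvote Frank
weights = reward_correct(voters, weights, "fluffy", "cat")
print(weights)                                    # [2.0, 1.0, 2.0]
```

Real neurons don’t literally hold cards, of course; the analogue of these per-voter weights is the continuously adjusted strength of each connection.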
Part of the reason there was so much resistance to these ideas in computer-science departments is that, because the output is just a prediction based on patterns of patterns, it’s not going to be perfect, and the machine will never be able to define for you what, exactly, a cat is. It just knows them when it sees them. This woolliness, however, is the point. The neuronal “voters” will recognize a happy cat dozing in the sun and an angry cat glaring out from the shadows of an untidy litter box, as long as they have been exposed to millions of diverse cat scenes. You just need lots and lots of the voters — in order to make sure that some part of your network picks up on even very weak regularities, on Scottish Folds with droopy ears, for example — and enough labeled data to make sure your network has seen the widest possible variance in phenomena.
It is important to note, however, that because neural networks are probabilistic in nature, they’re not suitable for all tasks. It’s no great tragedy if they mislabel 1 percent of cats as dogs, or send you to the wrong movie on occasion, but in something like a self-driving car we all want greater assurances. This isn’t the only caveat. Supervised learning is a trial-and-error process based on labeled data. The machines might be doing the learning, but there remains a strong human element in the initial categorization of the inputs. If your data had a picture of a man and a woman in suits that someone had labeled “woman with her boss,” that relationship would be encoded into all future pattern recognition. Labeled data is thus fallible the way that human labelers are fallible. If a machine was asked to identify creditworthy candidates for loans, it might use data like felony convictions, but if felony convictions were unfair in the first place — if they were based on, say, discriminatory drug laws — then the loan recommendations would perforce also be fallible.
Image-recognition networks like our cat-identifier are only one of many varieties of deep learning, but they are disproportionately invoked as teaching examples because each layer does something at least vaguely recognizable to humans — picking out edges first, then circles, then faces. This means there’s a safeguard against error. For instance, an early oddity in Google’s image-recognition software meant that it could not always identify a dumbbell in isolation, even though the team had trained it on an image set that included a lot of exercise categories. A visualization tool showed them the machine had learned not the concept of “dumbbell” but the concept of “dumbbell+arm,” because all the dumbbells in the training set were attached to arms. They threw into the training mix some photos of solo dumbbells. The problem was solved. Not everything is so easy.
4. The Cat Paper
Over the course of its first year or two, Brain’s efforts to cultivate in machines the skills of a 1-year-old were auspicious enough that the team was graduated out of the X lab and into the broader research organization. (The head of Google X once noted that Brain had paid for the entirety of X’s costs.) They still had fewer than 10 people and only a vague sense for what might ultimately come of it all. But even then they were thinking ahead to what ought to happen next. First a human mind learns to recognize a ball and rests easily with the accomplishment for a moment, but sooner or later, it wants to ask for the ball. And then it wades into language.
The first step in that direction was the cat paper, which made Brain famous.
What the cat paper demonstrated was that a neural network with more than a billion “synaptic” connections — a hundred times larger than any publicized neural network to that point, yet still many orders of magnitude smaller than our brains — could observe raw, unlabeled data and pick out for itself a high-order human concept. The Brain researchers had shown the network millions of still frames from YouTube videos, and out of the welter of the pure sensorium the network had isolated a stable pattern any toddler or chipmunk would recognize without a moment’s hesitation as the face of a cat. The machine had not been programmed with the foreknowledge of a cat; it reached directly into the world and seized the idea for itself. (The researchers discovered this with the neural-network equivalent of something like an M.R.I., which showed them that a ghostly cat face caused the artificial neurons to “vote” with the greatest collective enthusiasm.) Most machine learning to that point had been limited by the quantities of labeled data. The cat paper showed that machines could also deal with raw unlabeled data, perhaps even data of which humans had no established foreknowledge. This seemed like a major advance not only in cat-recognition studies but also in overall artificial intelligence.
The lead author on the cat paper was Quoc Le. Le is short and willowy and soft-spoken, with a quick, enigmatic smile and shiny black penny loafers. He grew up outside Hue, Vietnam. His parents were rice farmers, and he did not have electricity at home. His mathematical abilities were obvious from an early age, and he was sent to study at a magnet school for science. In the late 1990s, while still in school, he tried to build a chatbot to talk to. He thought, How hard could this be?
“But actually,” he told me in a whispery deadpan, “it’s very hard.”
He left the rice paddies on a scholarship to a university in Canberra, Australia, where he worked on A.I. tasks like computer vision. The dominant method of the time, which involved feeding the machine definitions for things like edges, felt to him like cheating. Le didn’t know then, or knew only dimly, that there were at least a few dozen computer scientists elsewhere in the world who couldn’t help imagining, as he did, that machines could learn from scratch. In 2006, Le took a position at the Max Planck Institute for Biological Cybernetics in the medieval German university town of Tübingen. In a reading group there, he encountered two new papers by Geoffrey Hinton. People who entered the discipline during the long diaspora all have conversion stories, and when Le read those papers, he felt the scales fall away from his eyes.
“There was a big debate,” he told me. “A very big debate.” We were in a small interior conference room, a narrow, high-ceilinged space outfitted with only a small table and two whiteboards. He looked to the curve he’d drawn on the whiteboard behind him and back again, then softly confided, “I’ve never seen such a big debate.”
He remembers standing up at the reading group and saying, “This is the future.” It was, he said, an “unpopular decision at the time.” A former adviser from Australia, with whom he had stayed close, couldn’t quite understand Le’s decision. “Why are you doing this?” he asked Le in an email.
“I didn’t have a good answer back then,” Le said. “I was just curious. There was a successful paradigm, but to be honest I was just curious about the new paradigm. In 2006, there was very little activity.” He went to join Ng at Stanford and began to pursue Hinton’s ideas. “By the end of 2010, I was pretty convinced something was going to happen.”
What happened, soon afterward, was that Le went to Brain as its first intern, where he carried on with his dissertation work — an extension of which ultimately became the cat paper. On a simple level, Le wanted to see if the computer could be trained to identify on its own the information that was absolutely essential to a given image. He fed the neural network a still he had taken from YouTube. He then told the neural network to throw away some of the information contained in the image, though he didn’t specify what it should or shouldn’t throw away. The machine threw away some of the information, initially at random. Then he said: “Just kidding! Now recreate the initial image you were shown based only on the information you retained.” It was as if he were asking the machine to find a way to “summarize” the image, and then expand back to the original from the summary. If the summary was based on irrelevant data — like the color of the sky rather than the presence of whiskers — the machine couldn’t perform a competent reconstruction. Its reaction would be akin to that of a distant ancestor whose takeaway from his brief exposure to saber-tooth tigers was that they made a restful swooshing sound when they moved. Le’s neural network, unlike that ancestor, got to try again, and again and again and again. Each time it mathematically “chose” to prioritize different pieces of information and performed incrementally better. A neural network, however, was a black box. It divined patterns, but the patterns it identified didn’t always make intuitive sense to a human observer. The same network that hit on our concept of cat also became enthusiastic about a pattern that looked like some sort of furniture-animal compound, like a cross between an ottoman and a goat.
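Le’s set-up can be caricatured in a few lines: an encoder throws away most of an image, a decoder rebuilds the image from what was kept, and the quality of the summary is judged purely by reconstruction error. In the sketch below the “image” is a row of pixels, and the two candidate summaries are hand-picked — one keeping the salient structure (where the bright blob is), one keeping only an overall brightness. Everything here is invented for illustration; Le’s network learned its summaries rather than being handed them.

```python
# Summarize-and-reconstruct in miniature: a summary that retains the
# salient information rebuilds the image perfectly; one that retains
# irrelevant detail cannot.

image = [0, 0, 9, 9, 9, 0, 0]    # a bright blob on a dark background

def reconstruct(summary):
    """Decode: paint `value` across [start, stop), zeros elsewhere."""
    start, stop, value = summary
    return [value if start <= i < stop else 0 for i in range(len(image))]

def error(a, b):
    """Squared reconstruction error between two pixel rows."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

good_summary = (2, 5, 9)         # where the blob is, and how bright
bad_summary = (0, 7, 4)          # roughly the average brightness, painted everywhere

print(error(image, reconstruct(good_summary)))   # 0: nothing salient was lost
print(error(image, reconstruct(bad_summary)))    # 139: the blob's location is gone
```

The trained network’s advantage over this toy is that, through thousands of rounds of trial and error, it discovers for itself which pieces of information make reconstruction possible.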
Le didn’t see himself in those heady cat years as a language guy, but he felt an urge to connect the dots to his early chatbot. After the cat paper, he realized that if you could ask a network to summarize a photo, you could perhaps also ask it to summarize a sentence. This problem preoccupied Le, along with a Brain colleague named Tomas Mikolov, for the next two years.
In that time, the Brain team outgrew several offices around him. For a while they were on a floor they shared with executives. They got an email at one point from the administrator asking that they please stop allowing people to sleep on the couch in front of Larry Page and Sergey Brin’s suite. It unsettled incoming V.I.P.s. They were then allocated part of a research building across the street, where their exchanges in the microkitchen wouldn’t be squandered on polite chitchat with the suits. That interim also saw dedicated attempts on the part of Google’s competitors to catch up. (As Le told me about his close collaboration with Tomas Mikolov, he kept repeating Mikolov’s name over and over, in an incantatory way that sounded poignant. Le had never seemed so solemn. I finally couldn’t help myself and began to ask, “Is he … ?” Le nodded. “At Facebook,” he replied.)
Members of the Google Brain team in 2012, after their famous “cat paper” demonstrated the ability of neural networks to analyze unlabeled data. When shown millions of still frames from YouTube, a network isolated a pattern resembling the face of a cat. Credit: Google
They spent this period trying to come up with neural-network architectures that could accommodate not only simple photo classifications, which were static, but also complex structures that unfolded over time, like language or music. Many of these were first proposed in the 1990s, and Le and his colleagues went back to those long-ignored contributions to see what they could glean. They knew that once you established a facility with basic linguistic prediction, you could then go on to do all sorts of other intelligent things — like predict a suitable reply to an email, for example, or predict the flow of a sensible conversation. You could sidle up to the sort of prowess that would, from the outside at least, look a lot like thinking.

Part II: Language Machine
5. The Linguistic Turn
The hundred or so current members of Brain — it often feels less like a department within a colossal corporate hierarchy than it does a club or a scholastic society or an intergalactic cantina — came in the intervening years to count among the freest and most widely admired employees in the entire Google organization. They are now quartered in a tiered two-story eggshell building, with large windows tinted a menacing charcoal gray, on the leafy northwestern fringe of the company’s main Mountain View campus. Their microkitchen has a foosball table I never saw used; a Rock Band setup I never saw used; and a Go kit I saw used on a few occasions. (I did once see a young Brain research associate introducing his colleagues to ripe jackfruit, carving up the enormous spiky orb like a turkey.)
When I began spending time at Brain’s offices, in June, there were some rows of empty desks, but most of them were labeled with Post-it notes that said things like “Jesse, 6/27.” Now those are all occupied. When I first visited, parking was not an issue. The closest spaces were those reserved for expectant mothers or Teslas, but there was ample space in the rest of the lot. By October, if I showed up later than 9:30, I had to find a spot across the street.
Brain’s growth made Dean slightly nervous about how the company was going to handle the demand. He wanted to avoid what at Google is known as a “success disaster” — a situation in which the company’s capabilities in theory outpaced its ability to implement a product in practice. At a certain point he did some back-of-the-envelope calculations, which he presented to the executives one day in a two-slide presentation.
“If everyone in the future speaks to their Android phone for three minutes a day,” he told them, “this is how many machines we’ll need.” They would need to double or triple their global computational footprint.
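Dean's two-slide arithmetic can be sketched in a few lines. Every number below is an illustrative assumption for the sake of the sketch, not Google's actual figure:

```python
# Back-of-the-envelope estimate in the spirit of Dean's two slides.
# All inputs are illustrative assumptions, not Google's figures.

USERS = 1_000_000_000                   # assume a billion Android users
SPEECH_SECONDS_PER_DAY = 3 * 60         # three minutes of speech per user per day
SECONDS_PER_MACHINE_DAY = 24 * 60 * 60  # one machine's capacity per day

def machines_needed(users, speech_seconds, realtime_factor=1.0):
    """If recognition costs realtime_factor seconds of compute per second of
    audio, the CPU-days of work per day is the number of machines kept busy."""
    return users * speech_seconds * realtime_factor / SECONDS_PER_MACHINE_DAY

print(round(machines_needed(USERS, SPEECH_SECONDS_PER_DAY)))  # ~2.1 million machines
```

Under these toy assumptions the answer is on the order of two million machines running around the clock, which is why the executives heard "double or triple our footprint."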
“That,” he observed with a little theatrical gulp and widened eyes, “sounded scary. You’d have to” — he hesitated to imagine the consequences — “build new buildings.”
There was, however, another option: just design, mass-produce and install in dispersed data centers a new kind of chip to make everything faster. These chips would be called T.P.U.s, or “tensor processing units,” and their value proposition — counterintuitively — is that they are deliberately less precise than normal chips. Rather than compute 12.246 times 54.392, they will give you the perfunctory answer to 12 times 54. On a mathematical level, rather than a metaphorical one, a neural network is just a structured series of hundreds or thousands or tens of thousands of matrix multiplications carried out in succession, and it’s much more important that these processes be fast than that they be exact. “Normally,” Dean said, “special-purpose hardware is a bad idea. It usually works to speed up one thing. But because of the generality of neural networks, you can leverage this special-purpose hardware for a lot of other things.”
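The trade Dean describes, giving up precision in exchange for speed, can be illustrated with a crude 8-bit quantization of a matrix-vector product. This is a NumPy sketch of the general idea, not the actual T.P.U. arithmetic:

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.standard_normal((256, 256)).astype(np.float32)  # stand-in weight matrix
x = rng.standard_normal(256).astype(np.float32)         # stand-in activations

exact = W @ x  # the full-precision matrix-vector product

def quantize(a):
    """Crude symmetric 8-bit quantization: scale into [-127, 127] and round."""
    scale = np.abs(a).max() / 127.0
    return np.round(a / scale).astype(np.int8), scale

Wq, w_scale = quantize(W)
xq, x_scale = quantize(x)
# Do the multiply in cheap integer arithmetic, then rescale once at the end.
approx = (Wq.astype(np.int32) @ xq.astype(np.int32)) * (w_scale * x_scale)

rel_err = np.linalg.norm(exact - approx) / np.linalg.norm(exact)
print(rel_err)  # a small relative error, typically a couple of percent
```

The answer is slightly wrong everywhere and usable anyway, which is precisely the property that makes special-purpose, low-precision hardware a good fit for neural networks.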
Just as the chip-design process was nearly complete, Le and two colleagues finally demonstrated that neural networks might be configured to handle the structure of language. He drew upon an idea, called “word embeddings,” that had been around for more than 10 years. When you summarize images, you can divine a picture of what each stage of the summary looks like — an edge, a circle, etc. When you summarize language in a similar way, you essentially produce multidimensional maps of the distances, based on common usage, between one word and every single other word in the language. The machine is not “analyzing” the data the way that we might, with linguistic rules that identify some of them as nouns and others as verbs. Instead, it is shifting and twisting and warping the words around in the map. In two dimensions, you cannot make this map useful. You want, for example, “cat” to be in the rough vicinity of “dog,” but you also want “cat” to be near “tail” and near “supercilious” and near “meme,” because you want to try to capture all of the different relationships — both strong and weak — that the word “cat” has to other words. It can be related to all these other words simultaneously only if it is related to each of them in a different dimension. You can’t easily make a 160,000-dimensional map, but it turns out you can represent a language pretty well in a mere thousand or so dimensions — in other words, a universe in which each word is designated by a list of a thousand numbers. Le gave me a good-natured hard time for my continual requests for a mental picture of these maps. “Gideon,” he would say, with the blunt regular demurral of Bartleby, “I do not generally like trying to visualize thousand-dimensional vectors in three-dimensional space.”
Still, certain dimensions in the space, it turned out, did seem to represent legible human categories, like gender or relative size. If you took the thousand numbers that meant “king” and literally just subtracted the thousand numbers that meant “queen,” you got the same numerical result as if you subtracted the numbers for “woman” from the numbers for “man.” And if you took the entire space of the English language and the entire space of French, you could, at least in theory, train a network to learn how to take a sentence in one space and propose an equivalent in the other. You just had to give it millions and millions of English sentences as inputs on one side and their desired French outputs on the other, and over time it would recognize the relevant patterns in words the way that an image classifier recognized the relevant patterns in pixels. You could then give it a sentence in English and ask it to predict the best French analogue.
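The king-and-queen arithmetic can be reproduced in miniature. The four-dimensional vectors below are invented for illustration (real embeddings are learned from usage and run to a thousand or so dimensions), with dimension 0 loosely standing in for royalty and dimension 1 for gender:

```python
import numpy as np

# Toy "embeddings," invented for illustration only.
emb = {
    "king":  np.array([0.9,  0.8, 0.1, 0.3]),
    "queen": np.array([0.9, -0.8, 0.1, 0.3]),
    "man":   np.array([0.1,  0.8, 0.5, 0.2]),
    "woman": np.array([0.1, -0.8, 0.5, 0.2]),
}

def nearest(vec):
    """The vocabulary word whose embedding has the highest cosine similarity."""
    cos = lambda a, b: a @ b / (np.linalg.norm(a) * np.linalg.norm(b))
    return max(emb, key=lambda w: cos(emb[w], vec))

# king - man + woman lands on queen: one offset encodes gender for both pairs.
result = nearest(emb["king"] - emb["man"] + emb["woman"])
print(result)  # queen
```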
The major difference between words and pixels, however, is that all of the pixels in an image are there at once, whereas words appear in a progression over time. You needed a way for the network to “hold in mind” the progression of a chronological sequence — the complete pathway from the first word to the last. In a period of about a week, in September 2014, three papers came out — one by Le and two others by academics in Canada and Germany — that at last provided all the theoretical tools necessary to do this sort of thing. That research allowed for open-ended projects like Brain’s Magenta, an investigation into how machines might generate art and music. It also cleared the way toward an instrumental task like machine translation. Hinton told me he thought at the time that this follow-up work would take at least five more years.
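The "holding in mind" those papers enabled is, at bottom, a recurrent hidden state that is updated one word at a time. A minimal sketch, with random, untrained weights purely for illustration:

```python
import numpy as np

# A minimal recurrent step: the hidden state h is the network's running
# summary of the sequence so far -- its way of "holding in mind" the
# words already read. Weights here are random and untrained.
rng = np.random.default_rng(0)
dim = 8
W_in = rng.standard_normal((dim, dim)) * 0.5  # how a new word enters the state
W_h = rng.standard_normal((dim, dim)) * 0.5   # how the old state carries over

def read_sequence(word_vectors):
    h = np.zeros(dim)
    for x in word_vectors:               # words arrive one at a time...
        h = np.tanh(W_in @ x + W_h @ h)  # ...each one updating the memory
    return h                             # the final state summarizes them all

sentence = [rng.standard_normal(dim) for _ in range(5)]
summary = read_sequence(sentence)
# Reversing the word order produces a different summary: order matters.
print(np.allclose(summary, read_sequence(sentence[::-1])))  # False
```

Unlike an image classifier, which sees all its pixels at once, this loop is chronological by construction, which is why such architectures suited language.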
6. The Ambush
Le’s paper showed that neural translation was plausible, but he had used only a relatively small public data set. (Small for Google, that is — it was actually the biggest public data set in the world. A decade of the old Translate had gathered production data that was between a hundred and a thousand times bigger.) More important, Le’s model didn’t work very well for sentences longer than about seven words.
Mike Schuster, who then was a staff research scientist at Brain, picked up the baton. He knew that if Google didn’t find a way to scale these theoretical insights up to a production level, someone else would. The project took him the next two years. “You think,” Schuster says, “to translate something, you just get the data, run the experiments and you’re done, but it doesn’t work like that.”
Schuster is a taut, focused, ageless being with a tanned, piston-shaped head, narrow shoulders, long camo cargo shorts tied below the knee and neon-green Nike Flyknits. He looks as if he woke up in the lotus position, reached for his small, rimless, elliptical glasses, accepted calories in the form of a modest portion of preserved acorn and completed a relaxed desert decathlon on the way to the office; in reality, he told me, it’s only an 18-mile bike ride each way. Schuster grew up in Duisburg, in the former West Germany’s blast-furnace district, and studied electrical engineering before moving to Kyoto to work on early neural networks. In the 1990s, he ran experiments with a neural-networking machine as big as a conference room; it cost millions of dollars and had to be trained for weeks to do something you could now do on your desktop in less than an hour. He published a paper in 1997 that was barely cited for a decade and a half; this year it has been cited around 150 times. He is not humorless, but he does often wear an expression of some asperity, which I took as his signature combination of German restraint and Japanese restraint.
The issues Schuster had to deal with were tangled. For one thing, Le’s code was custom-written, and it wasn’t compatible with the new open-source machine-learning platform Google was then developing, TensorFlow. Dean directed to Schuster two other engineers, Yonghui Wu and Zhifeng Chen, in the fall of 2015. It took them two months just to replicate Le’s results on the new system. Le was around, but even he couldn’t always make heads or tails of what they had done.
As Schuster put it, “Some of the stuff was not done in full consciousness. They didn’t know themselves why they worked.”
This February, Google’s research organization — the loose division of the company, roughly a thousand employees in all, dedicated to the forward-looking and the unclassifiable — convened their leads at an offsite retreat at the Westin St. Francis, on Union Square, a luxury hotel slightly less splendid than Google’s own San Francisco shop a mile or so to the east. The morning was reserved for rounds of “lightning talks,” quick updates to cover the research waterfront, and the afternoon was idled away in cross-departmental “facilitated discussions.” The hope was that the retreat might provide an occasion for the unpredictable, oblique, Bell Labs-ish exchanges that kept a mature company prolific.
At lunchtime, Corrado and Dean paired up in search of Macduff Hughes, director of Google Translate. Hughes was eating alone, and the two Brain members took positions at either side. As Corrado put it, “We ambushed him.”
“O.K.,” Corrado said to the wary Hughes, holding his breath for effect. “We have something to tell you.”
They told Hughes that 2016 seemed like a good time to consider an overhaul of Google Translate — the code of hundreds of engineers over 10 years — with a neural network. The old system worked the way all machine translation has worked for about 30 years: It sequestered each successive sentence fragment, looked up those words in a large statistically derived vocabulary table, then applied a battery of post-processing rules to affix proper endings and rearrange it all to make sense. The approach is called “phrase-based statistical machine translation,” because by the time the system gets to the next phrase, it doesn’t know what the last one was. This is why Translate’s output sometimes looked like a shaken bag of fridge magnets. Brain’s replacement would, if it came together, read and render entire sentences at one draft. It would capture context — and something akin to meaning.
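The fridge-magnet effect falls straight out of those mechanics: each fragment is looked up and translated in isolation. A toy sketch, with an invented three-entry phrase table:

```python
# Toy phrase-based pipeline (the phrase table is invented): each fragment
# is translated in isolation, so no phrase knows what the last one was.
phrase_table = {
    "the cat": "le chat",
    "sat on": "s'est assis sur",
    "the mat": "le tapis",
}

def phrase_based_translate(sentence, table):
    words = sentence.split()
    out, i = [], 0
    while i < len(words):
        # Greedily match the longest known fragment starting at position i.
        for j in range(len(words), i, -1):
            chunk = " ".join(words[i:j])
            if chunk in table:
                out.append(table[chunk])
                i = j
                break
        else:
            out.append(words[i])  # unknown words pass through untranslated
            i += 1
    return " ".join(out)

print(phrase_based_translate("the cat sat on the mat", phrase_table))
# le chat s'est assis sur le tapis
```

A real system adds reordering models and post-processing rules on top, but the essential blindness between phrases is the same; a neural system, by contrast, conditions every output word on the whole sentence.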
The stakes may have seemed low: Translate generates minimal revenue, and it probably always will. For most Anglophone users, even a radical upgrade in the service’s performance would hardly be hailed as anything more than an expected incremental bump. But there was a case to be made that human-quality machine translation is not only a short-term necessity but also a development very likely, in the long term, to prove transformational. In the immediate future, it’s vital to the company’s business strategy. Google estimates that 50 percent of the internet is in English, which perhaps 20 percent of the world’s population speaks. If Google was going to compete in China — where a majority of market share in search-engine traffic belonged to its competitor Baidu — or India, decent machine translation would be an indispensable part of the infrastructure. Baidu itself had published a pathbreaking paper about the possibility of neural machine translation in July 2015.
And in the more distant, speculative future, machine translation was perhaps the first step toward a general computational facility with human language. This would represent a major inflection point — perhaps the major inflection point — in the development of something that felt like true artificial intelligence.

Most people in Silicon Valley were aware of machine learning as a fast-approaching horizon, so Hughes had seen this ambush coming. He remained skeptical. A modest, sturdily built man of early middle age with mussed auburn hair graying at the temples, Hughes is a classic line engineer, the sort of craftsman who wouldn’t have been out of place at a drafting table at 1970s Boeing. His jeans pockets often look burdened with curious tools of ungainly dimension, as if he were porting around measuring tapes or thermocouples, and unlike many of the younger people who work for him, he has a wardrobe unreliant on company gear. He knew that various people in various places at Google and elsewhere had been trying to make neural translation work — not in a lab but at production scale — for years, to little avail.
Hughes listened to their case and, at the end, said cautiously that it sounded to him as if maybe they could pull it off in three years.
Dean thought otherwise. “We can do it by the end of the year, if we put our minds to it.” One reason people liked and admired Dean so much was that he had a long record of successfully putting his mind to it. Another was that he wasn’t at all embarrassed to say sincere things like “if we put our minds to it.”
Hughes was sure the conversion wasn’t going to happen any time soon, but he didn’t personally care to be the reason. “Let’s prepare for 2016,” he went back and told his team. “I’m not going to be the one to say Jeff Dean can’t deliver speed.”
A month later, they were finally able to run a side-by-side experiment to compare Schuster’s new system with Hughes’s old one. Schuster wanted to run it for English-French, but Hughes advised him to try something else. “English-French,” he said, “is so good that the improvement won’t be obvious.”
It was a challenge Schuster couldn’t resist. The benchmark metric to evaluate machine translation is called a BLEU score, which compares a machine translation with an average of many reliable human translations. At the time, the best BLEU scores for English-French were in the high 20s. An improvement of one point was considered very good; an improvement of two was considered outstanding.
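BLEU itself is simple to state: modified n-gram precision, geometrically averaged and discounted for brevity. A simplified single-sentence, single-reference version (the production metric averages over a whole corpus and several human translations):

```python
import math
from collections import Counter

def ngrams(tokens, n):
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def bleu(candidate, reference, max_n=4):
    """Simplified single-sentence, single-reference BLEU."""
    cand, ref = candidate.split(), reference.split()
    log_precisions = []
    for n in range(1, max_n + 1):
        c, r = ngrams(cand, n), ngrams(ref, n)
        # "Modified" precision: a candidate n-gram only counts as often
        # as it actually appears in the reference.
        overlap = sum(min(count, r[g]) for g, count in c.items())
        total = max(sum(c.values()), 1)
        log_precisions.append(math.log(max(overlap, 1e-9) / total))
    # Brevity penalty: punish translations shorter than the reference.
    brevity = min(1.0, math.exp(1 - len(ref) / len(cand)))
    return brevity * math.exp(sum(log_precisions) / max_n)

reference = "the cat sat on the mat"
print(bleu("the cat sat on the mat", reference))    # 1.0 for a perfect match
print(bleu("the cat on the mat", reference) < 1.0)  # True: imperfect scores lower
```

Scores are conventionally multiplied by 100, which is the scale on which "high 20s" and "an improvement of seven points" are reported.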
The neural system, on the English-French language pair, showed an improvement over the old system of seven points.
Hughes told Schuster’s team they hadn’t had even half as strong an improvement in their own system in the last four years.
To be sure this wasn’t some fluke in the metric, they also turned to their pool of human contractors to do a side-by-side comparison. The user-perception scores, in which sample sentences were graded from zero to six, showed an average improvement of 0.4 — roughly equivalent to the aggregate gains of the old system over its entire lifetime of development.
Google’s Quoc Le (right), whose work demonstrated the plausibility of neural translation, with Mike Schuster, who helped apply that work to Google Translate. Credit: Brian Finke for The New York Times
In mid-March, Hughes sent his team an email. All projects on the old system were to be suspended immediately.

7. Theory Becomes Product
Until then, the neural-translation team had been only three people — Schuster, Wu and Chen — but with Hughes’s support, the broader team began to coalesce. They met under Schuster’s command on Wednesdays at 2 p.m. in a corner room of the Brain building called Quartz Lake. The meeting was generally attended by a rotating cast of more than a dozen people. When Hughes or Corrado was there, they were usually the only native English speakers. The engineers spoke Chinese, Vietnamese, Polish, Russian, Arabic, German and Japanese, though they mostly spoke in their own efficient pidgin and in math. It is not always totally clear, at Google, who is running a meeting, but in Schuster’s case there was no ambiguity.
The steps they needed to take, even then, were not wholly clear. “This story is a lot about uncertainty — uncertainty throughout the whole process,” Schuster told me at one point. “The software, the data, the hardware, the people. It was like” — he extended his long, gracile arms, slightly bent at the elbows, from his narrow shoulders — “swimming in a big sea of mud, and you can only see this far.” He held out his hand eight inches in front of his chest. “There’s a goal somewhere, and maybe it’s there.”
Most of Google’s conference rooms have videochat monitors, which when idle display extremely high-resolution oversaturated public Google+ photos of a sylvan dreamscape or the northern lights or the Reichstag. Schuster gestured toward one of the panels, which showed a crystalline still of the Washington Monument at night.

“The view from outside is that everyone has binoculars and can see ahead so far.”
The theoretical work to get them to this point had already been painstaking and drawn-out, but the attempt to turn it into a viable product — the part that academic scientists might dismiss as “mere” engineering — was no less difficult. For one thing, they needed to make sure that they were training on good data. Google’s billions of words of training “reading” were mostly made up of complete sentences of moderate complexity, like the sort of thing you might find in Hemingway. Some of this is in the public domain: The original Rosetta Stone of statistical machine translation was millions of pages of the complete bilingual records of the Canadian Parliament. Much of it, however, was culled from 10 years of collected data, including human translations that were crowdsourced from enthusiastic respondents. The team had in their storehouse about 97 million unique English “words.” But once they removed the emoticons, and the misspellings, and the redundancies, they had a working vocabulary of only around 160,000.
Then you had to refocus on what users actually wanted to translate, which frequently had very little to do with reasonable language as it is employed. Many people, Google had found, don’t look to the service to translate full, complex sentences; they translate weird little shards of language. If you wanted the network to be able to handle the stream of user queries, you had to be sure to orient it in that direction. The network was very sensitive to the data it was trained on. As Hughes put it to me at one point: “The neural-translation system is learning everything it can. It’s like a toddler. ‘Oh, Daddy says that word when he’s mad!’ ” He laughed. “You have to be careful.”
More than anything, though, they needed to make sure that the whole thing was fast and reliable enough that their users wouldn’t notice. In February, the translation of a 10-word sentence took 10 seconds. They could never introduce anything that slow. The Translate team began to conduct latency experiments on a small percentage of users, in the form of faked delays, to identify tolerance. They found that a translation that took twice as long, or even five times as long, wouldn’t be registered. An eightfold slowdown would. They didn’t need to make sure this was true across all languages. In the case of a high-traffic language, like French or Chinese, they could countenance virtually no slowdown. For something more obscure, they knew that users wouldn’t be so scared off by a slight delay if they were getting better quality. They just wanted to prevent people from giving up and switching over to some competitor’s service.
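A faked-delay experiment of the kind described comes down to deterministic user bucketing plus an injected sleep. The bucket names and multipliers below are invented for the sketch:

```python
import hashlib
import time

# Sketch of a faked-delay latency experiment (buckets are invented):
# each user hashes deterministically into a slowdown bucket, and the
# service sleeps before answering so tolerance can be measured.
DELAY_BUCKETS = {"control": 1.0, "2x": 2.0, "5x": 5.0, "8x": 8.0}

def bucket_for(user_id):
    names = sorted(DELAY_BUCKETS)
    digest = int(hashlib.sha256(user_id.encode()).hexdigest(), 16)
    return names[digest % len(names)]

def respond(user_id, base_latency_s, translate):
    factor = DELAY_BUCKETS[bucket_for(user_id)]
    time.sleep(base_latency_s * (factor - 1.0))  # inject the artificial delay
    return translate()

# The same user always lands in the same bucket, so the experience is stable
# enough to measure whether that cohort gives up or keeps translating.
print(bucket_for("user-42") == bucket_for("user-42"))  # True
```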
Schuster, for his part, admitted he just didn’t know if they ever could make it fast enough. He remembers a conversation in the microkitchen during which he turned to Chen and said, “There must be something we don’t know to make it fast enough, but I don’t know what it could be.”
He did know, though, that they needed more computers — “G.P.U.s,” graphics processors reconfigured for neural networks — for training.
Hughes went to Schuster to ask what he thought. “Should we ask for a thousand G.P.U.s?”
Schuster said, “Why not 2,000?”
Ten days later, they had the additional 2,000 processors.
By April, the original lineup of three had become more than 30 people — some of them, like Le, on the Brain side, and many from Translate. In May, Hughes assigned a kind of provisional owner to each language pair, and they all checked their results into a big shared spreadsheet of performance evaluations. At any given time, at least 20 people were running their own independent weeklong experiments and dealing with whatever unexpected problems came up. One day a model, for no apparent reason, started taking all the numbers it came across in a sentence and discarding them. There were months when it was all touch and go. “People were almost yelling,” Schuster said.
By late spring, the various pieces were coming together. The team introduced something called a “word-piece model,” a “coverage penalty,” “length normalization.” Each part improved the results, Schuster says, by maybe a few percentage points, but in aggregate they had significant effects. Once the model was standardized, it would be only a single multilingual model that would improve over time, rather than the 150 different models that Translate currently used. Still, the paradox — that a tool built to further generalize, via learning machines, the process of automation required such an extraordinary amount of concerted human ingenuity and effort — was not lost on them. So much of what they did was just gut. How many neurons per layer did you use? 1,024 or 512? How many layers? How many sentences did you run through at a time? How long did you train for?
“We did hundreds of experiments,” Schuster told me, “until we knew that we could stop the training after one week. You’re always saying: When do we stop? How do I know I’m done? You never know you’re done. The machine-learning mechanism is never perfect. You need to train, and at some point you have to stop. That’s the very painful nature of this whole system. It’s hard for some people. It’s a little bit an art — where you put your brush to make it nice. It comes from just doing it. Some people are better, some worse.”
By May, the Brain team understood that the only way they were ever going to make the system fast enough to implement as a product was if they could run it on T.P.U.s, the special-purpose chips that Dean had called for. As Chen put it: “We did not even know if the code would work. But we did know that without T.P.U.s, it definitely wasn’t going to work.” He remembers going to Dean one on one to plead, “Please reserve something for us.” Dean had reserved them. The T.P.U.s, however, didn’t work right out of the box. Wu spent two months sitting next to someone from the hardware team in an attempt to figure out why. They weren’t just debugging the model; they were debugging the chip. The neural-translation project would be proof of concept for the whole infrastructural investment.
One Wednesday in June, the meeting in Quartz Lake began with murmurs about a Baidu paper that had recently appeared on the discipline’s chief online forum. Schuster brought the room to order. “Yes, Baidu came out with a paper. It feels like someone looking through our shoulder — similar architecture, similar results.” The company’s BLEU scores were essentially what Google achieved in its internal tests in February and March. Le didn’t seem ruffled; his conclusion seemed to be that it was a sign Google was on the right track. “It is very similar to our system,” he said with quiet approval.
The Google team knew that they could have published their results earlier and perhaps beaten their competitors, but as Schuster put it: “Launching is more important than publishing. People say, ‘Oh, I did something first,’ but who cares, in the end?”
This did, however, make it imperative that they get their own service out first and better. Hughes had a fantasy that they wouldn’t even inform their users of the switch. They would just wait and see if social media lit up with suspicions about the vast improvements.
“We don’t want to say it’s a new system yet,” he told me at 5:36 p.m. two days after Labor Day, one minute before they rolled out Chinese-to-English to 10 percent of their users, without telling anyone. “We want to make sure it works. The ideal is that it’s exploding on Twitter: ‘Have you seen how awesome Google Translate got?’ ”

8. A Celebration
The only two reliable measures of time in the seasonless Silicon Valley are the rotations of seasonal fruit in the microkitchens — from the pluots of midsummer to the Asian pears and Fuyu persimmons of early fall — and the zigzag of technological progress. On an almost uncomfortably warm Monday afternoon in late September, the team’s paper was at last released. It had an almost comical 31 authors. The next day, the members of Brain and Translate gathered to throw themselves a little celebratory reception in the Translate microkitchen. The rooms in the Brain building, perhaps in homage to the long winters of their diaspora, are named after Alaskan locales; the Translate building’s theme is Hawaiian.
The Hawaiian microkitchen has a slightly grainy beach photograph on one wall, a small lei-garlanded thatched-hut service counter with a stuffed parrot at the center and ceiling fixtures fitted to resemble paper lanterns. Two sparse histograms of bamboo poles line the sides, like the posts of an ill-defended tropical fort. Beyond the bamboo poles, glass walls and doors open onto rows of identical gray desks on either side. That morning had seen the arrival of new hooded sweatshirts to honor 10 years of Translate, and many team members went over to the party from their desks in their new gear. They were in part celebrating the fact that their decade of collective work was, as of that day, en route to retirement. At another institution, these new hoodies might thus have become a costume of bereavement, but the engineers and computer scientists from both teams all seemed pleased.
Google’s neural translation was at last working. By the time of the party, the company’s Chinese-English test had already processed 18 million queries. One engineer on the Translate team was running around with his phone out, trying to translate entire sentences from Chinese to English using Baidu’s alternative. He crowed with glee to anybody who would listen. “If you put in more than two characters at once, it times out!” (Baidu says this problem has never been reported by users.)
When word began to spread, over the following weeks, that Google had introduced neural translation for Chinese to English, some people speculated that it was because that was the only language pair for which the company had decent results. Everybody at the party knew that the reality of their achievement would be clear in November. By then, however, many of them would be on to other projects.
Hughes cleared his throat and stepped in front of the tiki bar. He wore a faded green polo with a rumpled collar, lightly patterned across the midsection with dark bands of drying sweat. There had been last-minute problems, and then last-last-minute problems, including a very big measurement error in the paper and a weird punctuation-related bug in the system. But everything was resolved — or at least sufficiently resolved for the moment. The guests quieted. Hughes ran efficient and productive meetings, with a low tolerance for maundering or side conversation, but he was given pause by the gravity of the occasion. He acknowledged that he was, perhaps, stretching a metaphor, but it was important to him to underline the fact, he began, that the neural translation project itself represented a “collaboration between groups that spoke different languages.”
Their neural-translation project, he continued, represented a “step function forward” — that is, a discontinuous advance, a vertical leap rather than a smooth curve. The relevant translation had been not just between the two teams but from theory into reality. He raised a plastic demi-flute of expensive-looking Champagne.
“To communication,” he said, “and cooperation!”

The engineers assembled looked around at one another and gave themselves over to little circumspect whoops and applause.
Jeff Dean stood near the center of the microkitchen, his hands in his pockets, shoulders hunched slightly inward, with Corrado and Schuster. Dean saw that there was some diffuse preference that he contribute to the observance of the occasion, and he did so in a characteristically understated manner, with a light, rapid, concise addendum.
What they had shown, Dean said, was that they could do two major things at once: “Do the research and get it in front of, I dunno, half a billion people.”
Everyone laughed, not because it was an exaggeration but because it wasn’t.

Epilogue: Machines Without Ghosts
Perhaps the most famous historic critique of artificial intelligence, or the claims made on its behalf, implicates the question of translation. The Chinese Room argument was proposed in 1980 by the Berkeley philosopher John Searle. In Searle’s thought experiment, a monolingual English speaker sits alone in a cell. An unseen jailer passes him, through a slot in the door, slips of paper marked with Chinese characters. The prisoner has been given a set of tables and rules in English for the composition of replies. He becomes so adept with these instructions that his answers are soon “absolutely indistinguishable from those of Chinese speakers.” Should the unlucky prisoner be said to “understand” Chinese? Searle thought the answer was obviously not. This metaphor for a computer, Searle later wrote, exploded the claim that “the appropriately programmed digital computer with the right inputs and outputs would thereby have a mind in exactly the sense that human beings have minds.”
For the Google Brain team, though, or for nearly everyone else who works in machine learning in Silicon Valley, that view is entirely beside the point. This doesn’t mean they’re just ignoring the philosophical question. It means they have a fundamentally different view of the mind. Unlike Searle, they don’t assume that “consciousness” is some special, numinously glowing mental attribute — what the philosopher Gilbert Ryle called the “ghost in the machine.” They just believe instead that the complex assortment of skills we call “consciousness” has randomly emerged from the coordinated activity of many different simple mechanisms. The implication is that our facility with what we consider the higher registers of thought are no different in kind from what we’re tempted to perceive as the lower registers. Logical reasoning, on this account, is seen as a lucky adaptation; so is the ability to throw and catch a ball. Artificial intelligence is not about building a mind; it’s about the improvement of tools to solve problems. As Corrado said to me on my very first day at Google, “It’s not about what a machine ‘knows’ or ‘understands’ but what it ‘does,’ and — more importantly — what it doesn’t do yet.”
Where you come down on “knowing” versus “doing” has real cultural and social implications. At the party, Schuster came over to me to express his frustration with the paper’s media reception. “Did you see the first press?” he asked me. He paraphrased a headline from that morning, blocking it word by word with his hand as he recited it: GOOGLE SAYS A.I. TRANSLATION IS INDISTINGUISHABLE FROM HUMANS’. Over the final weeks of the paper’s composition, the team had struggled with this; Schuster often repeated that the message of the paper was “It’s much better than it was before, but not as good as humans.” He had hoped it would be clear that their efforts weren’t about replacing people but helping them.
And yet the rise of machine learning makes it more difficult for us to carve out a special place for ourselves. If you believe, with Searle, that there is something special about human “insight,” you can draw a clear line that separates the human from the automated. If you agree with Searle’s antagonists, you can’t. It is understandable why so many people cling fast to the former view. At a 2015 M.I.T. conference about the roots of artificial intelligence, Noam Chomsky was asked what he thought of machine learning. He pooh-poohed the whole enterprise as mere statistical prediction, a glorified weather forecast. Even if neural translation attained perfect functionality, it would reveal nothing profound about the underlying nature of language. It could never tell you if a pronoun took the dative or the accusative case. This kind of prediction makes for a good tool to accomplish our ends, but it doesn’t succeed by the standards of furthering our understanding of why things happen the way they do. A machine can already detect tumors in medical scans better than human radiologists, but the machine can’t tell you what’s causing the cancer.
Then again, can the radiologist?
Medical diagnosis is one field most immediately, and perhaps unpredictably, threatened by machine learning. Radiologists are extensively trained and extremely well paid, and we think of their skill as one of professional insight — the highest register of thought. In the past year alone, researchers have shown not only that neural networks can find tumors in medical images much earlier than their human counterparts but also that machines can even make such diagnoses from the texts of pathology reports. What radiologists do turns out to be something much closer to predictive pattern-matching than logical analysis. They’re not telling you what caused the cancer; they’re just telling you it’s there.
Once you’ve built a robust pattern-matching apparatus for one purpose, it can be tweaked in the service of others. One Translate engineer took a network he put together to judge artwork and used it to drive an autonomous radio-controlled car. A network built to recognize a cat can be turned around and trained on CT scans — and on infinitely more examples than even the best doctor could ever review. A neural network built to translate could work through millions of pages of documents of legal discovery in the tiniest fraction of the time it would take the most expensively credentialed lawyer. The kinds of jobs taken by automatons will no longer be just repetitive tasks that were once — unfairly, it ought to be emphasized — associated with the supposed lower intelligence of the uneducated classes. We’re not only talking about three and a half million truck drivers who may soon lack careers. We’re talking about inventory managers, economists, financial advisers, real estate agents. What Brain did over nine months is just one example of how quickly a small group at a large company can automate a task nobody ever would have associated with machines.
The most important thing happening in Silicon Valley right now is not disruption. Rather, it’s institution-building — and the consolidation of power — on a scale and at a pace that are both probably unprecedented in human history. Brain has interns; it has residents; it has “ninja” classes to train people in other departments. Everywhere there are bins of free bike helmets, and free green umbrellas for the two days a year it rains, and little fruit salads, and nap pods, and shared treadmill desks, and massage chairs, and random cartons of high-end pastries, and places for baby-clothes donations, and two-story climbing walls with scheduled instructors, and reading groups and policy talks and variegated support networks. The recipients of these major investments in human cultivation — for they’re far more than perks for proles in some digital salt mine — have at hand the power of complexly coordinated servers distributed across 13 data centers on four continents, data centers that draw enough electricity to light up large cities.
But even enormous institutions like Google will be subject to this wave of automation; once machines can learn from human speech, even the comfortable job of the programmer is threatened. As the party in the tiki bar was winding down, a Translate engineer brought over his laptop to show Hughes something. The screen swirled and pulsed with a vivid, kaleidoscopic animation of brightly colored spheres in long looping orbits that periodically collapsed into nebulae before dispersing once more.
Hughes recognized what it was right away, but I had to look closely before I saw all the names — of people and files. It was an animation of the history of 10 years of changes to the Translate code base, every single buzzing and blooming contribution by every last team member. Hughes reached over gently to skip forward, from 2006 to 2008 to 2015, stopping every once in a while to pause and remember some distant campaign, some ancient triumph or catastrophe that now hurried by to be absorbed elsewhere or to burst on its own. Hughes pointed out how often Jeff Dean’s name expanded here and there in glowing spheres.
Hughes called over Corrado, and they stood transfixed. To break the spell of melancholic nostalgia, Corrado, looking a little wounded, looked up and said, “So when do we get to delete it?”
“Don’t worry about it,” Hughes said. “The new code base is going to grow. Everything grows.”
Correction: December 22, 2016 
An earlier version of this article referred incorrectly to a computer used in space travel. A computer was used to guide Apollo missions — not the “Apollo shuttle.” (There was no such shuttle.)
Gideon Lewis-Kraus is a writer at large for the magazine and a fellow at New America. He last wrote about the contradictions of travel photography.
DEC. 14, 2016

AI Software Learns to Make AI Software



by Tom Simonite

January 18, 2017

Google and others think software that learns to learn could take over some work done by AI experts.

Progress in artificial intelligence causes some people to worry that software will take jobs such as driving trucks away from humans. Now leading researchers are finding that they can make software that can learn to do one of the trickiest parts of their own jobs—the task of designing machine-learning software.

In one experiment, researchers at the Google Brain artificial intelligence research group had software design a machine-learning system to take a test used to benchmark software that processes language. What it came up with surpassed previously published results from software designed by humans.
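The article doesn’t describe the Brain team’s actual search procedure, but the overall shape of such an “AI designing AI” experiment can be sketched as a search over candidate designs, each scored on a benchmark. A minimal, hypothetical illustration follows — the search space and the scoring function here are invented stand-ins (a real system would train each candidate network and measure validation accuracy, which is far more expensive):

```python
import random

# Toy search space: each candidate "architecture" is a choice of depth,
# width, and learning rate. (Invented for illustration only.)
SEARCH_SPACE = {
    "layers": [1, 2, 4, 8],
    "units": [32, 64, 128],
    "lr": [0.1, 0.01, 0.001],
}

def evaluate(arch):
    """Stand-in for training the candidate and measuring validation
    accuracy. This toy scorer simply prefers 4 layers and a learning
    rate near 0.01; the 'units' choice is ignored."""
    score = 1.0 / (1 + abs(arch["layers"] - 4))
    score += 1.0 / (1 + abs(arch["lr"] - 0.01) * 100)
    return score

def random_search(trials=200, seed=0):
    """Sample random candidates, keep the best-scoring one."""
    rng = random.Random(seed)
    best_arch, best_score = None, float("-inf")
    for _ in range(trials):
        arch = {k: rng.choice(v) for k, v in SEARCH_SPACE.items()}
        s = evaluate(arch)
        if s > best_score:
            best_arch, best_score = arch, s
    return best_arch

best = random_search()
print(best["layers"], best["lr"])  # 4 0.01
```

The published systems use far more sophisticated search strategies (reinforcement learning or evolution rather than pure random sampling), but the loop — propose a design, score it, keep the best — is the common skeleton.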

In recent months several other groups have also reported progress on getting learning software to make learning software. They include researchers at

If self-starting AI techniques become practical, they could increase the pace at which machine-learning software is implemented across the economy. Companies must currently pay a premium for machine-learning experts, who are in short supply.

Jeff Dean, who leads the Google Brain research group, mused last week that some of the work of such experts could be supplanted by software. He described what he termed “automated machine learning” as one of the most promising research avenues his team was exploring.

“Currently the way you solve problems is you have expertise and data and computation,” said Dean, at the AI Frontiers conference in Santa Clara, California. “Can we eliminate the need for a lot of machine-learning expertise?”

One set of experiments from Google’s DeepMind group suggests that what researchers are terming “learning to learn” could also help lessen the problem of machine-learning software needing to consume vast amounts of data on a specific task in order to perform it well.

The researchers challenged their software to create learning systems for collections of multiple different, but related, problems, such as navigating mazes. It came up with designs that showed an ability to generalize, and pick up new tasks with less additional training than would be usual.
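The details of the DeepMind experiments aren’t given here, but the “learning to learn” idea can be shown in miniature with a Reptile-style meta-learning loop — a toy sketch under my own assumptions, not DeepMind’s method. The goal is an initialization that, because it was meta-trained across a family of related tasks, adapts to any new task in that family with only a few gradient steps:

```python
import random

# Each task: fit y = a * x, where the slope a varies per task.
# "Learning to learn" here means finding a starting weight w that is
# already close to every task in the family.

def sgd_steps(w, a, steps=5, lr=0.1):
    # Inner loop: a few gradient steps on one task's squared error,
    # using training inputs x in {1, 2, 3}.
    for _ in range(steps):
        for x in (1.0, 2.0, 3.0):
            grad = 2 * (w * x - a * x) * x
            w -= lr * grad
    return w

def reptile(meta_iters=500, meta_lr=0.1, seed=0):
    rng = random.Random(seed)
    w = 0.0  # the meta-learned initialization
    for _ in range(meta_iters):
        a = rng.uniform(2.0, 4.0)    # sample a related task
        w_task = sgd_steps(w, a)     # adapt to it briefly
        w += meta_lr * (w_task - w)  # nudge the init toward the solution
    return w

w0 = reptile()
```

The meta-learned initialization lands near the task family’s mean slope (about 3), so a brand-new task from the family needs far less additional training than starting from zero — a small-scale version of the generalization the researchers reported.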

The idea of creating software that learns to learn has been around for a while, but previous experiments didn’t produce results that rivaled what humans could come up with. “It’s exciting,” says Yoshua Bengio, a professor at the University of Montreal, who previously explored the idea in the 1990s.

Bengio says the more potent computing power now available, and the advent of a technique called deep learning, which has sparked recent excitement about AI, are what’s making the approach work. But he notes that so far it requires such extreme computing power that it’s not yet practical to think about lightening the load on, or partially replacing, machine-learning experts.

Google Brain’s researchers describe using 800 high-powered graphics processors to power software that came up with designs for image recognition systems that rivaled the best designed by humans.

Otkrist Gupta, a researcher at the MIT Media Lab, believes that will change. He and MIT colleagues plan to open-source the software behind their own experiments, in which learning software designed deep-learning systems that matched human-crafted ones on standard tests for object recognition.

Gupta was inspired to work on the project by frustrating hours spent designing and testing machine-learning models. He thinks companies and researchers are well motivated to find ways to make automated machine learning practical.

“Easing the burden on the data scientist is a big payoff,” he says. “It could make you more productive, make you better models, and make you free to explore higher-level ideas.”

Deep Learning AI Listens to Machines For Signs of Trouble


By Jeremy Hsu,
December 27th, 2016
Image: 3DSignals


Driving your car until it breaks down on the road is never anyone’s favorite way to learn the need for routine maintenance. But preventive or scheduled maintenance checks often miss many of the problems that can come up. An Israeli startup has come up with a better idea: Use artificial intelligence to listen for early warning signs that a car might be nearing a breakdown.

The service of 3DSignals, a startup based in Kefar Sava, Israel, relies on the artificial intelligence technique known as deep learning to understand the noise patterns of troubled machines and predict problems in advance. 3DSignals has already begun talking with leading European automakers about using the deep learning service to detect trouble both in auto factory machinery and in the cars themselves. The startup has even chatted with companies about using its service to automatically detect problems in future taxi fleets of driverless cars.

Deep learning usually refers to software algorithms known as artificial neural networks. These neural networks can learn to become better at specific tasks by filtering relevant data through multiple (deep) layers of artificial neurons. Many companies such as Google and Facebook have used deep learning to develop AI systems that

Many tech giants have also applied deep learning to make their services become better at automatically recognizing the spoken sounds of different human languages. But few companies have used deep learning to develop AI that’s good at listening to other acoustic signals such as the sounds of machines or music. That’s where 3DSignals hopes it can become a big player with its deep learning focus on more general sound patterns, Lavi explains.

“I think most of the world is occupied with deep learning on images. This is by far the most popular application and the most recent. But part of the industry is doing deep learning on acoustics focused on speech recognition and conversation. I think we are probably in the very small group of companies doing acoustics which is more general. This is my aim, to be the world leader in general acoustics deep learning.”

For each client, 3DSignals installs ultrasonic microphones that can detect sounds ranging up to 100 kilohertz (human hearing range is between 20 hertz and 20 kilohertz). The startup’s “Internet of Things” service connects the microphones to a computing device that can process some of the data and then upload the information to an online network where the deep learning algorithms do their work. Clients can always check the status of their machines by using any Web-connected device such as a smartphone or tablet.

The first clients for 3DSignals include heavy industry companies operating machinery such as circular cutting blades in mills or hydroelectric turbines in power plants. These companies started out by purchasing the first tier of the 3DSignals service that does not use deep learning. Instead, this first tier of service uses software that relies on basic physics modeling of certain machine parts—such as circular cutting saws—to predict when some parts may start to wear out. That allows the clients to begin getting value from day one.

The second tier of the service uses a deep learning algorithm and the sounds coming from the microphones to help detect strange or unusual noises from the machines. The deep learning algorithms train on sound patterns that can signal general problems with the machines. But only the third tier of the service, also using deep learning, can classify the sounds as indicating specific types of problems. Before this can happen, though, the clients need to help train the deep learning algorithm by first labeling certain sound patterns as belonging to specific types of problems.
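A bare-bones version of that second tier might look like the following — an illustrative sketch, not 3DSignals’ actual pipeline, with the audio frames and threshold invented for the example. The idea is simply to model the statistics of a machine’s normal sound and flag frames that deviate sharply from the baseline:

```python
import math

def rms(frame):
    """Root-mean-square energy of one audio frame (a list of samples)."""
    return math.sqrt(sum(s * s for s in frame) / len(frame))

def fit_baseline(normal_frames):
    """Learn the mean and spread of frame energy during healthy operation."""
    feats = [rms(f) for f in normal_frames]
    mean = sum(feats) / len(feats)
    var = sum((x - mean) ** 2 for x in feats) / len(feats)
    return mean, math.sqrt(var)

def is_anomalous(frame, baseline, threshold=3.0):
    """Flag a frame whose energy sits more than `threshold` standard
    deviations from the healthy baseline."""
    mean, std = baseline
    return abs(rms(frame) - mean) > threshold * std

# Healthy operation: a steady hum with slight natural variation.
normal = [[1.0 + 0.005 * (i % 10), -1.0, 1.1, -0.9] for i in range(50)]
baseline = fit_baseline(normal)
print(is_anomalous([1.0, -1.0, 1.1, -0.9], baseline))      # False
print(is_anomalous([5.0, -5.0, 5.0, -5.0], baseline))      # True
```

A deep learning model replaces the hand-picked energy feature with learned representations of the sound, and the third tier goes further by attaching labels to the flagged patterns, but the detect-then-classify split is the same.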

“After a while, we can not only say when problem type A happens, but we can say before it happens, you’re going to have problem type A in five hours,” Lavi says. “Some problems don’t happen instantly; there’s a deterioration.”

When trained, the 3DSignals deep learning algorithms are able to identify and predict specific problems in advance with 98 percent accuracy. But the current clients using the 3DSignals system have not yet begun taking advantage of this classification capability; they are still building their training datasets by having people manually label specific sound signatures as belonging to specific problems.

The one-year-old startup has just 15 employees, but it has grown fairly fast and raised $3.3 million so far from investors such as Dov Moran, the Israeli entrepreneur credited with being one of the first to invent USB flash drives. Lavi and his fellow co-founders are already eyeing several big markets that include automobiles and the energy sector beyond hydroelectric power plants. A series A funding round to attract venture capital is planned for sometime in 2017.

If all goes well, 3DSignals could expand its lead in the growing market for providing “predictive maintenance” to factories, power plants, and car owners. The impending arrival of driverless cars may put even more responsibility on the metaphorical shoulders of a deep learning AI that could listen for problems while the human passengers tune out from the driving experience. On top of all this, 3DSignals has the chance to pioneer the advancement of deep learning in listening to general sounds. Not bad for a small startup.

“It’s important for us to be specialists in general acoustic deep learning, because the research literature does not cover it,” Lavi says.

The Current State of Machine Intelligence 3.0



(originally published by O’Reilly here, this year in collaboration with my amazing partner James Cham! If you’re interested in enterprise implications of this chart please refer to Harvard Business Review’s The Competitive Landscape for Machine Intelligence)
Almost a year ago, we published our now-annual landscape of machine intelligence companies, and goodness have we seen a lot of activity since then. This year’s landscape has a third more companies than our first one did two years ago, and it feels even more futile to try to be comprehensive, since this just scratches the surface of all of the activity out there.
As has been the case for the last couple of years, our fund still obsesses over “problem first” machine intelligence—we’ve invested in 35 machine intelligence companies solving 35 meaningful problems in areas from security to recruiting to software development. (Our fund focuses on the future of work, so there are some machine intelligence domains where we invest more than others.)
At the same time, the hype around machine intelligence methods continues to grow: the phrase “deep learning” now represents both a series of meaningful breakthroughs (wonderful) and a hyped buzzword like “big data” (not so good!). We care about whether a founder uses the right method to solve a problem, not the fanciest one. We favor those who apply technology thoughtfully.
What’s the biggest change in the last year? We are getting inbound inquiries from a different mix of people. For v1.0, we heard almost exclusively from founders and academics. Then came a healthy mix of investors, both private and public. Now overwhelmingly we have heard from existing companies trying to figure out how to transform their businesses using machine intelligence.
For the first time, a “one stop shop” of the machine intelligence stack is coming into view—even if it’s a year or two off from being neatly formalized. The maturing of that stack might explain why more established companies are more focused on building legitimate machine intelligence capabilities. Anyone who has their wits about them is still going to be making initial build-and-buy decisions, so we figured an early attempt at laying out these technologies is better than no attempt.
Ready player world
Many of the most impressive looking feats we’ve seen have been in the gaming world, from DeepMind beating Atari classics and the world’s best at Go, to the OpenAI gym, which allows anyone to train intelligent agents across an array of gaming environments.
The gaming world offers a perfect place to start machine intelligence work (e.g., constrained environments, explicit rewards, easy-to-compare results, looks impressive)—especially for reinforcement learning. And it is much easier to have a self-driving car agent go a trillion miles in a simulated environment than on actual roads. Now we’re seeing the techniques used to conquer the gaming world moving to the real world. A newsworthy example of game-tested technology entering the real world was when DeepMind used neural networks to make Google’s data centers more efficient. This raises two questions: What else in the world looks like a game? Or what else in the world can we reconfigure to make it look more like a game?
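Why games fit reinforcement learning so well can be made concrete with a tiny example — a toy of my own construction, unrelated to DeepMind’s systems. A five-cell corridor with an explicit reward at one end has exactly the constrained states and explicit rewards described above, and tabular Q-learning quickly discovers the optimal “always move right” policy:

```python
import random

N_STATES = 5
ACTIONS = (0, 1)  # 0 = step left, 1 = step right

def step(state, action):
    """Deterministic environment: move, clamp to the corridor, and pay
    a reward of 1 only for reaching the rightmost cell."""
    nxt = max(0, min(N_STATES - 1, state + (1 if action == 1 else -1)))
    reward = 1.0 if nxt == N_STATES - 1 else 0.0
    return nxt, reward

def train(episodes=500, alpha=0.5, gamma=0.9, eps=0.2, seed=0):
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]  # Q-value table
    for _ in range(episodes):
        s = 0
        while s != N_STATES - 1:
            if rng.random() < eps:
                a = rng.choice(ACTIONS)  # explore
            else:                        # exploit, breaking ties randomly
                best = max(q[s])
                a = rng.choice([x for x in ACTIONS if q[s][x] == best])
            nxt, r = step(s, a)
            # Standard Q-learning update toward the bootstrapped target.
            q[s][a] += alpha * (r + gamma * max(q[nxt]) - q[s][a])
            s = nxt
    return q

q = train()
# After training, "right" is the higher-valued action everywhere.
assert all(q[s][1] > q[s][0] for s in range(N_STATES - 1))
```

The real-world question the paragraph poses is whether a messy problem (cooling a data center, driving a truck) can be wrapped in an interface this clean: observable state, allowed actions, and an explicit reward.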
Early attempts are intriguing. Developers are dodging meter maids (brilliant—a modern day Paper Boy), categorizing cucumbers, sorting trash, and recreating the memories of loved ones as conversational bots. Otto’s self-driving trucks delivering beer on their first commercial ride even seems like a bonus level from Grand Theft Auto. We’re excited to see what new creative applications come in the next year.
Why even bot-her?
Ah, the great chatbot explosion of 2016, for better or worse—we liken it to the mobile app explosion we saw with the launch of iOS and Android. The dominant platforms (in the machine intelligence case, Facebook, Slack, Kik) race to get developers to build on their platforms. That means we’ll get some excellent bots but also many terrible ones—the joys of public experimentation.
The danger here, unlike the mobile app explosion (where we lacked expectations for what these widgets could actually do), is that we assume anything with a conversation interface will converse with us at near-human level. Most do not. This is going to lead to disillusionment over the course of the next year but it will clean itself up fairly quickly thereafter.
When our fund looks at this emerging field, we divide each technology into two components: the conversational interface itself and the “agent” behind the scenes that’s learning from data and transacting on a user’s behalf. While you certainly can’t drop the ball on the interface, we spend almost all our time thinking about that behind-the-scenes agent and whether it is actually solving a meaningful problem.
We get a lot of questions about whether there will be “one bot to rule them all.” To be honest, as with many areas at our fund, we disagree on this. We certainly believe there will not be one agent to rule them all, even if there is one interface to rule them all. For the time being, bots will be idiot savants: stellar for very specific applications.
We’ve written a bit about this, and the framework we use to think about how agents will evolve is a CEO and her support staff. Many Fortune 500 CEOs employ a scheduler, handler, a research team, a copy editor, a speechwriter, a personal shopper, a driver, and a professional coach. Each of these people performs a dramatically different function and has access to very different data to do their job. The bot / agent ecosystem will have a similar separation of responsibilities with very clear winners, and they will divide fairly cleanly along these lines. (Note that some CEOs have a chief of staff who coordinates among all these functions, so perhaps we will see examples of “one interface to rule them all.”)
You can also see, in our landscape, some of the corporate functions machine intelligence will re-invent (most often in interfaces other than conversational bots).
On to 11111000001
Successful use of machine intelligence at a large organization is surprisingly binary, like flipping a stubborn light switch. It’s hard to do, but once machine intelligence is enabled, an organization sees everything through the lens of its potential. Organizations like Google, Facebook, Apple, Microsoft, Amazon, Uber, and Bloomberg (our sole investor) bet heavily on machine intelligence, and its capabilities are pervasive throughout their products.
Other companies are struggling to figure out what to do, as many boardrooms did with “what to do about the Internet” in 1997. Why is this so difficult for companies to wrap their heads around? Machine intelligence is different from traditional software. Unlike with big data, where you could buy a new capability, machine intelligence depends on deeper organizational and process changes. Companies need to decide whether they will trust machine intelligence analysis for one-off decisions or if they will embed often-inscrutable machine intelligence models in core processes. Teams need to figure out how to test newfound capabilities, and applications need to change so they offer more than a system of record; they also need to coach employees and learn from the data they enter.
Unlike traditional hard-coded software, machine intelligence gives only probabilistic outputs. We want to ask machine intelligence to make subjective decisions based on imperfect information (eerily like what we trust our colleagues to do?). As a result, this new machine intelligence software will make mistakes, just like we do, and we’ll need to be thoughtful about when to trust it and when not to.
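One common way teams operationalize that thoughtfulness is a confidence threshold — an illustrative pattern, not any particular product’s design, with the invoice labels below invented for the example. The system acts automatically only on high-confidence outputs and routes everything else to a person:

```python
def triage(predictions, threshold=0.9):
    """Split model outputs into auto-approved actions and cases for
    human review. `predictions` is a list of (item, label, probability)
    tuples, where probability is the model's stated confidence."""
    automated, needs_review = [], []
    for item, label, prob in predictions:
        if prob >= threshold:
            automated.append((item, label))
        else:
            needs_review.append(item)
    return automated, needs_review

# Hypothetical model outputs for an invoice-approval workflow.
preds = [("invoice-1", "approve", 0.97),
         ("invoice-2", "reject", 0.55),
         ("invoice-3", "approve", 0.91)]
auto, review = triage(preds)
print(auto)    # [('invoice-1', 'approve'), ('invoice-3', 'approve')]
print(review)  # ['invoice-2']
```

The threshold is a business decision, not a technical one: it trades the cost of a wrong automated action against the cost of a human’s time, which is exactly the kind of judgment the surrounding text says organizations must learn to make.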
The idea of this new machine trust is daunting and makes machine intelligence harder to adopt than traditional software. We’ve had a few people tell us that the biggest predictor of whether a company will successfully adopt machine intelligence is whether they have a C-Suite executive with an advanced math degree. These executives understand it isn’t magic—it is just (hard) math.
Machine intelligence business models are going to be different from licensed and subscription software, but we don’t know how. Unlike traditional software, we still lack frameworks for management to decide where to deploy machine intelligence. Economists like Ajay Agrawal, Joshua Gans, and Avi Goldfarb have taken the first steps toward helping managers understand the economics of machine intelligence and predict where it will be most effective. But there is still a lot of work to be done.
In the next few years, the danger here isn’t what we see in dystopian sci-fi movies. The real danger of machine intelligence is that executives will make bad decisions about what machine intelligence capabilities to build.
Peter Pan’s never-never land
We’ve been wondering about the path to grow into a large machine intelligence company. Unsurprisingly, there have been many machine intelligence acquisitions (Nervana by Intel, Magic Pony by Twitter, Turi by Apple, MetaMind by Salesforce, Otto by Uber, Cruise by GM, SalesPredict by eBay, Viv by Samsung). Many of these happened fairly early in a company’s life and at quite a high price. Why is that?
Established companies struggle to understand machine intelligence technology, so it’s painful to sell to them, and the market for buyers who can use this technology in a self-service way is small. Then, if you do understand how this technology can supercharge your organization, you realize it’s so valuable that you want to hoard it. Businesses are saying to machine intelligence companies, “Forget selling this technology to others; I’m going to buy the whole thing.”
This absence of a market today makes it difficult for a machine intelligence startup, especially horizontal technology providers, to “grow up”—hence the Peter Pans. Companies we see successfully entering a long-term trajectory can package their technology as a new problem-specific application for enterprise or simply transform an industry themselves as a new entrant (love this). We flagged a few of the industry categories where we believe startups might “go the distance” in this year’s landscape.
Inspirational machine intelligence
Once we do figure it out, machine intelligence can solve much more interesting problems than traditional software. We’re thrilled to see so many smart people applying machine intelligence for good.
Established players like Conservation Metrics and Vulcan Conservation have been using deep learning to protect endangered animal species; the ever-inspiring team at Thorn is constantly coming up with creative algorithmic techniques to protect our children from online exploitation. The philanthropic arms of the tech titans joined in, supporting nonprofits with free storage, compute, and even developer time. Google partnered with nonprofits to found Global Fishing Watch to detect illegal fishing activity using satellite data in near real time; satellite intelligence startup Orbital Insight (in which we are investors) partnered with Global Forest Watch to detect illegal logging and other causes of global forest degradation. Startups are getting into the action, too. The Creative Destruction Lab machine intelligence accelerator (with whom we work closely) has companies working on problems like earlier disease detection and injury prevention. One area where we have seen some activity but would love to see more is machine intelligence to assist the elderly.
In talking to many people using machine intelligence for good, they all cite the critical role of open source technologies. In the last year, we’ve seen the launch of OpenAI, which offers everyone access to world class research and environments, and better and better releases of TensorFlow and Keras. Non-profits are always trying to do more with less, and machine intelligence has allowed them to extend the scope of their missions without extending budget. Algorithms allow non-profits to inexpensively scale what would not be affordable to do with people.
We also saw growth in universities and corporate think tanks, where new centers like USC’s Center for AI in Society, Berkeley’s Center for Human Compatible AI, and the multiple-corporation Partnership on AI study the ways in which machine intelligence can help humanity. The White House even got into the act: after a series of workshops around the U.S., they published a 48-page report outlining their recommendations for applying machine intelligence to safely and fairly address broad social problems.
On a lighter note, we’ve also heard whispers of more artisanal versions of machine intelligence. Folks are doing things like using computer vision algorithms to help them choose the best cocoa beans for high-grade chocolate, write poetry, cook steaks, and generate musicals.
Curious minds want to know. If you’re working on a unique or important application of machine intelligence we’d love to hear from you.
Looking forward
We see all this activity only continuing to accelerate. The world will give us more open sourced and commercially available machine intelligence building blocks, there will be more data, there will be more people interested in learning these methods, and there will always be problems worth solving. We still need ways of explaining the difference between machine intelligence and traditional software, and we’re working on that. The value of code is different from data, but what about the value of the model that code improves based on that data?
Once we understand machine intelligence deeply, we might look back on the era of traditional software and think it was just a prologue to what’s happening now. We look forward to seeing what the next year brings.
A massive thank you to the Bloomberg Beta team, David Klein, Adam Gibson, Ajay Agrawal, Alexandra Suich, Angela Tranyens, Anthony Goldblum, Avi Goldfarb, Beau Cronin, Ben Lorica, Chris Nicholson, Doug Fulop, Dror Berman, Dylan Tweney, Gary Kazantsev, Gideon Mann, Gordon Ritter, Jack Clark, John Lilly, Jon Lehr, Joshua Gans, Lauren Barless, Matt Turck, Matthew Granade, Mickey Graham, Nick Adams, Roger Magoulas, Sean Gourley, Shruti Gandhi, Steve Jurvetson, Vijay Sundaram, and Zavain Dar for the help and fascinating conversations that led to this year’s report!
Landscape designed by Heidi Skinner.
Disclosure: Bloomberg Beta is an investor in Alation, Arimo, Aviso, Brightfunnel, Context Relevant, Deep Genomics, Diffbot, Digital Genius, Domino Data Labs, Drawbridge, Gigster, Gradescope, Graphistry, Gridspace, Howdy, Kaggle, Mavrx, Motiva, PopUpArchive, Primer, Sapho, Shield.AI, Textio, and Tule.
The Current State of Machine Intelligence 2.0
A year ago, I published my original attempt at mapping the machine intelligence ecosystem. So much has happened since. I spent the last 12 months geeking out on every company and nibble of information I could find, chatting with hundreds of academics, entrepreneurs, and investors about machine intelligence. This year, given the explosion of activity, my focus is on highlighting areas of innovation, rather than on trying to be comprehensive. Figure 1 showcases the new landscape of machine intelligence as we enter 2016:
Despite the noisy hype, which sometimes distracts, machine intelligence is already being used in several valuable ways. Machine intelligence already helps us get the important business information we need more quickly, monitors critical systems, feeds our population more efficiently, reduces the cost of health care, detects disease earlier, and so on.
The two biggest changes I’ve noted since I did this analysis last year are (1) the emergence of autonomous systems in both the physical and virtual world and (2) startups shifting away from building broad technology platforms to focusing on solving specific business problems.
Reflections on the landscape
With the focus moving from “machine intelligence as magic box” to delivering real value immediately, there are more ways to bring a machine intelligence startup to market. (There are as many ways to go to market as there are business problems to solve. I lay out many of the options here.) Most of these machine intelligence startups take well-worn machine intelligence techniques, some more than a decade old, and apply them to new data sets and workflows. It’s still true that big companies, with their massive data sets and contact with their customers, have inherent advantages — though startups are finding a way to enter.
Achieving autonomy
In last year’s roundup, the focus was almost exclusively on machine intelligence in the virtual world. This time we’re seeing it in the physical world, in the many flavors of autonomous systems: self-driving cars, autopilot drones, robots that can perform dynamic tasks without every action being hard coded. It’s still very early days — most of these systems are just barely useful, though we expect that to change quickly.
These physical systems are emerging because they meld many now-maturing research avenues in machine intelligence. Computer vision, the combination of deep learning and reinforcement learning, natural language interfaces, and question-answering systems are all building blocks to make a physical system autonomous and interactive. Building these autonomous systems today is as much about integrating these methods as inventing new ones.
The new (in)human touch
The virtual world is becoming more autonomous, too. Virtual agents, sometimes called bots, use conversational interfaces (think of Her, without the charm). Some of these virtual agents are entirely automated, others are “human-in-the-loop” systems, where algorithms take on “machine-like” subtasks and a human adds creativity or execution. (In some, the human is training the bot while she or he works.) The user interacts with the system by either typing in natural language or speaking, and the agent responds in kind.
These services sometimes give customers confusing experiences, like mine the other day when I needed to contact customer service about my cell phone. I didn’t want to talk to anyone, so I opted for online chat. It was the most “human” customer service experience of my life, so weirdly perfect I found myself wondering whether I was chatting with a person, a bot, or some hybrid. Then I wondered if it even mattered. I had a fantastic experience and my issue was resolved. I felt gratitude to whatever it was on the other end, even if it was a bot.
On one hand, these agents can act utterly professional, helping us with customer support, research, project management, scheduling, and e-commerce transactions. On the other hand, they can be quite personal and maybe we are getting closer to Her — with Microsoft’s romantic chatbot Xiaoice, automated emotional support is already here.
As these technologies warm up, they could transform new areas like education, psychiatry, and elder care, working alongside human beings to close the gap in care for students, patients, and the elderly.
50 shades of grey markets
At least I make myself laugh. 😉
Many machine intelligence technologies will transform the business world by starting in regulatory grey areas. On the short list: health care (automated diagnostics, early disease detection based on genomics, algorithmic drug discovery); agriculture (sensor- and vision-based intelligence systems, autonomous farming vehicles); transportation and logistics (self-driving cars, drone systems, sensor-based fleet management); and financial services (advanced credit decisioning).
To overcome the difficulties of entering grey markets, we’re seeing some unusual strategies:
Startups are making a global arbitrage (e.g., health care companies going to market in emerging markets, drone companies experimenting in the least regulated countries).
The “fly under the radar” strategy. Some startups are being very careful to stay on the safest side of the grey area, keep a low profile, and avoid the regulatory discussion as long as possible.
Big companies like Google, Apple, and IBM are seeking out these opportunities because they have the resources to be patient and are the most likely to be able to effect regulatory change; that ability is one of their biggest advantages.
Startups are considering beefing up funding earlier than they would have, to fight inevitable legal battles and face regulatory hurdles sooner.
What’s your (business) problem?
A year ago, enterprises were struggling to make heads or tails of machine intelligence services (some of the most confusing were in the “platform” section of this landscape). When I spoke to potential enterprise customers, I often heard things like, “these companies are trying to sell me snake oil” or, “they can’t even explain to me what they do.”
The corporates wanted to know what current business problems these technologies could solve. They didn’t care about the technology itself. The machine intelligence companies, on the other hand, just wanted to talk about their algorithms and how their platform could solve hundreds of problems (this was often true, but that’s not the point!).
Two things have happened that are helping to create a more productive middle ground:
Enterprises have invested heavily in becoming “machine intelligence literate.” I’ve had roughly 100 companies reach out to get perspective on how they should think about machine intelligence. Their questions have been thoughtful, they’ve been changing their organizations to make use of these new technologies, and many different roles across the organization care about this topic (from CEOs to technical leads to product managers).
Many machine intelligence companies have figured out that they need to speak the language of solving a business problem. They are packaging solutions to specific business problems as separate products and branding them that way. They often work alongside a company to create a unique solution instead of just selling the technology itself, being one part educator and one part executor. Once businesses learn what new questions can be answered with machine intelligence, these startups may make a more traditional technology sale.
The great verticalization
I remember reading Who Says Elephants Can’t Dance and being blown away by the ability of a technology icon like IBM to risk it all. (This was one of the reasons I went to work for them out of college.) Now IBM seems poised to try another risk-it-all transformation — moving from a horizontal technology provider to directly transforming a vertical. And why shouldn’t Watson try to be a doctor or a concierge? It’s a brave attempt.
It’s not just IBM: you could probably make an entire machine intelligence landscape just of Google projects. (If anyone takes a stab, I’d love to see it!)
Your money is nice, but tell me more about your data
In the machine intelligence world, founders are selling their companies, as I suggested last year — but it’s about more than just money. I’ve heard from founders that they are only interested in an acquisition if the acquirer has the right data set to make their product work. We’re hearing things like, “I’m not taking conversations but, given our product, if X came calling it’d be hard to turn down.” “X” is most often Slack (!), Google, Facebook, Twitter in these conversations — the companies that have the data.
Until recently, there’s been one secret in machine intelligence talent: Canada! During the “AI winter,” when this technology fell out of favor in the 80s and 90s, the Canadian government was one of a few entities funding AI research. This support sustained the formidable trio of Geoffrey Hinton, Yoshua Bengio, and Yann LeCun, the godfathers of deep learning.
Canada continues to be central to the machine intelligence frontier. As an unapologetically proud Canadian, it’s been a pleasure to work with groups like AICML to commercialize advanced research, the Machine Learning Creative Destruction Lab to support startups, and to bring the machine intelligence world together at events like this one.
So what now?
Machine intelligence is even more of a story than last year, in large companies as well as startups. In the next year, the practical side of these technologies will flourish. Most new entrants will avoid generic technology solutions, and instead have a specific business purpose to which to put machine intelligence.
I can’t wait to see more combinations of the practical and eccentric. A few years ago, a company like Orbital Insight would have seemed farfetched — wait, you’re going to use satellites and computer vision algorithms to tell me what the construction growth rate is in China!? — and now it feels familiar.
Similarly, researchers are doing things that make us stop and say, “Wait, really?” They are tackling important problems we may not have imagined were possible, like creating fairy godmother drones to help the elderly, computer vision that detects the subtle signs of PTSD, autonomous surgical robots that remove cancerous lesions, and fixing airplane WiFi (just kidding, not even machine intelligence can do that).
Overall, agents will become more eloquent, autonomous systems more pervasive, machine intelligence more…intelligent. I expect more magic in the years to come.
Many thanks to those who helped me with this! Special thanks to Adam Spector, Ajay Agrawal, Angela Tran Kingyens, Beau Cronin, Chris Michel, Chris Nicholson, Dan Strickland, David Beyer, David Klein, Doug Fulop, Dror Berman, Jack Clark, James Cham, James Rattner, Jeffrey Chung, Jon Lehr, Karin Klein, Lauren Barless, Lynda Ting, Matt Turck, Mike Dauber, Morgan Polotan, Nick Adams, Pete Skomoroch, Roy Bahat, Sean Gourley, Shruti Gandhi, Zavain Dar, and Heidi Skinner (who designed this graphic).
Disclosure: Bloomberg Beta is an investor in Alation, Adatao, Aviso, BrightFunnel, Context Relevant, Deep Genomics, Diffbot, Domino Data Lab, Gigster, Graphistry, Howdy, Kaggle, Mavrx, Orbital Insight, Primer, Sapho, Textio, and Tule.
Machine Intelligence in the Real World
(This piece was originally posted on TechCrunch.)
I’ve been laser-focused on machine intelligence in the past few years. I’ve talked to hundreds of entrepreneurs, researchers and investors about helping machines make us smarter.
In the months since I shared my landscape of machine intelligence companies, folks keep asking me what I think of them — as if they’re all doing more or less the same thing. (I’m guessing this is how people talked about “dot coms” in 1997.)
On average, people seem most concerned about how to interact with these technologies once they are out in the wild. This post will focus on how these companies go to market, not on the methods they use.
In an attempt to explain the differences in how these companies go to market, I found myself using (admittedly colorful) nicknames. They ended up being useful, so I took a moment to spell them out in more detail, so that if you run into one of these companies, or need a handy way to describe yours, you have the vernacular.
The categories aren’t airtight — this is a complex space — but this framework helps our fund (which invests in companies that make work better) be more thoughtful about how we think about and interact with machine intelligence companies.
“Panopticons” Collect A Broad Dataset
Machine intelligence starts with the data computers analyze, so the companies I call “panopticons” are assembling enormous, important new datasets. Defensible businesses tend to be global in nature. “Global” is very literal in the case of a company like Planet Labs, which has satellites physically orbiting the earth. Or it’s more metaphorical, in the case of a company like Premise, which is crowdsourcing data from many countries.
With many of these new datasets we can automatically get answers to questions we have struggled to answer before. There are massive barriers to entry because it’s difficult to amass a global dataset of significance.
However, it’s important to ask whether there is a “good enough” dataset that might provide a cheaper alternative, since data license businesses are at risk of being commoditized. Companies approaching this space should feel confident that either (1) no one else can or will collect a “good enough” alternative, or (2) they can successfully capture the intelligence layer on top of their own dataset and own the end user.
Examples include Planet Labs, Premise and Diffbot.
“Lasers” Collect A Focused Dataset
The companies I like to call “lasers” are also building new datasets, but in niches, to solve industry-specific problems with laser-like focus. Successful companies in this space provide more than just the dataset — they also must own the algorithms and user interface. They focus on narrower initial uses and must provide more value than just data to win customers.
The products immediately help users answer specific questions like, “how much should I water my crops?” or “which applicants are eligible for loans?” This category may spawn many, many companies — a hundred or more — because companies in it can produce business value right away.
With these technologies, many industries will be able to make decisions in a data-driven way for the first time. The power for good here is enormous: We’ve seen these technologies help us feed the world more efficiently, improve medical diagnostics, aid in conservation projects and provide credit to those in the world that didn’t have access to it before.
But to succeed, these companies need to find a single “killer” (meant in the benevolent way) use case to solve, and solve that problem in a way that makes the user’s life simpler, not more complex.
Examples include Tule Technologies, Enlitic, InVenture, Conservation Metrics, Red Bird, Mavrx and Watson Health.
“Alchemists” Promise To Turn Your Data Into Gold
These companies have a simple pitch: Let me work with your data, and I will return gold. Rather than creating their own datasets, they use novel algorithms to enrich and draw insights from their customers’ data. They come in three forms:
Self-service API-based solutions.
Service providers who work on top of their customers’ existing stacks.
Full-stack solutions that deliver their own hardware-optimized stacks.
Because the alchemists see across an array of data types, they’re likely to get early insight into powerful applications of machine intelligence. If they go directly to customers to solve problems in a hands-on way (i.e., with consulting services), they often become trusted partners.
But be careful. This industry is nascent, and those using an API-based approach may struggle to scale as revenue sources can only go as far as the still-small user base. Many of the self-service companies have moved toward a more hands-on model to address this problem (and those people-heavy consulting services can sometimes be harder to scale).
Examples include Nervana Systems, Context Relevant, IBM Watson, Metamind, AlchemyAPI (acquired by IBM Watson), Skymind, and Citrine.
“Gateways” Create New Use Cases From Specific Data Types
These companies allow enterprises to unlock insights from a type of data they had trouble dealing with before (e.g., image, audio, video, genomic data). They don’t collect their own data, but rather work with client data and/or a third-party data provider. Unlike the Alchemists, who tend to do analysis across an array of data types and use cases, these are specialists.
What’s most exciting here is that this is genuinely new intelligence. Enterprises have generally had this data, but they either weren’t storing it or didn’t have the ability to interpret it economically. All of that “lost” data can now be used.
Still, beware the “so what” problem. Just because we have the methods to extract new insights doesn’t make them valuable. We’ve seen companies that begin with the problem they want to solve, and others blinded by the magic of the method. The latter category struggles to get funding.
Examples include Clarifai, Gridspace, Orbital Insight, Descartes Labs, Deep Genomics and Atomwise.
“Magic Wands” Seamlessly Fix A Workflow
These are SaaS tools that make work more effective, not just by extracting insights from the data you provide but by seamlessly integrating those insights into your daily workflow, creating a level of machine intelligence assistance that feels like “magic.” They are similar to the Lasers in that they have an interface that helps the user solve a specific problem — but they tend to rely on a user’s or enterprise’s data rather than creating their own new dataset from scratch.
For example, Textio is a text editor that recommends improvements to job descriptions as you type. With it, I can go from a 40th percentile job description to a 90th percentile one in just a few minutes, all thanks to a beautifully presented machine learning algorithm.
I believe that in five years we all will be using these tools across different use cases. They make the user look like an instant expert by codifying lessons found in domain-specific data. They can aggregate intelligence and silently bake it into products. We expect this space to heat up, and can’t wait to see more Magic Wands.
The risk is that by relying on such tools, humans will lose expertise (in the same way that the autopilot created the risk that pilots’ core skills may decay). To offset this, makers of these products should create UI in a way that will actually fortify the user’s knowledge rather than replace it (e.g., educating the user during the process of making a recommendation or using a double-blind interface).
Examples include Textio, RelateIQ (acquired by Salesforce), InboxVudu, Sigopt and The Grid.
“Navigators” Create Autonomous Systems For The Physical World
Machine intelligence plays a huge role in enabling autonomous systems like self-driving cars, drones and robots to augment processes in warehouses, agriculture and elderly care. This category is a mix of early stage companies and large established companies like Google, Apple, Uber and Amazon.
Such technologies give us the ability to rethink transportation and logistics entirely, especially in emerging market countries that lack robust physical infrastructure. We also can use them to complete tasks that were historically very dangerous for humans.
Before committing to this kind of technology, companies should feel confident that they can raise large amounts of capital and recruit the best minds in some of the most sought-after fields. Many of these problems require experts across varied specialties, like hardware, robotics, vision and audio. They also will have to deal with steep regulatory hurdles (e.g., self-driving car regulations).
Examples include Blue River Technologies, Airware, Clearpath Robotics, Kiva Systems (acquired by Amazon), 3DR, Skycatch, Cruise Automation and the self-driving car groups at Google, Uber, Apple and Tesla.
“Agents” Create Cyborgs And Bots To Help With Virtual Tasks
Sometimes the best way to use machine intelligence is to pair it with human intelligence. Cyborgs and bots are similar in that they help you complete tasks, but the difference is a cyborg appears as if it’s a human (it blends human and machine intelligence behind the scenes, has a proper name and attempts to interact like a person would), whereas a bot is explicitly non-human and relies on you to provide the human-level guidance to instruct it what to do.
Cyborgs most often complete complex tasks, like customer service via real-time chat or meeting scheduling via email (e.g., Clara from Clara Labs or Amy from x.ai). Bots tend to help you perform basic research, complete online transactions and help your team stay on top of tasks (e.g., Howdy, the project management bot).
In both cases, this is the perfect blending of humans and machines: The computers take the transactional grunt work pieces of the task and interact with us for the higher-level decision-making and creativity.
Cyborg-based companies start as mostly manual services and, over time, become more machine-driven as technology matures. The risk is whether they can make that transition quickly enough. For both cyborgs and bots, privacy and security will be an ongoing concern, as we trust more and more of our data (e.g., calendars, email, documents, credit cards) to them.
Examples include Clara, x.ai, Facebook M, Digital Genius, Kasisto and Howdy.
“Pioneers” Are Very Smart
Some machine intelligence companies begin life as academic projects. When the teams — professors and graduate students with years of experience in the field — discover they have something marketable, they (or their universities) spin them out into companies.
Aggregating a team like that is, in itself, a viable market strategy, because there are so few people with 8-10 years of experience in this field. Their brains are so valuable that investors are willing to take the risk on the basis of the team alone — even if the business models still need some work.
In fact, there are many extremely important problems to solve that don’t line up with short-term use cases. These teams are the ones solving the problems that seem impossible, and they are among the few who can potentially make them possible!
This approach can work brilliantly if the team has a problem they are truly devoted to working on, but it is tough to keep the team together if they are banding together for the sake of solidarity and the prospect of an acqui-hire. They also need funders who are aligned with their longer-term vision.
Examples include DeepMind (acquired by Google), DNN Research (acquired by Google), Numenta, Vicarious, NNaiSense and Curious AI.
As you can see, machine intelligence is a very active space. There are many companies out there that may not fit into one of these categories, but these are the ones we see most often.
The obvious question for all of these categories is which are most attractive for investment? Individual startups are outliers by definition, so it’s hard to make it black and white, and we’re so excited about this space that it’s really just different degrees of optimism. That said, I’m particularly excited about the Lasers and Magic Wands, because they can turn new types of data into actionable intelligence right now, and because they can take advantage of well-worn SaaS techniques.
More on these to come. Stay tuned.
Disclosure: Bloomberg Beta is an investor in Diffbot, Tule Technologies, Mavrx, Gridspace, Orbital Insight, Textio, Howdy and several other machine intelligence companies that are not mentioned in this article.
The Current State of Machine Intelligence
I spent the last three months learning about every artificial intelligence, machine learning, or data related startup I could find — my current list has 2,529 of them to be exact. Yes, I should find better things to do with my evenings and weekends but until then…
Why do this?
A few years ago, investors and startups were chasing “big data” (I helped put together a landscape on that industry). Now we’re seeing a similar explosion of companies calling themselves artificial intelligence, machine learning, or somesuch — collectively I call these “machine intelligence” (I’ll get into the definitions in a second). Our fund, Bloomberg Beta, which is focused on the future of work, has been investing in these approaches. I created this landscape to start to put startups into context. I’m a thesis-oriented investor and it’s much easier to identify crowded areas and see white space once the landscape has some sort of taxonomy.
What is “machine intelligence,” anyway?
I mean “machine intelligence” as a unifying term for what others call machine learning and artificial intelligence. (Some others have used the term before, without quite describing it or understanding how laden this field has been with debates over descriptions.)
I would have preferred to avoid a different label, but when I tried either “artificial intelligence” or “machine learning,” both proved too narrow: when I called it “artificial intelligence,” too many people were distracted by whether certain companies were “true AI,” and when I called it “machine learning,” many thought I wasn’t doing justice to the more “AI-esque” methods, like the various flavors of deep learning. People have immediately grasped “machine intelligence,” so here we are. ☺
Computers are learning to think, read, and write. They’re also picking up human sensory functions, with the ability to see and hear (arguably to touch, taste, and smell, though those have been less of a focus). Machine intelligence technologies cut across a vast array of problem types (from classification and clustering to natural language processing and computer vision) and methods (from support vector machines to deep belief networks). All of these technologies are reflected on this landscape.
What this landscape doesn’t include, however important, is “big data” technologies. Some have used this term interchangeably with machine learning and artificial intelligence, but I want to focus on the intelligence methods rather than data, storage, and computation pieces of the puzzle for this landscape (though of course data technologies enable machine intelligence).
Which companies are on the landscape?
I considered thousands of companies, so while the chart is crowded it’s still a small subset of the overall ecosystem. “Admissions rates” to the chart were fairly in line with those of Yale or Harvard, and perhaps equally arbitrary. ☺
I tried to pick companies that used machine intelligence methods as a defining part of their technology. Many of these companies clearly belong in multiple areas but for the sake of simplicity I tried to keep companies in their primary area and categorized them by the language they use to describe themselves (instead of quibbling over whether a company used “NLP” accurately in its self-description).
If you want to get a sense for innovations at the heart of machine intelligence, focus on the core technologies layer. Some of these companies have APIs that power other applications, some sell their platforms directly into enterprise, some are at the stage of cryptic demos, and some are so stealthy that all we have is a few sentences to describe them.
The most exciting part for me was seeing how much is happening in the application space. These companies separated nicely into those that reinvent the enterprise, industries, and ourselves.
If I were looking to build a company right now, I’d use this landscape to help figure out what core and supporting technologies I could package into a novel industry application. Everyone likes solving the sexy problems, but there are an incredible number of ‘unsexy’ industry use cases with massive market opportunities, and powerful enabling technologies that are begging to be used for creative applications (e.g., Watson Developer Cloud, AlchemyAPI).
Reflections on the landscape:
We’ve seen a few great articles recently outlining why machine intelligence is experiencing a resurgence, documenting the enabling factors of this resurgence. (Kevin Kelly, for example, chalks it up to cheap parallel computing, large datasets, and better algorithms.) I focused on understanding the ecosystem on a company-by-company level and drawing implications from that.
Yes, it’s true, machine intelligence is transforming the enterprise, industries and humans alike.
On a high level it’s easy to understand why machine intelligence is important, but it wasn’t until I laid out what many of these companies are actually doing that I started to grok how much it is already transforming everything around us. As Kevin Kelly more provocatively put it, “the business plans of the next 10,000 startups are easy to forecast: Take X and add AI”. In many cases you don’t even need the X — machine intelligence will certainly transform existing industries, but will also likely create entirely new ones.
Machine intelligence is enabling applications we already expect like automated assistants (Siri), adorable robots (Jibo), and identifying people in images (like the highly effective but unfortunately named DeepFace). However, it’s also doing the unexpected: protecting children from sex trafficking, reducing the chemical content in the lettuce we eat, helping us buy shoes online that fit our feet precisely, and destroying 80’s classic video games.
Many companies will be acquired.
I was surprised to find that over 10% of the eligible (non-public) companies on the slide have been acquired. It was in stark contrast to the big data landscape we created, which had very few acquisitions at the time. No jaw will drop when I reveal that Google is the number one acquirer, though there were more than 15 different acquirers just for the companies on this chart. My guess is that by the end of 2015 almost another 10% will be acquired. For thoughts on which specific ones will get snapped up in the next year you’ll have to twist my arm…
Big companies have a disproportionate advantage, especially those that build consumer products.
The giants in search (Google, Baidu), social networks (Facebook, LinkedIn, Pinterest), content (Netflix, Yahoo!), mobile (Apple) and e-commerce (Amazon) are in an incredible position. They have massive datasets and constant consumer interactions that enable tight feedback loops for their algorithms (and these factors combine to create powerful network effects) — and they have the most to gain from the low hanging fruit that machine intelligence bears.
Best-in-class personalization and recommendation algorithms have enabled these companies’ success (it’s both impressive and disconcerting that Facebook recommends you add the person you had a crush on in college and Netflix tees up that perfect guilty pleasure sitcom). Now they are all competing in a new battlefield: the move to mobile. Winning mobile will require lots of machine intelligence: state-of-the-art natural language interfaces (like Apple’s Siri), visual search (like Amazon’s “Firefly”), and dynamic question-answering technology that tells you the answer instead of providing a menu of links (all of the search companies are wrestling with this).
Large enterprise companies (IBM and Microsoft) have also made incredible strides in the field, though they don’t have the same human-facing requirements, so they are focusing their attention more on knowledge representation tasks on large industry datasets, like IBM Watson’s application to assist doctors with diagnoses.
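To make the recommendation point concrete, here is a minimal sketch of item-to-item collaborative filtering, the general style of technique behind “people who liked this also liked that.” The interaction matrix is a made-up toy; none of these companies’ production systems are public in this form, and real systems add matrix factorization, implicit-feedback weighting, and much more data. The key intuition it illustrates is the feedback loop: every new user interaction adds a row, which sharpens the item similarities.

```python
from math import sqrt

# Toy implicit-feedback matrix: rows are users, columns are items A-D.
# 1 means the user interacted with the item (watched, bought, clicked).
items = ["A", "B", "C", "D"]
interactions = [
    [1, 1, 0, 0],  # user 1
    [1, 1, 1, 0],  # user 2
    [0, 1, 1, 0],  # user 3
    [0, 0, 1, 1],  # user 4
]

def item_vector(j):
    """Column j of the matrix: which users touched item j."""
    return [row[j] for row in interactions]

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def recommend(liked_item):
    """The item whose audience overlaps most with the liked item's."""
    j = items.index(liked_item)
    scores = {
        items[k]: cosine(item_vector(j), item_vector(k))
        for k in range(len(items)) if k != j
    }
    return max(scores, key=scores.get)

print(recommend("A"))  # → B: users who liked A mostly also liked B
```

With only four users the similarities are crude; the reason the giants’ recommendations feel uncanny is that this same arithmetic runs over billions of rows.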
The talent’s in the New (AI)vy League.
In the last 20 years, most of the best minds in machine intelligence (especially the ‘hardcore AI’ types) worked in academia. They developed new machine intelligence methods, but there were few real world applications that could drive business value.
Now that real world applications of more complex machine intelligence methods like deep belief nets and hierarchical neural networks are starting to solve real world problems, we’re seeing academic talent move to corporate settings. Facebook recruited NYU professors Yann LeCun and Rob Fergus to their AI Lab, Google hired University of Toronto’s Geoffrey Hinton, Baidu wooed Andrew Ng. It’s important to note that they all still give back significantly to the academic community (one of LeCun’s lab mandates is to work on core research to give back to the community, Hinton spends half of his time teaching, Ng has made machine intelligence more accessible through Coursera) but it is clear that a lot of the intellectual horsepower is moving away from academia.
For aspiring minds in the space, these corporate labs not only offer lucrative salaries and access to the “godfathers” of the industry, but, the most important ingredient: data. These labs offer talent access to datasets they could never get otherwise (the ImageNet dataset is fantastic, but can’t compare to what Facebook, Google, and Baidu have in house).
As a result, we’ll likely see corporations become the home of many of the most important innovations in machine intelligence and recruit many of the graduate students and postdocs that would have otherwise stayed in academia.
There will be a peace dividend.
Big companies have an inherent advantage and it’s likely that the ones who will win the machine intelligence race will be even more powerful than they are today. However, the good news for the rest of the world is that the core technology they develop will rapidly spill into other areas, both via departing talent and published research.
Similar to the big data revolution, which was sparked by the release of Google’s MapReduce and BigTable papers, we will see corporations release equally groundbreaking new technologies into the community. Those innovations will be adapted to new industries and use cases that the Googles of the world don’t have the DNA or desire to tackle.
Opportunities for entrepreneurs:
“My company does deep learning for X”
Few words will make you more popular in 2015. That is, if you can credibly say them. Deep learning is a particularly popular method in the machine intelligence field that has been getting a lot of attention. Google, Facebook, and Baidu have achieved excellent results with the method for vision- and language-based tasks, and startups like Enlitic have shown promising results as well.
Yes, it will be an overused buzzword with excitement ahead of results and business models, but unlike the hundreds of companies that say they do “big data,” it’s much easier to cut to the chase in terms of verifying credibility here if you’re paying attention. The most exciting part about the deep learning method is that when applied with the appropriate levels of care and feeding, it can replace some of the intuition that comes from domain expertise with automatically-learned features. The hope is that, in many cases, it will allow us to fundamentally rethink what a best-in-class solution is.
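A toy illustration of that feature-learning point, hedged accordingly: this is not deep learning itself, just a sketch of why features matter. A linear perceptron cannot fit XOR on the raw inputs, but add one hand-engineered “domain expertise” feature (the product x1·x2) and it separates the data perfectly. The promise of deep learning is that the network discovers features like that product on its own, layer by layer, instead of a domain expert supplying them.

```python
# XOR: (x1, x2) -> label. No linear rule over the raw inputs can fit it.
DATA = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]

def train_perceptron(featurize, epochs=50):
    """Classic perceptron rule over the given feature map (bias appended)."""
    dim = len(featurize((0, 0))) + 1
    w = [0.0] * dim
    for _ in range(epochs):
        for x, y in DATA:
            f = list(featurize(x)) + [1.0]          # append bias term
            pred = 1 if sum(wi * fi for wi, fi in zip(w, f)) > 0 else 0
            for i in range(dim):                    # update only on mistakes
                w[i] += (y - pred) * f[i]
    return w

def accuracy(w, featurize):
    correct = 0
    for x, y in DATA:
        f = list(featurize(x)) + [1.0]
        pred = 1 if sum(wi * fi for wi, fi in zip(w, f)) > 0 else 0
        correct += (pred == y)
    return correct / len(DATA)

raw = lambda x: x                                   # just (x1, x2)
engineered = lambda x: (x[0], x[1], x[0] * x[1])    # add the product feature

acc_raw = accuracy(train_perceptron(raw), raw)
acc_eng = accuracy(train_perceptron(engineered), engineered)
print(acc_raw, acc_eng)  # engineered reaches 1.0; raw never can
```

Swap the hand-written `engineered` map for a trained hidden layer and you have, in miniature, the trade deep learning offers: compute and data in exchange for feature-engineering expertise.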
As an investor who is curious about the quirkier applications of data and machine intelligence, I can’t wait to see what creative problems deep learning practitioners try to solve. I completely agree with Jeff Hawkins when he says a lot of the killer applications of these types of technologies will sneak up on us. I fully intend to keep an open mind.
“Acquihire as a business model”
People say that data scientists are unicorns in short supply. The talent crunch in machine intelligence will make it look like we had a glut of data scientists. In the data field, many people gained industry experience over the past decade; most hardcore machine intelligence work, by contrast, has happened only in academia. We won’t be able to grow this talent overnight.
This shortage of talent is a boon for founders who actually understand machine intelligence. A lot of companies in the space will get seed funding because there are early signs that the acquihire price for a machine intelligence expert is north of 5x that of a normal technical acquihire (take, for example, DeepMind, where the price per technical head was somewhere between $5–10M, if we choose to consider it in the acquihire category). I’ve had multiple friends ask me, only semi-jokingly, “Shivon, should I just round up all of my smartest friends in the AI world and call it a company?” To be honest, I’m not sure what to tell them. (At Bloomberg Beta, we’d rather back companies building for the long term, but that doesn’t mean this won’t be a lucrative strategy for many enterprising founders.)
A good demo is disproportionately valuable in machine intelligence
I remember watching Watson play Jeopardy. When it struggled at the beginning I felt really sad for it. When it started trouncing its competitors I remember cheering it on as if it were the Toronto Maple Leafs in the Stanley Cup finals (disclaimers: (1) I was an IBMer at the time so was biased towards my team (2) the Maple Leafs have not made the finals during my lifetime — yet — so that was purely a hypothetical).
Why do these awe-inspiring demos matter? The last wave of technology companies to IPO didn’t have demos that most of us would watch, so why should machine intelligence companies? The last wave of companies were very computer-like: database companies, enterprise applications, and the like. Sure, I’d like to see a 10x more performant database, but most people wouldn’t care. Machine intelligence wins and loses on demos because 1) the technology is very human, enough to inspire shock and awe, 2) business models tend to take a while to form, so companies need more funding for a longer period of time to get there, 3) they are fantastic acquisition bait. Watson beat the world’s best humans at trivia, even if it thought Toronto was a US city. DeepMind blew people away by beating video games. Vicarious took on CAPTCHA. There are a few companies still in stealth that promise to impress beyond that, and I can’t wait to see if they get there.
Demo or not, I’d love to talk to anyone using machine intelligence to change the world. There’s no industry too unsexy, no problem too geeky. I’d love to be there to help, so don’t be shy. I hope this landscape chart sparks a conversation. The goal is to make this a living document, and I want to know if there are companies or categories missing. I welcome feedback and would like to put together a dynamic visualization where I can add more companies and dimensions to the data (methods used, data types, end users, investment to date, location, etc.) so that folks can interact with it to better explore the space.
Questions and comments: Please email me. Thank you to Andrew Paprocki, Aria Haghighi, Beau Cronin, Ben Lorica, Doug Fulop, David Andrzejewski, Eric Berlow, Eric Jonas, Gary Kazantsev, Gideon Mann, Greg Smithies, Heidi Skinner, Jack Clark, Jon Lehr, Kurt Keutzer, Lauren Barless, Pete Skomoroch, Pete Warden, Roger Magoulas, Sean Gourley, Stephen Purpura, Wes McKinney, Zach Bogue, the Quid team, and the Bloomberg Beta team for your ever-helpful perspectives!
Disclaimer: Bloomberg Beta is an investor in Adatao, Alation, Aviso, Context Relevant, Mavrx, Newsle, Orbital Insights, Pop Up Archive, and two others on the chart that are still undisclosed. We’re also investors in a few other machine intelligence companies that aren’t focusing on areas that were a fit for this landscape, so we left them off.
For the full resolution version of the landscape please click here.

The Competitive Landscape for Machine Intelligence


Three years ago, our venture capital firm began studying startups in artificial intelligence. AI felt misunderstood, burdened by expectations from science fiction, and so for the last two years we’ve tried to capture the most important startups in the space in a one-page landscape. (We prefer the more neutral term “machine intelligence” over “AI.”)
In past years, we heard mostly from startup founders and academics — people who pay attention to early, far-reaching trends in technology. But this year was different. This year we’ve heard more from Fortune 500 executives with questions about machine intelligence than from startup founders.
These executives are asking themselves what to do. Over the past year, machine intelligence has exploded, with $5 billion in venture investment, a few big acquisitions, and hundreds of thousands of people reading our earlier research. As with the internet in the 1990s, executives are realizing that this new technology could change everything, but nobody knows exactly how or when.
If this year’s landscape shows anything, it’s that the impact of machine intelligence is already here. Almost every industry is already being affected, from agriculture to transportation. Every employee can use machine intelligence to become more productive with tools that exist today. Companies have at their disposal, for the first time, the full set of building blocks to begin embedding machine intelligence in their businesses.
And unlike with the internet, where latecomers often bested those who were first to market, the companies that get started immediately with machine intelligence could enjoy a lasting advantage.
So what should the Fortune 500 and other companies be doing to get started?
Make Talent More Productive
One way to immediately begin getting the value of machine intelligence is to support your talent with readily available machine intelligence productivity tools. Some of the earliest wins have been productivity tools tuned to specific areas of knowledge work — what we call “Enterprise Functions” in our landscape. With these tools, every employee can get some of the powers previously available only to CEOs.
These tools can aid with monitoring and predicting (e.g., companies like Clari forecasting client-by-client sales to help prioritize deals) and with coaching and training (Textio’s* predictive text-editing platform to help employees write more-effective documents).
Find Entirely New Sources of Data
The next step is to use machine intelligence to realize value from new sources of data, which we highlight in the “Enterprise Intelligence” section of the landscape. These new sources are now accessible because machine intelligence software can rapidly review enormous amounts of data in a way that would have been too difficult and expensive for people to do.
Imagine if you could afford to have someone listen to every audio recording of your salespeople and predict their performance, or have a team look at every satellite image taken from space and determine what macroeconomic indicators could be gleaned from them. These data sources might already be owned by your company (e.g., transcripts of customer service conversations or sensor data predicting outages and required maintenance), or they might be newly available in the outside world (data on the open web providing competitive information).
Rethink How You Build Software
Let’s say you’ve tried some new productivity tools and started to mine new sources of data for insight. The next frontier in capturing machine intelligence’s value is building a lasting competitive advantage based on this new kind of software.
But machine intelligence is not just about better software; it requires entirely new processes and a different mindset. Machine intelligence is a new discipline for managers to learn, one that demands a new class of software talent and a new organizational structure.
Most IT groups think in terms of applications and data. New machine intelligence IT groups will think about applications, data, and models. Think of software as the combination of code, data, and a model. “Model” here means business rules, like rules for approving loans or adjusting power consumption in data centers. In traditional software, programmers created these rules by hand. Today machine intelligence can use data and new algorithms to generate a model too complex for any human programmer to write.
With traditional software, the model changes only when programmers explicitly rewrite it. With machine intelligence, companies can create models that evolve much more regularly, allowing you to build a lasting advantage that strengthens over time as the model “learns.”
Think of these models as narrowly focused employees with great memories and not-so-great social skills — idiot savants. They can predict how best to grow the business, make customers happier, or cut costs. But they’ll often fail miserably if you try to apply them to something new, or, worse, they may degrade invisibly as your business and data change.
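The distinction above between hand-written rules and learned models can be made concrete with a short sketch. This is an illustration only, not anything from the companies in the landscape: the loan-approval scenario, the income figures, and every function name are hypothetical, and the "learning" here is just a one-feature decision stump fit by brute force.

```python
# Sketch of the "code, data, model" distinction.
# All names and data are hypothetical, for illustration only.

def handwritten_rule(income):
    """Traditional software: a business rule a programmer wrote by hand."""
    return income >= 50_000  # approve loans above a fixed threshold

def learn_rule(examples):
    """Machine intelligence: derive the same kind of rule from data.

    Picks the income threshold that best separates approved from
    rejected historical applications (a one-feature decision stump).
    """
    best_threshold, best_correct = None, -1
    for t in sorted(income for income, _ in examples):
        correct = sum((income >= t) == approved for income, approved in examples)
        if correct > best_correct:
            best_threshold, best_correct = t, correct
    return lambda income: income >= best_threshold

# Historical data the company already owns: (income, was_approved)
history = [(20_000, False), (35_000, False), (48_000, True),
           (60_000, True), (75_000, True)]

learned_rule = learn_rule(history)
print(learned_rule(55_000))  # the threshold came from the data, not a programmer
```

The point of the sketch is the workflow, not the algorithm: when the business and its data change, the hand-written rule changes only if a programmer rewrites it, while the learned rule changes by re-running `learn_rule` on fresh history.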
All of this means that the discipline of creating machine intelligence software differs from traditional software, and companies need to staff accordingly. Luckily, though finding the right talent may be hard, the tools that developers need to build this software are readily available.

For the first time, there is a maturing “Stack” (see our landscape) of building blocks that companies can use to practice the new discipline of machine intelligence. Many of these tools are available as free, open-source libraries from technology companies such as

  • Google (TensorFlow),
  • Microsoft (CNTK), or
  • Amazon (DSSTNE).

Others make it easier for data scientists to collaborate (see “Data Science”) and manage machine intelligence models (“Machine Learning”).

If your CEO is struggling to answer the question of how machine intelligence will change your industry, take a look at the range of markets in our landscape. The startups in these sections give a sense of how different industries may be altered. Machine intelligence’s first useful applications in an industry tend to use data that previously had lain dormant. Health care is a prime example: We’re seeing predictive models that run on patient data and computer vision that diagnoses disease from medical images and gleans lifesaving insights from genomic data. Next up will be finance, transportation, and agriculture because of the volume of data available and their sheer economic value.
Your company will still need to decide how much to trust these models and how much power to grant them in making business decisions. In some cases the risk of an error will be too great to justify the speed and new capabilities. Your company will also need to decide how often and with how much oversight to revise your models. But the companies that decide to invest in the right models and successfully embed machine intelligence in their organization will improve by default as their models learn from experience.
Economists have long wondered why the so-called computing revolution has failed to deliver productivity gains. Machine intelligence will finally realize computing’s promise. The C-suites and boardrooms that recognize that fact first — and transform their ways of working accordingly — will outrun and outlast their competitors.
*The authors’ fund has invested in this company.
Shivon Zilis is a partner and founding member of Bloomberg Beta, which invests heavily in the future of work. She focuses on early-stage data and machine intelligence investments.
James Cham is a Partner at Bloomberg Beta where he invests in data-centric and machine learning-related companies.