Category: Image Recognition


Google Unveils Neural Network with “Superhuman” Ability to Determine the Location of Almost Any Image

By Hugo Angel,

Guessing the location of a randomly chosen Street View image is hard, even for well-traveled humans. But Google’s latest artificial-intelligence machine manages it with relative ease.
Here’s a tricky task. Pick a photograph from the Web at random. Now try to work out where it was taken using only the image itself. If the image shows a famous building or landmark, such as the Eiffel Tower or Niagara Falls, the task is straightforward. But the job becomes significantly harder when the image lacks specific location cues or is taken indoors or shows a pet or food or some other detail.

Nevertheless, humans are surprisingly good at this task. To help, they bring to bear all kinds of knowledge about the world, such as the type and language of signs on display, the types of vegetation, architectural styles, the direction of traffic, and so on. Humans spend a lifetime picking up these kinds of geolocation cues.

So it’s easy to think that machines would struggle with this task. And indeed, they have.

Today, that changes thanks to the work of Tobias Weyand, a computer vision specialist at Google, and a couple of pals. These guys have trained a deep-learning machine to work out the location of almost any photo using only the pixels it contains.

Their new machine significantly outperforms humans and can even use a clever trick to determine the location of indoor images and pictures of specific things such as pets, food, and so on that have no location cues.

Their approach is straightforward, at least in the world of machine learning.

  • Weyand and co begin by dividing the world into a grid of over 26,000 squares whose size depends on the number of images taken in each location.
    So big cities, which are the subjects of many images, have a more fine-grained grid structure than more remote regions where photographs are less common. Indeed, the Google team ignored areas like oceans and the polar regions, where few photographs have been taken.

 

  • Next, the team created a database of geolocated images from the Web and used the location data to determine the grid square in which each image was taken. This data set is huge, consisting of 126 million images along with their accompanying Exif location data.
  • Weyand and co used 91 million of these images to teach a powerful neural network to work out the grid location using only the image itself. Their idea is to input an image into this neural net and get as the output a particular grid location or a set of likely candidates. 
  • They then validated the neural network using the remaining 34 million images in the data set.
  • Finally they tested the network—which they call PlaNet—in a number of different ways to see how well it works.
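The grid-building step above can be sketched in miniature. PlaNet actually uses Google's S2 geometry cells; the toy quadtree below is only an illustration of the same idea, with an invented photo set and threshold, subdividing the map more finely where photos are dense and dropping empty cells such as oceans:

```python
# Illustrative sketch only: recursively split a cell until no leaf holds more
# than a handful of geotagged photos, so photo-dense areas get finer cells.

def partition(cell, photos, max_photos=2):
    """cell is (lat0, lat1, lon0, lon1). Returns the leaf cells that
    contain at least one photo, each holding at most max_photos photos."""
    lat0, lat1, lon0, lon1 = cell
    inside = [(la, lo) for la, lo in photos
              if lat0 <= la < lat1 and lon0 <= lo < lon1]
    if len(inside) <= max_photos:
        return [cell] if inside else []   # drop empty cells (oceans, poles)
    mla, mlo = (lat0 + lat1) / 2, (lon0 + lon1) / 2
    leaves = []
    for sub in [(lat0, mla, lon0, mlo), (lat0, mla, mlo, lon1),
                (mla, lat1, lon0, mlo), (mla, lat1, mlo, lon1)]:
        leaves += partition(sub, inside, max_photos)
    return leaves

# A dense "city" cluster near (10, 10) plus one lone remote photo:
photos = [(10.1, 10.1), (10.2, 10.2), (10.3, 10.1), (10.1, 10.3), (-40, 120)]
cells = partition((-90, 90, -180, 180), photos)
```

Running this yields one large cell around the lone photo and several much smaller cells covering the cluster, which is exactly the varying-resolution grid the paragraph describes.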

The results make for interesting reading. To measure the accuracy of their machine, they fed it 2.3 million geotagged images from Flickr to see whether it could correctly determine their location. “PlaNet is able to localize 3.6 percent of the images at street-level accuracy and 10.1 percent at city-level accuracy,” say Weyand and co. What’s more, the machine determines the country of origin in a further 28.4 percent of the photos and the continent in 48.0 percent of them.

That’s pretty good. But to show just how good, Weyand and co put PlaNet through its paces in a test against 10 well-traveled humans. For the test, they used an online game that presents a player with a random view taken from Google Street View and asks him or her to pinpoint its location on a map of the world.

Anyone can play at www.geoguessr.com. Give it a try—it’s a lot of fun and more tricky than it sounds.

GeoGuessr screen capture example

Needless to say, PlaNet trounced the humans. “In total, PlaNet won 28 of the 50 rounds with a median localization error of 1131.7 km, while the median human localization error was 2320.75 km,” say Weyand and co. “[This] small-scale experiment shows that PlaNet reaches superhuman performance at the task of geolocating Street View scenes.”

An interesting question is how PlaNet performs so well without being able to use the cues that humans rely on, such as vegetation, architectural style, and so on. But Weyand and co say they know why: “We think PlaNet has an advantage over humans because it has seen many more places than any human can ever visit and has learned subtle cues of different scenes that are even hard for a well-traveled human to distinguish.”

They go further and use the machine to locate images that do not have location cues, such as those taken indoors or of specific items. This is possible when images are part of albums that have all been taken at the same place. The machine simply looks through other images in the album to work out where they were taken and assumes the more specific image was taken in the same place.
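That album trick can be sketched as combining per-image probability distributions over grid cells so that cue-less photos inherit the album's location. How exactly PlaNet pools the distributions is a detail of the paper; the simple summing shown here, along with the cell names and probabilities, is purely illustrative:

```python
# Illustrative only: pool cell probabilities across an album so photos with
# no location cues (food, pets, indoors) borrow the album's overall location.

def album_location(per_image_probs):
    """per_image_probs: list of dicts mapping grid-cell id -> probability.
    Returns the cell that is most likely for the album as a whole."""
    totals = {}
    for probs in per_image_probs:
        for cell, p in probs.items():
            totals[cell] = totals.get(cell, 0.0) + p
    return max(totals, key=totals.get)

album = [
    {"paris": 0.7, "london": 0.3},                   # street scene: strong cue
    {"paris": 0.4, "london": 0.6},                   # cafe interior: weak cue
    {"paris": 0.34, "london": 0.33, "tokyo": 0.33},  # photo of food: no cue
]
print(album_location(album))  # → paris
```

The food photo alone is nearly uninformative, but pooled with its album-mates it gets assigned to the album's most likely cell.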

That’s impressive work that shows deep neural nets flexing their muscles once again. Perhaps more impressive still is that the model uses a relatively small amount of memory unlike other approaches that use gigabytes of the stuff. “Our model uses only 377 MB, which even fits into the memory of a smartphone,” say Weyand and co.

That’s a tantalizing idea—the power of a superhuman neural network on a smartphone. It surely won’t be long now!

Ref: arxiv.org/abs/1602.05314 : PlaNet—Photo Geolocation with Convolutional Neural Networks

ORIGINAL: Technology Review
by Emerging Technology from the arXiv
February 24, 2016

Xnor.ai – Bringing Deep Learning AI to the Devices at the Edge of the Network

By Hugo Angel,

 

Photo  – The Xnor.ai Team

 

Today we announced our funding of Xnor.ai. We are excited to be working with Ali Farhadi, Mohammad Rastegari and their team on this new company. We are also looking forward to working with Paul Allen’s team at the Allen Institute for AI and in particular our good friend and CEO of AI2, Dr. Oren Etzioni who is joining the board of Xnor.ai. Machine Learning and AI have been a key investment theme for us for the past several years and bringing deep learning capabilities such as image and speech recognition to small devices is a huge challenge.

Mohammad and Ali and their team have developed a platform that enables low-resource devices to perform tasks that usually require large farms of GPUs in cloud environments. This, we believe, has the opportunity to change how we think about certain types of deep learning use cases as they get extended from the core to the edge. Image and voice recognition are great examples. These are broad areas of use cases out in the world – usually involving a mobile device – but right now they require the device to be connected to the internet so those large farms of GPUs can process all the information your device captures and sends, with the core transmitting back the answer. If you could do that on your phone (while preserving battery life), it opens up a new world of options.

It is just these kinds of inventions that put the greater Seattle area at the center of the revolution in machine learning and AI that is upon us. Xnor.ai came out of the outstanding work the team was doing at the Allen Institute for Artificial Intelligence (AI2), and Ali is a professor at the University of Washington. Between Microsoft, Amazon, the University of Washington and research institutes such as AI2, our region is leading the way as new types of intelligent applications take shape. Madrona is energized to play our role as company builder and support for these amazing inventors and founders.

ORIGINAL: Madrona
By Matt McIlwain

AI acceleration startup Xnor.ai collects $2.6M in funding

I was excited by the promise of Xnor.ai and its technique that drastically reduces the computing power necessary to perform complex operations like computer vision. Seems I wasn’t the only one: the company, just officially spun off from the Allen Institute for AI (AI2), has attracted $2.6 million in seed funding from its parent company and Madrona Venture Group.

The specifics of the product and process you can learn about in detail in my previous post, but the gist is this: machine learning models for things like object and speech recognition are notoriously computation-heavy, making them difficult to implement on smaller, less powerful devices. Xnor.ai’s researchers use a bit of mathematical trickery to reduce that computing load by an order of magnitude or two — something it’s easy to see the benefit of.
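The "mathematical trickery" alluded to here is the team's XNOR-Net line of work: binarizing network weights and activations to {-1, +1} so that a dot product collapses into XNOR and popcount, operations that are cheap on any processor. A toy sketch of that core idea (the packing scheme and vector sizes are invented for illustration):

```python
# Toy sketch of the XNOR-Net trick: with values binarized to {-1, +1} and
# packed into an integer bitmask, a dot product becomes XNOR + popcount
# instead of floating-point multiply-accumulate.

def pack(signs):
    """Pack a list of +1/-1 values into an int bitmask (+1 -> 1, -1 -> 0)."""
    bits = 0
    for i, s in enumerate(signs):
        if s > 0:
            bits |= 1 << i
    return bits

def binary_dot(a_bits, b_bits, n):
    """Dot product of two n-long {-1, +1} vectors from their packed bits."""
    xnor = ~(a_bits ^ b_bits) & ((1 << n) - 1)  # 1 wherever signs agree
    matches = bin(xnor).count("1")              # popcount
    return 2 * matches - n                      # agreements minus disagreements

a = [+1, -1, +1, +1]
b = [+1, +1, -1, +1]
print(binary_dot(pack(a), pack(b), 4))  # → 0, same as the float dot product
```

One XNOR plus one popcount replaces n multiplies and adds, which is where the order-of-magnitude savings comes from.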

Related Articles

McIlwain will join AI2 CEO Oren Etzioni on the board of Xnor.ai; Ali Farhadi, who led the original project, will be the company’s CEO, and Mohammad Rastegari is CTO.
The new company aims to facilitate commercial applications of its technology (it isn’t quite plug and play yet), but the research that led up to it is, like other AI2 work, open source.

 

AI2 Repository:  https://github.com/allenai/

ORIGINAL: TechCrunch
by
2017/02/03

Top 10 Hot Artificial Intelligence (AI) Technologies

By Hugo Angel,

The market for artificial intelligence (AI) technologies is flourishing. Beyond the hype and the heightened media attention, the numerous startups and the internet giants racing to acquire them, there is a significant increase in investment and adoption by enterprises. A Narrative Science survey found last year that 38% of enterprises are already using AI, growing to 62% by 2018. Forrester Research predicted a greater than 300% increase in investment in artificial intelligence in 2017 compared with 2016. IDC estimated that the AI market will grow from $8 billion in 2016 to more than $47 billion in 2020.

Coined in 1955 to describe a new computer science sub-discipline, “Artificial Intelligence” today includes a variety of technologies and tools, some time-tested, others relatively new. To help make sense of what’s hot and what’s not, Forrester just published a TechRadar report on Artificial Intelligence (for application development professionals), a detailed analysis of 13 technologies enterprises should consider adopting to support human decision-making.

Based on Forrester’s analysis, here’s my list of the 10 hottest AI technologies:

  1. Natural Language Generation: Producing text from computer data. Currently used in customer service, report generation, and summarizing business intelligence insights. Sample vendors:
    • Attivio,
    • Automated Insights,
    • Cambridge Semantics,
    • Digital Reasoning,
    • Lucidworks,
    • Narrative Science,
    • SAS,
    • Yseop.
  2. Speech Recognition: Transcribing and transforming human speech into a format useful for computer applications. Currently used in interactive voice response systems and mobile applications. Sample vendors:
    • NICE,
    • Nuance Communications,
    • OpenText,
    • Verint Systems.
  3. Virtual Agents: “The current darling of the media,” says Forrester (I believe they refer to my evolving relationships with Alexa), from simple chatbots to advanced systems that can network with humans. Currently used in customer service and support and as a smart home manager. Sample vendors:
    • Amazon,
    • Apple,
    • Artificial Solutions,
    • Assist AI,
    • Creative Virtual,
    • Google,
    • IBM,
    • IPsoft,
    • Microsoft,
    • Satisfi.
  4. Machine Learning Platforms: Providing algorithms, APIs, development and training toolkits, data, as well as computing power to design, train, and deploy models into applications, processes, and other machines. Currently used in a wide range of enterprise applications, mostly involving prediction or classification. Sample vendors:
    • Amazon,
    • Fractal Analytics,
    • Google,
    • H2O.ai,
    • Microsoft,
    • SAS,
    • Skytree.
  5. AI-optimized Hardware: Graphics processing units (GPU) and appliances specifically designed and architected to efficiently run AI-oriented computational jobs. Currently primarily making a difference in deep learning applications. Sample vendors:
    • Alluviate,
    • Cray,
    • Google,
    • IBM,
    • Intel,
    • Nvidia.
  6. Decision Management: Engines that insert rules and logic into AI systems, used for initial setup/training and ongoing maintenance and tuning. A mature technology, it is used in a wide variety of enterprise applications, assisting in or performing automated decision-making. Sample vendors:
    • Advanced Systems Concepts,
    • Informatica,
    • Maana,
    • Pegasystems,
    • UiPath.
  7. Deep Learning Platforms: A special type of machine learning consisting of artificial neural networks with multiple abstraction layers. Currently primarily used in pattern recognition and classification applications supported by very large data sets. Sample vendors:
    • Deep Instinct,
    • Ersatz Labs,
    • Fluid AI,
    • MathWorks,
    • Peltarion,
    • Saffron Technology,
    • Sentient Technologies.
  8. Biometrics: Enable more natural interactions between humans and machines, including but not limited to image and touch recognition, speech, and body language. Currently used primarily in market research. Sample vendors:
    • 3VR,
    • Affectiva,
    • Agnitio,
    • FaceFirst,
    • Sensory,
    • Synqera,
    • Tahzoo.
  9. Robotic Process Automation: Using scripts and other methods to automate human action to support efficient business processes. Currently used where it’s too expensive or inefficient for humans to execute a task or a process. Sample vendors:
    • Advanced Systems Concepts,
    • Automation Anywhere,
    • Blue Prism,
    • UiPath,
    • WorkFusion.
  10. Text Analytics and NLP: Natural language processing (NLP) uses and supports text analytics by facilitating the understanding of sentence structure and meaning, sentiment, and intent through statistical and machine learning methods. Currently used in fraud detection and security, a wide range of automated assistants, and applications for mining unstructured data. Sample vendors:
    • Basis Technology,
    • Coveo,
    • Expert System,
    • Indico,
    • Knime,
    • Lexalytics,
    • Linguamatics,
    • Mindbreeze,
    • Sinequa,
    • Stratifyd,
    • Synapsify.

There are certainly many business benefits gained from AI technologies today, but according to a survey Forrester conducted last year, there are also obstacles to AI adoption as expressed by companies with no plans of investing in AI:

  • There is no defined business case: 42%
  • Not clear what AI can be used for: 39%
  • Don’t have the required skills: 33%
  • Need first to invest in modernizing data mgt platform: 29%
  • Don’t have the budget: 23%
  • Not certain what is needed for implementing an AI system: 19%
  • AI systems are not proven: 14%
  • Do not have the right processes or governance: 13%
  • AI is a lot of hype with little substance: 11%
  • Don’t own or have access to the required data: 8%
  • Not sure what AI means: 3%
Once enterprises overcome these obstacles, Forrester concludes, they stand to gain from AI driving accelerated transformation in customer-facing applications and developing an interconnected web of enterprise intelligence.


Show and Tell: image captioning open sourced in TensorFlow

By Hugo Angel,

 In 2014, research scientists on the Google Brain team trained a machine learning system to automatically produce captions that accurately describe images. Further development of that system led to its success in the Microsoft COCO 2015 image captioning challenge, a competition to compare the best algorithms for computing accurate image captions, where it tied for first place.
Today, we’re making the latest version of our image captioning system available as an open source model in TensorFlow.
This release contains significant improvements to the computer vision component of the captioning system, is much faster to train, and produces more detailed and accurate descriptions compared to the original system. These improvements are outlined and analyzed in the paper Show and Tell: Lessons learned from the 2015 MSCOCO Image Captioning Challenge, published in IEEE Transactions on Pattern Analysis and Machine Intelligence.
Automatically captioned by our system.
So what’s new? 
Our 2014 system used the Inception V1 image classification model to initialize the image encoder, which produces the encodings that are useful for recognizing different objects in the images. This was the best image model available at the time, achieving 89.6% top-5 accuracy on the benchmark ImageNet 2012 image classification task. We replaced this in 2015 with the newer Inception V2 image classification model, which achieves 91.8% accuracy on the same task.

The improved vision component gave our captioning system an accuracy boost of 2 points in the BLEU-4 metric (which is commonly used in machine translation to evaluate the quality of generated sentences) and was an important factor in its success in the captioning challenge.

Today’s code release initializes the image encoder using the Inception V3 model, which achieves 93.9% accuracy on the ImageNet classification task. Initializing the image encoder with a better vision model gives the image captioning system a better ability to recognize different objects in the images, allowing it to generate more detailed and accurate descriptions. This gives an additional 2 points of improvement in the BLEU-4 metric over the system used in the captioning challenge.

Another key improvement to the vision component comes from fine-tuning the image model. This step addresses the problem that the image encoder is initialized by a model trained to classify objects in images, whereas the goal of the captioning system is to describe the objects in images using the encodings produced by the image model. For example, an image classification model will tell you that a dog, grass and a frisbee are in the image, but a natural description should also tell you the color of the grass and how the dog relates to the frisbee. In the fine-tuning phase, the captioning system is improved by jointly training its vision and language components on human generated captions. This allows the captioning system to transfer information from the image that is specifically useful for generating descriptive captions, but which was not necessary for classifying objects. In particular, after fine-tuning it becomes better at correctly describing the colors of objects. Importantly, the fine-tuning phase must occur after the language component has already learned to generate captions; otherwise, the noisiness of the randomly initialized language component causes irreversible corruption to the vision component. For more details, read the full paper here.
Left: the better image model allows the captioning model to generate more detailed and accurate descriptions. Right: after fine-tuning the image model, the image captioning system is more likely to describe the colors of objects correctly.
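The BLEU-4 metric mentioned above rewards captions whose 1- to 4-gram sequences overlap the human reference captions. A stripped-down sketch of its core quantity, clipped n-gram precision, follows; real BLEU also combines the four precisions geometrically and applies a brevity penalty, both omitted here, and the example sentences are invented:

```python
from collections import Counter

def clipped_ngram_precision(candidate, reference, n):
    """Fraction of the candidate's n-grams that also appear in the reference,
    with counts clipped so repeating a word cannot inflate the score."""
    def ngrams(tokens):
        return Counter(tuple(tokens[i:i + n])
                       for i in range(len(tokens) - n + 1))
    cand, ref = ngrams(candidate.split()), ngrams(reference.split())
    overlap = sum(min(c, ref[g]) for g, c in cand.items())
    return overlap / max(sum(cand.values()), 1)

cand = "a dog catches a frisbee on the grass"
ref = "a dog is catching a frisbee in the grass"
print(clipped_ngram_precision(cand, ref, 1))  # → 0.75
```

A 2-point BLEU-4 gain thus means the generated captions share noticeably more multi-word phrases with what humans actually wrote.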
Until recently our image captioning system was implemented in the DistBelief software framework. The TensorFlow implementation released today achieves the same level of accuracy with significantly faster performance: time per training step is just 0.7 seconds in TensorFlow compared to 3 seconds in DistBelief on an Nvidia K20 GPU, meaning that total training time is just 25% of the time previously required.
A natural question is whether our captioning system can generate novel descriptions of previously unseen contexts and interactions. The system is trained by showing it hundreds of thousands of images that were captioned manually by humans, and it often re-uses human captions when presented with scenes similar to what it’s seen before.
When the model is presented with scenes similar to what it’s seen before, it will often re-use human generated captions.
So does it really understand the objects and their interactions in each image? Or does it always regurgitate descriptions from the training data? Excitingly, our model does indeed develop the ability to generate accurate new captions when presented with completely new scenes, indicating a deeper understanding of the objects and context in the images. Moreover, it learns how to express that knowledge in natural-sounding English phrases despite receiving no additional language training other than reading the human captions.
 

Our model generates a completely new caption using concepts learned from similar scenes in the training set.
We hope that sharing this model in TensorFlow will help push forward image captioning research and applications, and will also allow interested people to learn and have fun. To get started training your own image captioning system, and for more details on the neural network architecture, navigate to the model’s home-page here. While our system uses the Inception V3 image classification model, you could even try training our system with the recently released Inception-ResNet-v2 model to see if it can do even better!

ORIGINAL: Google Blog

by Chris Shallue, Software Engineer, Google Brain Team
September 22, 2016

How a Japanese cucumber farmer is using deep learning and TensorFlow.

By Hugo Angel,

by Kaz Sato, Developer Advocate, Google Cloud Platform
August 31, 2016
It’s not hyperbole to say that use cases for machine learning and deep learning are only limited by our imaginations. About one year ago, a former embedded systems designer from the Japanese automobile industry named Makoto Koike started helping out at his parents’ cucumber farm, and was amazed by the amount of work it takes to sort cucumbers by size, shape, color and other attributes.
Makoto’s father is very proud of his thorny cucumber, for instance, having dedicated his life to delivering fresh and crispy cucumbers, with many prickles still on them. Straight and thick cucumbers with a vivid color and lots of prickles are considered premium grade and command much higher prices on the market.
But Makoto learned very quickly that sorting cucumbers is as hard and tricky as actually growing them. “Each cucumber has different color, shape, quality and freshness,” Makoto says.
Cucumbers from retail stores
Cucumbers from Makoto’s farm
In Japan, each farm has its own classification standard and there’s no industry standard. At Makoto’s farm, they sort them into nine different classes, and his mother sorts them all herself — spending up to eight hours per day at peak harvesting times.
“The sorting work is not an easy task to learn. You have to look at not only the size and thickness, but also the color, texture, small scratches, whether or not they are crooked and whether they have prickles. It takes months to learn the system and you can’t just hire part-time workers during the busiest period. I myself only recently learned to sort cucumbers well,” Makoto said.
Distorted or crooked cucumbers are ranked as low-quality product
There are also some automatic sorters on the market, but they have limitations in terms of performance and cost, and small farms don’t tend to use them.
Makoto doesn’t think sorting is an essential task for cucumber farmers. “Farmers want to focus and spend their time on growing delicious vegetables. I’d like to automate the sorting tasks before taking the farm business over from my parents.”
Makoto Koike, center, with his parents at the family cucumber farm
Makoto Koike, family cucumber farm
The many uses of deep learning
Makoto first got the idea to explore machine learning for sorting cucumbers from a completely different use case: Google AlphaGo competing with the world’s top professional Go player.
“When I saw Google’s AlphaGo, I realized something really serious is happening here,” said Makoto. “That was the trigger for me to start developing the cucumber sorter with deep learning technology.”
Using deep learning for image recognition allows a computer to learn from a training data set what the important “features” of the images are. By using a hierarchy of numerous artificial neurons, deep learning can automatically classify images with a high degree of accuracy. Thus, neural networks can recognize different species of cats, or models of cars or airplanes from images. Sometimes neural networks can exceed the performance of the human eye for certain applications. (For more information, check out my previous blog post Understanding neural networks with TensorFlow Playground.)

TensorFlow democratizes the power of deep learning
But can computers really learn mom’s art of cucumber sorting? Makoto set out to see whether he could use deep learning technology for sorting using Google’s open source machine learning library, TensorFlow.
“Google had just open sourced TensorFlow, so I started trying it out with images of my cucumbers,” Makoto said. “This was the first time I tried out machine learning or deep learning technology, and right away got much higher accuracy than I expected. That gave me the confidence that it could solve my problem.”
With TensorFlow, you don’t need to be knowledgeable about the advanced math models and optimization algorithms needed to implement deep neural networks. Just download the sample code and read the tutorials and you can get started in no time. The library lowers the barrier to entry for machine learning significantly, and since Google open-sourced TensorFlow last November, many “non ML” engineers have started playing with the technology with their own datasets and applications.

Cucumber sorting system design
Here’s a systems diagram of the cucumber sorter that Makoto built. The system uses a Raspberry Pi 3 as the main controller to take images of the cucumbers with a camera, and:

  • In a first phase, it runs a small-scale neural network on TensorFlow to detect whether or not the image is of a cucumber.
  • It then forwards the image to a larger TensorFlow neural network running on a Linux server to perform a more detailed classification.
Systems diagram of the cucumber sorter
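The two-phase design above is a classic cascade: a cheap on-device check rejects non-cucumber frames so only promising images pay for the server round-trip. The stub functions below stand in for Makoto's two TensorFlow networks; the thresholds, feature names and grade labels are all invented for illustration:

```python
# Cascade sketch: the small stage-1 filter runs on the Raspberry Pi, and only
# frames it accepts are sent to the larger stage-2 classifier on the server.
# Both models are stubbed out; features, thresholds and grades are invented.

def stage1_is_cucumber(image):
    """Stand-in for the small on-device TensorFlow network."""
    return image["greenness"] > 0.5

def stage2_grade(image):
    """Stand-in for the larger server-side network (nine quality classes)."""
    return "2L" if image["straightness"] > 0.8 else "C"

def sort_frame(image):
    if not stage1_is_cucumber(image):
        return None             # rejected on-device; no server round-trip
    return stage2_grade(image)  # detailed classification on the server

frames = [
    {"greenness": 0.9, "straightness": 0.9},  # straight, vivid cucumber
    {"greenness": 0.8, "straightness": 0.4},  # crooked cucumber
    {"greenness": 0.1, "straightness": 0.0},  # not a cucumber (empty belt)
]
print([sort_frame(f) for f in frames])  # → ['2L', 'C', None]
```

The cascade keeps the Pi responsive and the server bill small, since most of the belt is not a premium cucumber.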
Makoto used the sample TensorFlow code Deep MNIST for Experts with minor modifications to the convolution, pooling and last layers, changing the network design to adapt to the pixel format of cucumber images and the number of cucumber classes.
Here’s Makoto’s cucumber sorter, which went live in July:
Here’s a close-up of the sorting arm, and the camera interface:

And here is the cucumber sorter in action:

Pushing the limits of deep learning
One of the current challenges with deep learning is that you need to have a large number of training datasets. To train the model, Makoto spent about three months taking 7,000 pictures of cucumbers sorted by his mother, but it’s probably not enough.
“When I did a validation with the test images, the recognition accuracy exceeded 95%. But if you apply the system to real use cases, the accuracy drops down to about 70%. I suspect the neural network model has the issue of ‘overfitting’ (the phenomenon in neural networks where the model is trained to fit only the small training dataset) because of the insufficient number of training images.”
The second challenge of deep learning is that it consumes a lot of computing power. The current sorter uses a typical Windows desktop PC to train the neural network model. Although it converts the cucumber image into 80 x 80 pixel low-resolution images, it still takes two to three days to complete training the model with 7,000 images.
“Even with this low-res image, the system can only classify a cucumber based on its shape, length and level of distortion. It can’t recognize color, texture, scratches and prickles,” Makoto explained. Increasing image resolution by zooming into the cucumber would result in much higher accuracy, but would also increase the training time significantly.
To improve deep learning, some large enterprises have started doing large-scale distributed training, but those servers come at an enormous cost. Google offers Cloud Machine Learning (Cloud ML), a low-cost cloud platform for training and prediction that dedicates hundreds of cloud servers to training a network with TensorFlow. With Cloud ML, Google handles building a large-scale cluster for distributed training, and you just pay for what you use, making it easier for developers to try out deep learning without making a significant capital investment.
These specialized servers were used in the AlphaGo match
Makoto is eagerly awaiting Cloud ML. “I could use Cloud ML to try training the model with much higher resolution images and more training data. Also, I could try changing the various configurations, parameters and algorithms of the neural network to see how that improves accuracy. I can’t wait to try it.”

IBM, Local Motors debut Olli, the first Watson-powered self-driving vehicle

By Hugo Angel,

Olli hits the road in the Washington, D.C. area and later this year in Miami-Dade County and Las Vegas.
Local Motors CEO and co-founder John B. Rogers, Jr. with “Olli” & IBM, June 15, 2016. Rich Riggins/Feature Photo Service for IBM

IBM, along with the Arizona-based manufacturer Local Motors, debuted the first-ever driverless vehicle to use the Watson cognitive computing platform. Dubbed “Olli,” the electric vehicle was unveiled at Local Motors’ new facility in National Harbor, Maryland, just outside of Washington, D.C.

Olli, which can carry up to 12 passengers, taps into four Watson APIs to interact with its riders:

  • Speech to Text,
  • Natural Language Classifier,
  • Entity Extraction, and
  • Text to Speech.

It can answer questions like “Can I bring my children on board?” and respond to basic operational commands like “Take me to the closest Mexican restaurant.” Olli can also give vehicle diagnostics, answering questions like “Why are you stopping?”

Olli learns from data produced by more than 30 sensors embedded throughout the vehicle, which will be added and adjusted to meet passenger needs and local preferences.
While Olli is the first self-driving vehicle to use IBM Watson Internet of Things (IoT), this isn’t Watson’s first foray into the automotive industry. IBM launched its IoT for Automotive unit in September of last year, and in March, IBM and Honda announced a deal for Watson technology and analytics to be used in the automaker’s Formula One (F1) cars and pits.
IBM demonstrated its commitment to IoT in March of last year, when it announced it was spending $3B over four years to establish a separate IoT business unit, which later became the Watson IoT business unit.
IBM says that starting Thursday, Olli will be used on public roads locally in Washington, D.C. and will be used in Miami-Dade County and Las Vegas later this year. Miami-Dade County is exploring a pilot program that would deploy several autonomous vehicles to shuttle people around Miami.
ORIGINAL: ZDnet
By Stephanie Condon for Between the Lines
June 16, 2016

Former NASA chief unveils $100 million neural chip maker KnuEdge

By Hugo Angel,

Daniel Goldin
It’s not all that easy to call KnuEdge a startup. Created a decade ago by Daniel Goldin, the former head of the National Aeronautics and Space Administration, KnuEdge is only now coming out of stealth mode. It has already raised $100 million in funding to build a “neural chip” that Goldin says will make data centers more efficient in a hyperscale age.
Goldin, who founded the San Diego, California-based company with the former chief technology officer of NASA, said he believes the company’s brain-like chip will be far more cost and power efficient than current chips based on the computer design popularized by computer architect John von Neumann. In von Neumann machines, memory and processor are separated and linked via a data pathway known as a bus. Over the years, von Neumann machines have gotten faster by sending more and more data at higher speeds across the bus as processor and memory interact. But the speed of a computer is often limited by the capacity of that bus, leading to what some computer scientists call the “von Neumann bottleneck.” IBM has seen the same problem, and it has a research team working on brain-like data center chips. Both efforts are part of an attempt to deal with the explosion of data driven by artificial intelligence and machine learning.
Goldin’s company resembles IBM’s effort only on the surface; its approach is much different, and it has been quietly funded by undisclosed angel investors. Goldin said in an interview with VentureBeat that the company has already generated $20 million in revenue and is actively engaged with hyperscale computing companies and Fortune 500 companies in the aerospace, banking, health care, hospitality, and insurance industries. The mission is a fundamental transformation of the computing world, Goldin said.
“It all started over a mission to Mars,” Goldin said.

Above: KnuEdge’s first chip has 256 cores. Image Credit: KnuEdge
Back in the year 2000, Goldin saw that the time delay in controlling a space vehicle from Earth would be too long, so the vehicle would have to operate itself. He calculated that a mission to Mars would require software that pushed technology to the limit, with tens of millions of lines of code.
Above: Daniel Goldin, CEO of KnuEdge.
Image Credit: KnuEdge
“I thought, holy smokes,” he said. “It’s going to be too expensive. It’s not propulsion. It’s not environmental control. It’s not power. This software business is a very big problem, and the nation couldn’t afford it.”
So Goldin looked further into the brains of the robots, and that’s when he started thinking about the computing it would take.
Asked if it was easier to run NASA or a startup, Goldin let out a guffaw.
“I love them both, but they’re both very different,” Goldin said. “At NASA, I spent a lot of time on non-technical issues. I had a project every quarter, and I didn’t want to become dull technically. I tried to always take on a technical job doing architecture, working with a design team, and always doing something leading edge. I grew up at a time when you graduated from a university and went to work for someone else. If I ever come back to this earth, I would graduate and become an entrepreneur. This is so wonderful.”
Back in 1992, Goldin was planning on starting a wireless company as an entrepreneur. But then he got the call to “go serve the country,” and he did that work for a decade. He started KnuEdge (previously called Intellisis) in 2005, and he got very patient capital.
“When I went out to find investors, I knew I couldn’t use the conventional Silicon Valley approach (impatient capital),” he said. “It is a fabulous approach that has generated incredible wealth. But I wanted to undertake revolutionary technology development. To build the future tools for next-generation machine learning, improving the natural interface between humans and machines. So I got patient capital that wanted to see lightning strike. Between all of us, we have a board of directors that can contact almost anyone in the world. They’re fabulous business people and technologists. We knew we had a ten-year run-up.”
But he’s not saying who those people are yet.
KnuEdge’s chips are part of a larger platform. KnuEdge is also unveiling KnuVerse, a military-grade voice recognition and authentication technology that unlocks the potential of voice interfaces to power next-generation computing, Goldin said.
While the voice technology market has exploded over the past five years due to the introductions of Siri, Cortana, Google Home, Echo, and ViV, the aspirations of most commercial voice technology teams are still on hold because of security and noise issues. KnuVerse solutions are based on patented authentication techniques using the human voice — even in extremely noisy environments — as one of the most secure forms of biometrics. Secure voice recognition has applications in industries such as banking, entertainment, and hospitality.
KnuEdge says it is now possible to authenticate to computers, web and mobile apps, and Internet of Things devices (or everyday objects that are smart and connected) with only a few words spoken into a microphone — in any language, no matter how loud the background environment or how many other people are talking nearby. In addition to KnuVerse, KnuEdge offers Knurld.io for application developers, a software development kit, and a cloud-based voice recognition and authentication service that can be integrated into an app typically within two hours.
And KnuEdge is announcing KnuPath with LambdaFabric computing. KnuEdge’s first chip, built with an older manufacturing technology, has 256 cores, or neuron-like brain cells, on a single chip. Each core is a tiny digital signal processor. The LambdaFabric makes it possible to instantly connect those cores to each other — a trick that helps overcome one of the major problems of multicore chips, Goldin said. The LambdaFabric is designed to connect up to 512,000 devices, enabling the system to be used in the most demanding computing environments. From rack to rack, the fabric has a latency (or interaction delay) of only 400 nanoseconds. And the whole system is designed to use a low amount of power.
All of the company’s designs are built on biological principles about how the brain gets a lot of computing work done with a small amount of power. The chip is based on what Goldin calls “sparse matrix heterogeneous machine learning algorithms.” And it will run C++ software, something that is already very popular. Programmers can program each one of the cores with a different algorithm to run simultaneously, for the “ultimate in heterogeneity.” It’s multiple input, multiple data, and “that gives us some of our power,” Goldin said.
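The article doesn’t spell out what “sparse matrix heterogeneous machine learning algorithms” look like in practice, but the workhorse kernel of any sparse workload is a sparse matrix-vector product, sketched here in plain Python over the standard CSR (compressed sparse row) layout:

```python
def csr_matvec(data, indices, indptr, x):
    """Multiply a sparse matrix (CSR format) by a dense vector.
    data[k]    -- k-th nonzero value
    indices[k] -- the column that value sits in
    indptr[i]  -- where row i's nonzeros start in data/indices"""
    y = []
    for row in range(len(indptr) - 1):
        acc = 0.0
        # Only the stored nonzeros of this row are touched.
        for k in range(indptr[row], indptr[row + 1]):
            acc += data[k] * x[indices[k]]
        y.append(acc)
    return y

# The 3x3 matrix [[2, 0, 0], [0, 0, 3], [0, 1, 0]] times [1, 2, 3]:
y = csr_matvec([2.0, 3.0, 1.0], [0, 2, 1], [0, 1, 2, 3], [1.0, 2.0, 3.0])
print(y)  # [2.0, 9.0, 2.0]
```

Because only nonzeros are stored and visited, both the arithmetic and, crucially for the bus, the memory traffic scale with the number of nonzeros rather than with the full matrix size.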

Above: KnuEdge’s KnuPath chip.
Image Credit: KnuEdge
“KnuEdge is emerging out of stealth mode to aim its new Voice and Machine Learning technologies at key challenges in IoT, cloud based machine learning and pattern recognition,” said Paul Teich, principal analyst at Tirias Research, in a statement. “Dan Goldin used his experience in transforming technology to charter KnuEdge with a bold idea, with the patience of longer development timelines and away from typical startup hype and practices. The result is a new and cutting-edge path for neural computing acceleration. There is also a refreshing surprise element to KnuEdge announcing a relevant new architecture that is ready to ship… not just a concept or early prototype.”
Today, Goldin said the company is ready to show off its designs. The first chip was ready last December, and KnuEdge is sharing it with potential customers. That chip was built with a 32-nanometer manufacturing process, and even though that’s an older technology, it is a powerful chip, Goldin said. Even at 32 nanometers, the chip has something like a two-times to six-times performance advantage over similar chips, KnuEdge said.
“The human brain has a couple of hundred billion neurons, and each neuron is connected to at least 10,000 to 100,000 neurons,” Goldin said. “And the brain is the most energy efficient and powerful computer in the world. That is the metaphor we are using.”
KnuEdge has a new version of its chip under design. And the company has already generated revenue from sales of the prototype systems. Each board has about four chips.
As for the competition from IBM, Goldin said, “I believe we made the right decision and are going in the right direction. IBM’s approach is very different from what we have. We are not aiming at anyone. We are aiming at the future.”
In his NASA days, Goldin had a lot of successes. There, he redesigned and delivered the International Space Station, tripled the number of space flights, and put a record number of people into space, all while reducing the agency’s planned budget by 25 percent. He also spent 25 years at TRW, where he led the development of satellite television services.
KnuEdge has 100 employees, but Goldin said the company outsources almost everything. Goldin said he is planning to raise a round of funding late this year or early next year. The company collaborated with the University of California at San Diego and UCSD’s California Institute for Telecommunications and Information Technology.
With computers that can handle natural language systems, many people in the world who can’t read or write will be able to fend for themselves more easily, Goldin said.
“I want to be able to take machine learning and help people communicate and make a living,” he said. “This is just the beginning. This is the Wild West. We are talking to very large companies about this, and they are getting very excited.”
A sample application is a home that has much greater self-awareness. If there’s something wrong in the house, the KnuEdge system could analyze it and figure out if it needs to alert the homeowner.
Goldin said it was hard to keep the company secret.
“I’ve been biting my lip for ten years,” he said.
As for whether KnuEdge’s technology could be used to send people to Mars, Goldin said, “This is available to whoever is going to Mars. I tried twice. I would love it if they use it to get there.”
ORIGINAL: Venture Beat


See The Difference One Year Makes In Artificial Intelligence Research

By Hugo Angel,

AN IMPROVED WAY OF LEARNING ABOUT NEURAL NETWORKS

Google / Geometric Intelligence: The difference between Google’s generated images of 2015 and the images generated in 2016.

Last June, Google wrote that it was teaching its artificial intelligence algorithms to generate images of objects, or “dream.” The A.I. tried to generate pictures of things it had seen before, like dumbbells. But it ran into a few problems. It was able to successfully make objects shaped like dumbbells, but each had disembodied arms sticking out from the handles, because arms and dumbbells were closely associated. Over the course of a year, this process has become incredibly refined, meaning these algorithms are learning much more complete ideas about the world.

New research shows that even when trained on a standardized set of images, A.I. can generate increasingly realistic images of objects that it’s seen before. Through this, the researchers were also able to sequence the images and make low-resolution videos of actions like skydiving and playing violin. The paper, from the University of Wyoming, Albert Ludwigs University of Freiburg, and Geometric Intelligence, focuses on deep generator networks, which not only create these images but are able to show how each neuron in the network affects the entire system’s understanding.
Looking at generated images from a model is important because it gives researchers a better idea about how their models process data. It’s a way to take a look under the hood of algorithms that usually act independent of human intervention as they work. By seeing what computation each neuron in the network does, they can tweak the structure to be faster or more accurate.
With real images, it is unclear which of their features a neuron has learned,” the team wrote. “For example, if a neuron is activated by a picture of a lawn mower on grass, it is unclear if it ‘cares about’ the grass, but if an image…contains grass, we can be more confident the neuron has learned to pay attention to that context.”
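One common way to produce such synthesized images is activation maximization: start from a blank input and nudge it by gradient ascent so that a chosen neuron fires more strongly. The sketch below is a deliberately tiny stand-in (a single linear neuron, not the paper’s deep generator network) meant only to show the mechanic:

```python
def activation_maximization(weights, steps=100, lr=0.1):
    """Start from a blank 'image' and repeatedly adjust it so a neuron's
    activation increases. For a linear neuron a = w . x, the gradient of a
    with respect to x is just w, so each step pushes x toward the pattern
    the neuron responds to."""
    x = [0.0] * len(weights)
    for _ in range(steps):
        for i, w_i in enumerate(weights):
            x[i] += lr * w_i          # ascend the activation gradient
    return x

w = [1.0, -2.0, 0.5]                  # the "feature" this toy neuron detects
x = activation_maximization(w)
print(x)  # proportional to w: the preferred input mirrors the weights
```

In a real network the gradient is backpropagated through many layers, and the generator network in the paper additionally constrains the result to look like a natural image, but the underlying signal is the same: the synthesized input reveals what the neuron has learned to care about.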
In effect, the researchers are studying their own research, and this work gives them a valuable tool for continuing to do so.

Take a look at some other examples of images the A.I. was able to produce.
ORIGINAL: Popular Science
May 31, 2016

Next Rembrandt

By Hugo Angel,

01 GATHERING THE DATA
To distill the artistic DNA of Rembrandt, an extensive database of his paintings was built and analyzed, pixel by pixel.
FUN FACT:
150 Gigabytes of digitally rendered graphics

BUILDING AN EXTENSIVE POOL OF DATA
It’s been almost four centuries since the world lost the talent of one of its most influential classical painters, Rembrandt van Rijn. To bring him back, we distilled the artistic DNA from his work and used it to create The Next Rembrandt.
We examined the entire collection of Rembrandt’s work, studying the contents of his paintings pixel by pixel. To get this data, we analyzed a broad range of materials like high resolution 3D scans and digital files, which were upscaled by deep learning algorithms to maximize resolution and quality. This extensive database was then used as the foundation for creating The Next Rembrandt.
“Data is used by many people today to help them be more efficient and knowledgeable about their daily work, and about the decisions they need to make. But in this project it’s also used to make life itself more beautiful. It really touches the human soul.”
– Ron Augustus, Microsoft
02 DETERMINING THE SUBJECT
Data from Rembrandt’s body of work showed the way to the subject of the new painting.
FUN FACT:
346 Paintings were studied


DELVING INTO REMBRANDT VAN RIJN
  • 49% FEMALE
  • 51% MALE
Throughout his life, Rembrandt painted a great number of self-portraits, commissioned portraits and group shots, Biblical scenes, and even a few landscapes. He’s known for painting brutally honest and unforgiving portrayals of his subjects, utilizing a limited color palette for facial emphasis, and innovating the use of light and shadows.
“There’s a lot of Rembrandt data available — you have this enormous amount of technical data from all these paintings from various collections. And can we actually create something out of it that looks like Rembrandt? That’s an appealing question.”
– Joris Dik, Technical University Delft
BREAKING DOWN THE DEMOGRAPHICS IN REMBRANDT’S WORK
To create new artwork using data from Rembrandt’s paintings, we had to maximize the data pool from which to pull information. Because he painted more portraits than any other subject, we narrowed down our exploration to these paintings.
Then we found the period in which the majority of these paintings were created: between 1632 and 1642. Next, we defined the demographic segmentation of the people in these works and saw which elements occurred in the largest sample of paintings. We funneled down that selection starting with gender and then went on to analyze everything from age and head direction, to the amount of facial hair present.
After studying the demographics, the data led us to a conclusive subject: a portrait of a Caucasian male with facial hair, between the ages of thirty and forty, wearing black clothes with a white collar and a hat, facing to the right.
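The funneling step described above can be pictured as repeatedly keeping only the records that share the most common value of each attribute in turn. The records in this sketch are invented; the project’s real metadata and attributes are of course far richer:

```python
from collections import Counter

def funnel(records, attributes):
    """For each attribute in turn, keep only the records carrying its most
    common value, narrowing toward the single most typical subject."""
    subject = {}
    for attr in attributes:
        top_value, _ = Counter(r[attr] for r in records).most_common(1)[0]
        subject[attr] = top_value
        records = [r for r in records if r[attr] == top_value]
    return subject

# Hypothetical toy records standing in for the portrait metadata:
portraits = [
    {"gender": "male", "age": "30-40", "facing": "right"},
    {"gender": "male", "age": "30-40", "facing": "right"},
    {"gender": "male", "age": "40-50", "facing": "left"},
    {"gender": "female", "age": "30-40", "facing": "right"},
]
subject = funnel(portraits, ["gender", "age", "facing"])
print(subject)  # {'gender': 'male', 'age': '30-40', 'facing': 'right'}
```

Note that the order of attributes matters: filtering by gender first, as the project did, means later majorities are computed only within the male portraits.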
03 GENERATING THE FEATURES
A software system was designed to understand Rembrandt’s style and generate new features.
FUN FACT:
500+ Hours of rendering
MASTERING THE STYLE OF REMBRANDT
In creating the new painting, it was imperative to stay accurate to Rembrandt’s unique style. As “The Master of Light and Shadow,” Rembrandt relied on his innovative use of lighting to shape the features in his paintings. By using very concentrated light sources, he essentially created a “spotlight effect” that gave great attention to the lit elements and left the rest of the painting shrouded in shadows. This resulted in some of the features being very sharp and in focus and others becoming soft and almost blurry, an effect that had to be replicated in the new artwork.
“When you want to make a new painting you have some idea of how it’s going to look. But in our case we started from basically nothing — we had to create a whole painting using just data from Rembrandt’s paintings.”
– Ben Haanstra, Developer
GENERATING FEATURES BASED ON DATA
To master his style, we designed a software system that could understand Rembrandt based on his use of geometry, composition, and painting materials. A facial recognition algorithm identified and classified the most typical geometric patterns used by Rembrandt to paint human features. It then used the learned principles to replicate the style and generate new facial features for our painting.
CONSTRUCTING A FACE OUT OF THE NEW FEATURES
Once we generated the individual features, we had to assemble them into a fully formed face and bust according to Rembrandt’s use of proportions. An algorithm measured the distances between the facial features in Rembrandt’s paintings and calculated them based on percentages. Next, the features were transformed, rotated, and scaled, then accurately placed within the frame of the face. Finally, we rendered the light based on gathered data in order to cast authentic shadows on each feature.
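Working in percentages of the face’s bounding box, rather than in raw pixels, is what allows distances measured across portraits of different sizes to be compared and reused. A minimal sketch of the placement step; the specific proportions below are invented for illustration, not measured from Rembrandt:

```python
def place_feature(face_box, rel_x, rel_y):
    """Convert a proportional position (fractions of the face's bounding
    box) into pixel coordinates inside a particular canvas."""
    left, top, width, height = face_box
    return (left + rel_x * width, top + rel_y * height)

# Hypothetical proportions: an eye ~40% down the face, the nose tip ~60%.
face = (100, 50, 200, 260)               # left, top, width, height in pixels
eye = place_feature(face, 0.30, 0.40)    # left eye center
nose = place_feature(face, 0.50, 0.60)   # nose tip
print(eye, nose)
```

Rotation and scaling of each generated feature would then be applied about these anchor points before rendering.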
04 BRINGING IT TO LIFE
CREATING ACCURATE DEPTH AND TEXTURE
We now had a digital file true to Rembrandt’s style in content, shapes, and lighting. But paintings aren’t just 2D — they have a remarkable three-dimensionality that comes from brushstrokes and layers of paint. To recreate this texture, we had to study 3D scans of Rembrandt’s paintings and analyze the intricate layers on top of the canvas.
“We looked at a number of Rembrandt paintings, and we scanned their surface texture, their elemental composition, and what kinds of pigments were used. That’s the kind of information you need if you want to generate a painting by Rembrandt virtually.”
– Joris Dik, Technical University Delft
USING A HEIGHT MAP TO PRINT IN 3D
We created a height map using two different algorithms that found texture patterns of canvas surfaces and layers of paint. That information was transformed into height data, allowing us to mimic the brushstrokes used by Rembrandt.
We then used an elevated printing technique on a 3D printer that output multiple layers of paint-based UV ink. The final height map determined how much ink was released onto the canvas during each layer of the printing process. In the end, we printed thirteen layers of ink, one on top of the other, to create a painting texture true to Rembrandt’s style.
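The layering scheme can be sketched as a simple quantization: divide the height range into thirteen steps and print ink in layer k wherever the surface rises above k steps, so the stacked layers rebuild the relief of the brushstrokes. This is our reconstruction of the idea, not the project’s actual print pipeline:

```python
def ink_layers(height_map, num_layers=13):
    """Slice a height map (rows of heights) into per-layer binary ink
    masks: layer k prints wherever the surface exceeds k height steps."""
    peak = max(max(row) for row in height_map)
    step = peak / num_layers
    return [
        [[1 if h > k * step else 0 for h in row] for row in height_map]
        for k in range(num_layers)
    ]

# Tiny hypothetical height map (arbitrary units):
hm = [[0.0, 0.5], [1.0, 1.3]]
layers = ink_layers(hm)
# Layer 0 prints everywhere the paint is raised at all;
# the top layer prints only at the highest ridge.
print(sum(v for row in layers[0] for v in row),
      sum(v for row in layers[12] for v in row))
```

Thresholding against the final height map is what lets each pass of the printer deposit ink only where the relief still needs to grow.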

ORIGINAL: Next Rembrandt

A Scale-up Synaptic Supercomputer (NS16e): Four Perspectives

By Hugo Angel,

Today, Lawrence Livermore National Lab (LLNL) and IBM announce the development of a new Scale-up Synaptic Supercomputer (NS16e) that highly integrates 16 TrueNorth Chips in a 4×4 array to deliver 16 million neurons and 256 million synapses. LLNL will also receive an end-to-end software ecosystem that consists of a simulator; a programming language; an integrated programming environment; a library of algorithms as well as applications; firmware; tools for composing neural networks for deep learning; a teaching curriculum; and cloud enablement. Also, don’t miss the story in The Wall Street Journal (sign-in required) and the perspective and a video by LLNL’s Brian Van Essen.
To provide insights into what it took to achieve this significant milestone in the history of our project, following are four intertwined perspectives from my colleagues:

  • Filipp Akopyan — First Steps to an Efficient Scalable NeuroSynaptic Supercomputer.
  • Bill Risk and Ben Shaw — Creating an Iconic Enclosure for the NS16e.
  • Jun Sawada — NS16e System as a Neural Network Development Workstation.
  • Brian Taba — How to Program a Synaptic Supercomputer.
The following timeline provides context for today’s milestone in terms of the continued evolution of our project.
Illustration Credit: William Risk

Microsoft Neural Net Shows Deep Learning can get Way Deeper

By Hugo Angel,

Silicon Wafer by Sonic
PAUL TAYLOR/GETTY IMAGES
Computer vision is now a part of everyday life. Facebook recognizes faces in the photos you post to the popular social network. The Google Photos app can find images buried in your collection, identifying everything from dogs to birthday parties to gravestones. Twitter can pinpoint pornographic images without help from human curators.
All of this “seeing” stems from a remarkably effective breed of artificial intelligence called deep learning. But as far as this much-hyped technology has come in recent years, a new experiment from Microsoft Research shows it’s only getting started. Deep learning can go so much deeper.
This revolution in computer vision was a long time coming. A key turning point came in 2012, when artificial intelligence researchers from the University of Toronto won a competition called ImageNet. ImageNet pits machines against each other in an image recognition contest—which computer can identify cats or cars or clouds more accurately?—and that year, the Toronto team, including researcher Alex Krizhevsky and professor Geoff Hinton, topped the contest using deep neural nets, a technology that learns to identify images by examining enormous numbers of them, rather than identifying images according to rules diligently hand-coded by humans.
 
Toronto’s win provided a roadmap for the future of deep learning. In the years since, the biggest names on the ‘net—including Facebook, Google, Twitter, and Microsoft—have used similar tech to build computer vision systems that can match and even surpass humans. “We can’t claim that our system ‘sees’ like a person does,” says Peter Lee, the head of research at Microsoft. “But what we can say is that for very specific, narrowly defined tasks, we can learn to be as good as humans.”
Roughly speaking, neural nets use hardware and software to approximate the web of neurons in the human brain. This idea dates to the 1980s, but in 2012, Krizhevsky and Hinton advanced the technology by running their neural nets atop graphics processing units, or GPUs. These specialized chips were originally designed to render images for games and other highly graphical software, but as it turns out, they’re also suited to the kind of math that drives neural nets. Google, Facebook, Twitter, Microsoft, and many others now use GPU-powered AI to handle image recognition and many other tasks, from Internet search to security. Krizhevsky and Hinton joined the staff at Google.
Now, the latest ImageNet winner is pointing to what could be another step in the evolution of computer vision—and the wider field of artificial intelligence. Last month, a team of Microsoft researchers took the ImageNet crown using a new approach they call a deep residual network. The name doesn’t quite describe it. They’ve designed a neural net that’s significantly more complex than typical designs—one that spans 152 layers of mathematical operations, compared to the typical six or seven. It shows that, in the years to come, companies like Microsoft will be able to use vast clusters of GPUs and other specialized chips to significantly improve not only image recognition but other AI services, including systems that recognize speech and even understand language as we humans naturally speak it.
In other words, deep learning is nowhere close to reaching its potential. “We’re staring at a huge design space,” Lee says, “trying to figure out where to go next.”
Layers of Neurons
Deep neural networks are arranged in layers. Each layer is a different set of mathematical operations—aka algorithms. The output of one layer becomes the input of the next. Loosely speaking, if a neural network is designed for image recognition, one layer will look for a particular set of features in an image—edges or angles or shapes or textures or the like—and the next will look for another set. These layers are what make these neural networks deep. “Generally speaking, if you make these networks deeper, it becomes easier for them to learn,” says Alex Berg, a researcher at the University of North Carolina who helps oversee the ImageNet competition.
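The layer-stacking idea in this paragraph can be sketched in a few lines of Python. This is a toy forward pass, not any production network: the shapes, the ReLU nonlinearity, and the random weights are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(x):
    return np.maximum(0.0, x)

# Three stacked layers: each layer's output becomes the next layer's input.
# The shapes and random weights are purely illustrative.
layer_shapes = [(8, 16), (16, 16), (16, 4)]
weights = [rng.standard_normal(s) * 0.1 for s in layer_shapes]

def forward(x):
    for w in weights:
        x = relu(x @ w)  # one "layer": a linear map plus a nonlinearity
    return x

features = forward(rng.standard_normal((1, 8)))
print(features.shape)  # (1, 4)
```

Each pass through the loop is one “layer”; stacking more of them is what makes the network deep.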
Constructing this kind of mega-neural net is flat-out difficult.
Today, a typical neural network includes six or seven layers. Some might extend to 20 or even 30. But the Microsoft team, led by researcher Jian Sun, just expanded that to 152. In essence, this neural net is better at recognizing images because it can examine more features. “There is a lot more subtlety that can be learned,” Lee says.
In the past, according to Lee and researchers outside of Microsoft, this sort of very deep neural net wasn’t feasible. Part of the problem was that as your mathematical signal moved from layer to layer, it became diluted and tended to fade. As Lee explains, Microsoft solved this problem by building a neural net that skips certain layers when it doesn’t need them, but uses them when it does. “When you do this kind of skipping, you’re able to preserve the strength of the signal much further,” Lee says, “and this is turning out to have a tremendous, beneficial impact on accuracy.”
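The skip-connection idea Lee describes can be sketched in NumPy. This is a toy, not Microsoft’s actual architecture: each block computes a small transformation F(x) and adds the untouched input back in, so the signal survives even when F(x) contributes almost nothing, and many blocks can be stacked without the signal fading.

```python
import numpy as np

rng = np.random.default_rng(1)

def relu(x):
    return np.maximum(0.0, x)

def residual_block(x, w1, w2):
    # F(x): a small two-step transformation of the input.
    f = relu(x @ w1) @ w2
    # The skip connection adds the unchanged input back in, so the
    # signal passes through intact even if F(x) is nearly zero.
    return relu(x + f)

dim = 16
x = rng.standard_normal((1, dim))
# Stack 50 blocks; with tiny random weights, F(x) barely contributes,
# yet the input signal still reaches the end undiminished.
for _ in range(50):
    w1 = rng.standard_normal((dim, dim)) * 0.01
    w2 = rng.standard_normal((dim, dim)) * 0.01
    x = residual_block(x, w1, w2)
print(x.shape)  # (1, 16)
```

Without the `x +` term, 50 layers of near-zero transformations would wash the signal out entirely; with it, depth becomes cheap.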
Berg says that this is a notable departure from previous systems, and he believes that other companies and researchers will follow suit.
Deep Difficulty
The other issue is that constructing this kind of mega-neural net is tremendously difficult. Landing on a particular set of algorithms—determining how each layer should operate and how it should talk to the next layer—is an almost epic task. But Microsoft has a trick here, too. It has designed a computing system that can help build these networks.
As Jian Sun explains it, researchers can identify a promising arrangement for massive neural networks, and then the system can cycle through a range of similar possibilities until it settles on the best one. “In most cases, after a number of tries, the researchers learn [something], reflect, and make a new decision on the next try,” he says. “You can view this as ‘human-assisted search.’”
According to Adam Gibson—the chief researcher at deep learning startup Skymind—this kind of thing is getting more common. It’s called “hyperparameter optimization.” “People can just spin up a cluster [of machines], run 10 models at once, find out which one works best and use that,” Gibson says. “They can input some baseline parameters—based on intuition—and the machines kind of home in on what the best solution is.” As Gibson notes, last year Twitter acquired a company, Whetlab, that offers similar ways of “optimizing” neural networks.
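A minimal sketch of the search loop Gibson describes, with a made-up scoring function standing in for real model training: draw several candidate configurations from baseline ranges chosen by intuition, “train” them all, and keep the best. The ranges and scoring rule here are illustrative assumptions, not from any real system.

```python
import random

random.seed(0)

def train_and_score(params):
    # Stand-in for actually training a model; in practice each candidate
    # would run on its own machine in the cluster. This toy score simply
    # prefers a learning rate near 0.01 and a depth near 20 layers.
    return -abs(params["lr"] - 0.01) - 0.001 * abs(params["layers"] - 20)

# Baseline ranges chosen "based on intuition," as Gibson puts it.
search_space = {
    "lr": [0.1, 0.03, 0.01, 0.003, 0.001],
    "layers": [7, 20, 50, 152],
}

# "Run 10 models at once, find out which one works best and use that."
candidates = [
    {"lr": random.choice(search_space["lr"]),
     "layers": random.choice(search_space["layers"])}
    for _ in range(10)
]
best = max(candidates, key=train_and_score)
print(best)
```

Real hyperparameter optimizers (random search, Bayesian optimization) refine this loop, but the shape is the same: propose, evaluate in parallel, keep the winner.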

‘A Hardware Problem’
As Peter Lee and Jian Sun describe it, such an approach isn’t exactly “brute forcing” the problem. “With very very large amounts of compute resources, one could fantasize about a gigantic ‘natural selection’ setup where evolutionary forces help direct a brute-force search through a huge space of possibilities,” Lee says. “The world doesn’t have those computing resources available for such a thing…For now, we will still depend on really smart researchers like Jian.”
But Lee does say that, thanks to new techniques and computer data centers filled with GPU machines, the realm of possibilities for deep learning is enormous. A big part of the company’s task is just finding the time and the computing power needed to explore these possibilities. “This work has dramatically exploded the design space. The amount of ground to cover, in terms of scientific investigation, has become exponentially larger,” Lee says. And this extends well beyond image recognition, into speech recognition, natural language understanding, and other tasks.
As Lee explains, that’s one reason Microsoft is not only pushing to improve the power of its GPU clusters, but exploring the use of other specialized processors, including FPGAs—chips that can be programmed for particular tasks, such as deep learning. “There has also been an explosion in demand for much more experimental hardware platforms from our researchers,” he says. And this work is sending ripples across the wider world of tech and artificial intelligence. This past summer, in its largest-ever acquisition, Intel agreed to buy Altera, which specializes in FPGAs.
Indeed, Gibson says that deep learning has become more of “a hardware problem.” Yes, we still need top researchers to guide the creation of neural networks, but more and more, finding new paths is a matter of brute-forcing new algorithms across ever more powerful collections of hardware. As Gibson points out, though these deep neural nets work extremely well, we don’t quite know why they work. The trick lies in finding the complex combination of algorithms that works best. More and better hardware can shorten the path.
The end result is that the companies that can build the most powerful networks of hardware are the companies that will come out ahead. That would be Google and Facebook and Microsoft. Those that are good at deep learning today will only get better.
ORIGINAL: Wired

NVIDIA DRIVE PX 2. NVIDIA Accelerates Race to Autonomous Driving at CES 2016

By Hugo Angel,

NVIDIA today shifted its autonomous-driving leadership into high gear.
At a press event kicking off CES 2016, we unveiled artificial-intelligence technology that will let cars sense the world around them and pilot a safe route forward.
Dressed in his trademark black leather jacket, speaking to a crowd of some 400 automakers, media and analysts, NVIDIA CEO Jen-Hsun Huang revealed DRIVE PX 2, an automotive supercomputing platform that processes 24 trillion deep learning operations a second. That’s 10 times the performance of the first-generation DRIVE PX, now being used by more than 50 companies in the automotive world.
The new DRIVE PX 2 delivers 8 teraflops of processing power, the equivalent of 150 MacBook Pros. And it’s the size of a lunchbox, in contrast to the earlier autonomous-driving technology in use today, which takes up the entire trunk of a mid-sized sedan.
“Self-driving cars will revolutionize society,” Huang said at the beginning of his talk. “And NVIDIA’s vision is to enable them.”
 
Volvo to Deploy DRIVE PX in Self-Driving SUVs
As part of its quest to eliminate traffic fatalities, Volvo – known worldwide for safety and reliability – will be the first automaker to deploy DRIVE PX 2, Huang announced.
In the world’s first public trial of autonomous driving, the Swedish automaker next year will lease 100 XC90 luxury SUVs outfitted with DRIVE PX 2 technology. The technology will help the vehicles drive autonomously around Volvo’s hometown of Gothenburg, and semi-autonomously elsewhere.
DRIVE PX 2 has the power to harness a host of sensors to get a 360-degree view of the environment around the car.
“The rear-view mirror is history,” Jen-Hsun said.
Drive Safely, by Not Driving at All
Not so long ago, pundits had questioned the safety of technology in cars. Now, with Volvo incorporating autonomous vehicles into its plan to end traffic fatalities, that script has been flipped. Autonomous cars may be vastly safer than human-piloted vehicles.
Car crashes – an estimated 93 percent of them caused by human error – kill 1.3 million people each year. More American teenagers die from texting while driving than from any other cause, including drunk driving.
There’s also a productivity issue. Americans waste some 5.5 billion hours of time each year in traffic, costing the U.S. about $121 billion, according to an Urban Mobility Report from Texas A&M. And inefficient use of roads by cars wastes even vaster sums spent on infrastructure.
Deep Learning Hits the Road
Self-driving solutions based on computer vision can provide some answers. But tackling the infinite permutations that a driver needs to react to – stray pets, swerving cars, slashing rain, steady road construction crews – is far too complex a programming challenge.
Deep learning enabled by NVIDIA technology can address these challenges. A highly trained deep neural network – residing on supercomputers in the cloud – captures the experience of many tens of thousands of hours of road time.
Huang noted that a number of automotive companies are already using NVIDIA’s deep learning technology to power their efforts, getting speedups of 30-40X in training their networks compared with other technology. BMW, Daimler and Ford are among them, along with innovative Japanese startups like Preferred Networks and ZMP. And Audi said it was able in four hours to do training that took it two years with a competing solution.
  NVIDIA DRIVE PX 2 is part of an end-to-end platform that brings deep learning to the road.
NVIDIA’s end-to-end solution for deep learning starts with NVIDIA DIGITS, a deep learning training system used to train neural networks on data collected during time on the road. On the other end is DRIVE PX 2, which draws on this training to make inferences that enable the car to progress safely down the road. In the middle is NVIDIA DriveWorks, a suite of software tools, libraries and modules that accelerates development and testing of autonomous vehicles.
DriveWorks enables sensor calibration, acquisition of surround data, synchronization, recording and then processing streams of sensor data through a complex pipeline of algorithms running on all of the DRIVE PX 2’s specialized and general-purpose processors.
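The train-in-the-data-center, infer-in-the-car split described above can be sketched with a deliberately trivial model. Nothing here uses real NVIDIA APIs; the linear “model,” the sensor features, and the function names are all stand-ins for illustration.

```python
import numpy as np

rng = np.random.default_rng(2)

# --- Offline training phase (stands in for DIGITS in the data center) ---
# Fit a trivial linear model to labeled "road data." Real training would
# use a deep network over many thousands of hours of driving footage.
X = rng.standard_normal((200, 4))        # fake sensor features
true_w = np.array([0.5, -1.0, 0.25, 2.0])
y = X @ true_w                           # fake labels
w = np.linalg.lstsq(X, y, rcond=None)[0]  # the "trained" weights

# --- Deployment phase (stands in for DRIVE PX 2 in the vehicle) ---
# The frozen weights are shipped to the car, which only runs inference.
def infer(sensor_frame):
    return sensor_frame @ w

control_signal = infer(rng.standard_normal(4))
print(float(control_signal))
```

The key design point is the asymmetry: training is slow and centralized, while inference is a cheap forward pass that must run in real time on the vehicle.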
During the event, Huang reminded the audience that machines are already beating humans at tasks once considered impossible for computers, such as image recognition. Systems trained with deep learning can now correctly classify images more than 96 percent of the time, exceeding what humans can do on similar tasks.
He used the event to show what deep learning can do for autonomous vehicles.
A series of demos drove this home, showing in three steps how DRIVE PX 2 harnesses a host of sensors – lidar, radar, cameras and ultrasonics – to understand the world around it, in real time, and plan a safe and efficient path forward.
The World’s Biggest Infotainment System
 
The highlight of the demos was what Huang called the world’s largest car infotainment system — an elegant block the size of a medium-sized bedroom wall, mounted with a long horizontal screen and a long vertical one.
While a third, larger screen showed the scene that a driver would take in, the wide demo screen showed how the car — using deep learning and sensor fusion — “viewed” the very same scene in real time, stitched together from its array of sensors. On its right, the huge portrait-oriented screen showed a highly precise map that marked the car’s progress.
It’s a demo that will leave an impression on an audience that’s going to hear a lot about the future of driving in the week ahead.
Photos from Our CES 2016 Press Event
NVIDIA Drive PX-2
ORIGINAL: Nvidia
By Bob Sherbin on January 3, 2016