Category: Text to Speech

WaveNet: A Generative Model for Raw Audio by Google DeepMind

By Hugo Angel,

This post presents WaveNet, a deep generative model of raw audio waveforms. We show that WaveNets are able to generate speech which mimics any human voice and which sounds more natural than the best existing Text-to-Speech systems, reducing the gap with human performance by over 50%.
We also demonstrate that the same network can be used to synthesize other audio signals such as music, and present some striking samples of automatically generated piano pieces.
Talking Machines
Allowing people to converse with machines is a long-standing dream of human-computer interaction. The ability of computers to understand natural speech has been revolutionised in the last few years by the application of deep neural networks (e.g., Google Voice Search). However, generating speech with computers, a process usually referred to as speech synthesis or text-to-speech (TTS), is still largely based on so-called concatenative TTS, where a very large database of short speech fragments is recorded from a single speaker and then recombined to form complete utterances. This makes it difficult to modify the voice (for example switching to a different speaker, or altering the emphasis or emotion of their speech) without recording a whole new database.
This has led to a great demand for parametric TTS, where all the information required to generate the data is stored in the parameters of the model, and the contents and characteristics of the speech can be controlled via the inputs to the model. So far, however, parametric TTS has tended to sound less natural than concatenative, at least for syllabic languages such as English. Existing parametric models typically generate audio signals by passing their outputs through signal processing algorithms known as vocoders.
WaveNet changes this paradigm by directly modelling the raw waveform of the audio signal, one sample at a time. As well as yielding more natural-sounding speech, using raw waveforms means that WaveNet can model any kind of audio, including music.


Wave animation


Researchers usually avoid modelling raw audio because it ticks so quickly: typically 16,000 samples per second or more, with important structure at many time-scales. Building a completely autoregressive model, in which the prediction for every one of those samples is influenced by all previous ones (in statistics-speak, each predictive distribution is conditioned on all previous observations), is clearly a challenging task.
However, our PixelRNN and PixelCNN models, published earlier this year, showed that it was possible to generate complex natural images not only one pixel at a time, but one colour-channel at a time, requiring thousands of predictions per image. This inspired us to adapt our two-dimensional PixelNets to a one-dimensional WaveNet.
Architecture animation


The above animation shows how a WaveNet is structured. It is a fully convolutional neural network, where the convolutional layers have various dilation factors that allow its receptive field to grow exponentially with depth and cover thousands of timesteps. At training time, the input sequences are real waveforms recorded from human speakers. After training, we can sample the network to generate synthetic utterances. At each step during sampling a value is drawn from the probability distribution computed by the network. This value is then fed back into the input and a new prediction for the next step is made. Building up samples one step at a time like this is computationally expensive, but we have found it essential for generating complex, realistic-sounding audio.
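The two ideas in the paragraph above, dilated causal convolutions and step-by-step sampling, can be illustrated with a small sketch. This is not the WaveNet implementation. The doubling dilation pattern, the kernel size of 2, and the `toy_model` stand-in for the trained network are all assumptions made for illustration.

```python
import random

def receptive_field(dilations, kernel_size=2):
    """Timesteps (current one included) visible to the top of a stack of
    causal convolutions with the given dilation factors."""
    return 1 + sum((kernel_size - 1) * d for d in dilations)

def causal_dilated_conv(signal, weights, dilation):
    """1-D causal convolution: output[t] depends only on signal[t],
    signal[t - dilation], signal[t - 2 * dilation], ... (zero-padded left)."""
    out = []
    for t in range(len(signal)):
        total = 0.0
        for k, w in enumerate(weights):
            idx = t - k * dilation
            if idx >= 0:  # never look into the future or before the start
                total += w * signal[idx]
        out.append(total)
    return out

def toy_model(history, levels=4):
    """Hypothetical stand-in for the network: returns a probability
    distribution over the next quantized value, favouring the last one."""
    if not history:
        return [1.0 / levels] * levels
    probs = [0.1] * levels
    probs[history[-1]] = 0.7
    return probs

def sample_autoregressively(predict_distribution, n_steps, rng):
    """Draw one value at a time and feed each draw back in as input."""
    samples = []
    for _ in range(n_steps):
        probs = predict_distribution(samples)
        value = rng.choices(range(len(probs)), weights=probs, k=1)[0]
        samples.append(value)
    return samples

# Assumed dilation pattern: 1, 2, 4, ..., 512 (doubling at every layer).
dilations = [2 ** i for i in range(10)]
print("receptive field:", receptive_field(dilations))
print("samples:", sample_autoregressively(toy_model, 16, random.Random(0)))
```

With ten doubling layers the receptive field is 1,024 timesteps, which shows why stacking dilated layers reaches "thousands of timesteps" with few layers. The sampling loop also makes the cost visible: every generated value requires a full model evaluation.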
Improving the State of the Art
We trained WaveNet using some of Google’s TTS datasets so we could evaluate its performance. The following figure shows the quality of WaveNets on a scale from 1 to 5, compared with Google’s current best TTS systems (parametric and concatenative), and with human speech using Mean Opinion Scores (MOS). MOS are a standard measure for subjective sound quality tests, and were obtained in blind tests with human subjects (from over 500 ratings on 100 test sentences). As we can see, WaveNets reduce the gap between the state of the art and human-level performance by over 50% for both US English and Mandarin Chinese.
For both Chinese and English, Google’s current TTS systems are considered among the best worldwide, so improving on both with a single model is a major achievement.
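As a rough illustration of what "reducing the gap by over 50%" means, the calculation can be sketched as follows. The scores used here are hypothetical values on the 1-to-5 MOS scale, not the figures from the evaluation, and the function name `gap_reduction` is ours.

```python
def gap_reduction(baseline_mos, new_mos, human_mos):
    """Fraction of the baseline-to-human MOS gap closed by the new system."""
    gap_before = human_mos - baseline_mos
    if gap_before <= 0:
        raise ValueError("baseline must score below human speech")
    return (new_mos - baseline_mos) / gap_before

# Hypothetical scores, chosen for illustration only.
reduction = gap_reduction(baseline_mos=3.8, new_mos=4.2, human_mos=4.5)
print(f"gap closed: {reduction:.0%}")
```

With these made-up numbers, the baseline trails human speech by 0.7 points and the new system recovers 0.4 of them, a little over half the gap.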


Here are some samples from all three systems so you can listen and compare yourself:

US English:

Mandarin Chinese:

Knowing What to Say

In order to use WaveNet to turn text into speech, we have to tell it what the text is. We do this by transforming the text into a sequence of linguistic and phonetic features (which contain information about the current phoneme, syllable, word, etc.) and by feeding it into WaveNet. This means the network’s predictions are conditioned not only on the previous audio samples, but also on the text we want it to say.
If we train the network without the text sequence, it still generates speech, but now it has to make up what to say. As you can hear from the samples below, this results in a kind of babbling, where real words are interspersed with made-up word-like sounds:


Notice that non-speech sounds, such as breathing and mouth movements, are also sometimes generated by WaveNet; this reflects the greater flexibility of a raw-audio model.
As you can hear from these samples, a single WaveNet is able to learn the characteristics of many different voices, male and female. To make sure it knew which voice to use for any given utterance, we conditioned the network on the identity of the speaker. Interestingly, we found that training on many speakers made it better at modelling a single speaker than training on that speaker alone, suggesting a form of transfer learning.
By changing the speaker identity, we can use WaveNet to say the same thing in different voices:


Similarly, we could provide additional inputs to the model, such as emotions or accents, to make the speech even more diverse and interesting.
Making Music
Since WaveNets can be used to model any audio signal, we thought it would also be fun to try to generate music. Unlike the TTS experiments, we didn’t condition the networks on an input sequence telling it what to play (such as a musical score); instead, we simply let it generate whatever it wanted to. When we trained it on a dataset of classical piano music, it produced fascinating samples like the ones below:


WaveNets open up a lot of possibilities for TTS, music generation and audio modelling in general. The fact that directly generating timestep per timestep with deep neural networks works at all for 16kHz audio is really surprising, let alone that it outperforms state-of-the-art TTS systems. We are excited to see what we can do with them next.
For more details, take a look at our paper.

ORIGINAL: Google DeepMind
Aäron van den Oord. Research Scientist, DeepMind
Heiga Zen. Research Scientist, Google
Sander Dieleman. Research Scientist, DeepMind
8 September 2016

© 2016 DeepMind Technologies Limited


IBM, Local Motors debut Olli, the first Watson-powered self-driving vehicle

By Hugo Angel,

Olli hits the road in the Washington, D.C. area and later this year in Miami-Dade County and Las Vegas.
Local Motors CEO and co-founder John B. Rogers, Jr. with “Olli” & IBM, June 15, 2016. Rich Riggins/Feature Photo Service for IBM

IBM, along with the Arizona-based manufacturer Local Motors, debuted the first-ever driverless vehicle to use the Watson cognitive computing platform. Dubbed “Olli,” the electric vehicle was unveiled at Local Motors’ new facility in National Harbor, Maryland, just outside of Washington, D.C.

Olli, which can carry up to 12 passengers, taps into four Watson APIs (Speech to Text, Natural Language Classifier, Entity Extraction and Text to Speech) to interact with its riders. It can answer questions like “Can I bring my children on board?” and respond to basic operational commands like, “Take me to the closest Mexican restaurant.” Olli can also give vehicle diagnostics, answering questions like, “Why are you stopping?”

Olli learns from data produced by more than 30 sensors embedded throughout the vehicle, which will be added and adjusted to meet passenger needs and local preferences.
While Olli is the first self-driving vehicle to use IBM Watson Internet of Things (IoT), this isn’t Watson’s first foray into the automotive industry. IBM launched its IoT for Automotive unit in September of last year, and in March, IBM and Honda announced a deal for Watson technology and analytics to be used in the automaker’s Formula One (F1) cars and pits.
IBM demonstrated its commitment to IoT in March of last year, when it announced it was spending $3B over four years to establish a separate IoT business unit, which later became the Watson IoT business unit.
IBM says that starting Thursday, Olli will be used on public roads locally in Washington, D.C. and will be used in Miami-Dade County and Las Vegas later this year. Miami-Dade County is exploring a pilot program that would deploy several autonomous vehicles to shuttle people around Miami.
By Stephanie Condon for Between the Lines
June 16, 2016

“AI & The Future Of Civilization” A Conversation With Stephen Wolfram

By Hugo Angel,

“AI & The Future Of Civilization” A Conversation With Stephen Wolfram [3.1.16]
Stephen Wolfram
What makes us different from all these things? What makes us different is the particulars of our history, which gives us our notions of purpose and goals. That’s a long way of saying when we have the box on the desk that thinks as well as any brain does, the thing it doesn’t have, intrinsically, is the goals and purposes that we have. Those are defined by our particulars—our particular biology, our particular psychology, our particular cultural history.

The thing we have to think about as we think about the future of these things is the goals. That’s what humans contribute, that’s what our civilization contributes—execution of those goals; that’s what we can increasingly automate. We’ve been automating it for thousands of years. We will succeed in having very good automation of those goals. I’ve spent some significant part of my life building technology to essentially go from a human concept of a goal to something that gets done in the world.

There are many questions that come from this. For example, we’ve got these great AIs and they’re able to execute goals, how do we tell them what to do?…

STEPHEN WOLFRAM, distinguished scientist, inventor, author, and business leader, is Founder & CEO, Wolfram Research; Creator, Mathematica, Wolfram|Alpha & the Wolfram Language; Author, A New Kind of Science. Stephen Wolfram’s EdgeBio Page


Some tough questions. One of them is about the future of the human condition. That’s a big question. I’ve spent some part of my life figuring out how to make machines automate stuff. It’s pretty obvious that we can automate many of the things that we humans have been proud of for a long time. What’s the future of the human condition in that situation?

More particularly, I see technology as taking human goals and making them able to be automatically executed by machines. The human goals that we’ve had in the past have been things like moving objects from here to there and using a forklift rather than our own hands. Now, the things that we can do automatically are more intellectual kinds of things that have traditionally been the professions’ work, so to speak. These are things that we are going to be able to do by machine. The machine is able to execute things, but something or someone has to define what its goals should be and what it’s trying to execute.

People talk about the future of the intelligent machines, and whether intelligent machines are going to take over and decide what to do for themselves. What one has to recognize is that, while figuring out how to execute a given goal is something that can meaningfully be automated, the actual inventing of the goal is not something that in some sense has a path to automation.

How do we figure out goals for ourselves? How are goals defined? They tend to be defined for a given human by their own personal history, their cultural environment, the history of our civilization. Goals are something that are uniquely human. It’s something that almost doesn’t make any sense. We ask, what’s the goal of our machine? We might have given it a goal when we built the machine.

The thing that makes this more poignant for me is that I’ve spent a lot of time studying basic science about computation, and I’ve realized something from that. It’s a little bit of a longer story, but basically, if we think about intelligence and things that might have goals, things that might have purposes, what kinds of things can have intelligence or purpose? Right now, we know one great example of things with intelligence and purpose and that’s us, and our brains, and our own human intelligence. What else is like that? The answer, I had at first assumed, is that there are the systems of nature. They do what they do, but human intelligence is far beyond anything that exists naturally in the world. It’s something that’s the result of all of this elaborate process of evolution. It’s a thing that stands apart from the rest of what exists in the universe. What I realized, as a result of a whole bunch of science that I did, was that is not the case.

Forward to the Future: Visions of 2045

By Hugo Angel,

DARPA asked the world and our own researchers what technologies they expect to see 30 years from now—and received insightful, sometimes funny predictions
Today—October 21, 2015—is famous in popular culture as the date 30 years in the future when Marty McFly and Doc Brown arrive in their time-traveling DeLorean in the movie “Back to the Future Part II.” The film got some things right about 2015, including in-home videoconferencing and devices that recognize people by their voices and fingerprints. But it also predicted trunk-sized fusion reactors, hoverboards and flying cars—game-changing technologies that, despite the advances we’ve seen in so many fields over the past three decades, still exist only in our imaginations.
A big part of DARPA’s mission is to envision the future and make the impossible possible. So ten days ago, as the “Back to the Future” day approached, we turned to social media and asked the world to predict: What technologies might actually surround us 30 years from now? We pointed people to presentations from DARPA’s Future Technologies Forum, held last month in St. Louis, for inspiration and a reality check before submitting their predictions.
Well, you rose to the challenge and the results are in. So in honor of Marty and Doc (little known fact: he is a DARPA alum) and all of the world’s innovators past and future, we present here some highlights from your responses, in roughly descending order by number of mentions for each class of futuristic capability:
  • Space: Interplanetary and interstellar travel, including faster-than-light travel; missions and permanent settlements on the Moon, Mars and the asteroid belt; space elevators
  • Transportation & Energy: Self-driving and electric vehicles; improved mass transit systems and intercontinental travel; flying cars and hoverboards; high-efficiency solar and other sustainable energy sources
  • Medicine & Health: Neurological devices for memory augmentation, storage and transfer, and perhaps to read people’s thoughts; life extension, including virtual immortality via uploading brains into computers; artificial cells and organs; “Star Trek”-style tricorder for home diagnostics and treatment; wearable technology, such as exoskeletons and augmented-reality glasses and contact lenses
  • Materials & Robotics: Ubiquitous nanotechnology, 3-D printing and robotics; invisibility and cloaking devices; energy shields; anti-gravity devices
  • Cyber & Big Data: Improved artificial intelligence; optical and quantum computing; faster, more secure Internet; better use of data analytics to improve use of resources
A few predictions inspired us to respond directly:
  • “Pizza delivery via teleportation”: DARPA took a close look at this a few years ago and decided there is plenty of incentive for the private sector to handle this challenge.
  • “Time travel technology will be close, but will be closely guarded by the military as a matter of national security”: We already did this tomorrow.
  • “Systems for controlling the weather”: Meteorologists told us it would be a job killer and we didn’t want to rain on their parade.
  • “Space colonies…and unlimited cellular data plans that won’t be slowed by your carrier when you go over a limit”: We appreciate the idea that these are equally difficult, but they are not. We think likable cell-phone data plans are beyond even DARPA and a total non-starter.
So seriously, as an adjunct to this crowd-sourced view of the future, we asked three DARPA researchers from various fields to share their visions of 2045, and why getting there will require a group effort with players not only from academia and industry but from forward-looking government laboratories and agencies:

Pam Melroy, an aerospace engineer, former astronaut and current deputy director of DARPA’s Tactical Technologies Office (TTO), foresees technologies that would enable machines to collaborate with humans as partners on tasks far more complex than those we can tackle today:
Justin Sanchez, a neuroscientist and program manager in DARPA’s Biological Technologies Office (BTO), imagines a world where neurotechnologies could enable users to interact with their environment and other people by thought alone:
Stefanie Tompkins, a geologist and director of DARPA’s Defense Sciences Office, envisions building substances from the atomic or molecular level up to create “impossible” materials with previously unattainable capabilities.
Check back with us in 2045—or sooner, if that time machine stuff works out—for an assessment of how things really turned out in 30 years.

Here’s What Developers Are Doing with Google’s AI Brain

By Hugo Angel,

Google TensorFlow. Jeff Dean
Researchers outside Google are testing the software that the company uses to add artificial intelligence to many of its products.
Tech companies are racing to set the standard for machine learning, and to attract technical talent.
Jeff Dean speaks at a Google event in 2007. Credit: Photo by Niall Kennedy / CC BY-NC 2.0
An artificial intelligence engine that Google uses in many of its products, and that it made freely available last month, is now being used by others to perform some neat tricks, including 
  • translating English into Chinese, 
  • reading handwritten text, and 
  • even generating original artwork.
The AI software, called TensorFlow, provides a straightforward way for users to train computers to perform tasks by feeding them large amounts of data. The software incorporates various methods for efficiently building and training simulated “deep learning” neural networks across different computer hardware.
Deep learning is an extremely effective technique for training computers to recognize patterns in images or audio, enabling machines to perform with human-like competence useful tasks such as recognizing faces or objects in images. Recently, deep learning also has shown significant promise for parsing natural language, by enabling machines to respond to spoken or written queries in meaningful ways.
Speaking at the Neural Information Processing Systems (NIPS) conference in Montreal this week, Jeff Dean, the computer scientist at Google who leads the TensorFlow effort, said that the software is being used for a growing number of experimental projects outside the company.
These include software that generates captions for images and code that translates the documentation for TensorFlow into Chinese. Another project uses TensorFlow to generate artificial artwork. “It’s still pretty early,” Dean said after the talk. “People are trying to understand what it’s best at.”
TensorFlow grew out of a project at Google, called Google Brain, aimed at applying various kinds of neural network machine learning to products and services across the company. The reach of Google Brain has grown dramatically in recent years. Dean said that the number of projects at Google that involve Google Brain has grown from a handful in early 2014 to more than 600 today.
Most recently, Google Brain helped develop Smart Reply, a system that automatically recommends a quick response to messages in Gmail after it scans the text of an incoming message. The neural network technique used to develop Smart Reply was presented by Google researchers at the NIPS conference last year.
Dean expects deep learning and machine learning to have a similar impact on many other companies. “There is a vast array of ways in which machine learning is influencing lots of different products and industries,” he said. For example, the technique is being tested in many industries that try to make predictions from large amounts of data, ranging from retail to insurance.
Google was able to give away the code for TensorFlow because the data it owns is a far more valuable asset for building a powerful AI engine. The company hopes that the open-source code will help it establish itself as a leader in machine learning and foster relationships with collaborators and future employees. TensorFlow “gives us a common language to speak, in some sense,” Dean said. “We get benefits from having people we hire who have been using TensorFlow. It’s not like it’s completely altruistic.”
A neural network consists of layers of virtual neurons that fire in a cascade in response to input. A network “learns” as the sensitivity of these neurons is tuned to match particular input and output, and having many layers makes it possible to recognize more abstract features, such as a face in a photograph.
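The cascade described above can be sketched in a few lines of plain Python. This is a minimal illustration only and does not use TensorFlow. The layer sizes, weights and the sigmoid activation are hypothetical choices, and the tuning ("learning") step is left out.

```python
import math

def sigmoid(x):
    """Squash a neuron's summed input into a firing strength in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def layer_forward(inputs, weights, biases):
    """One layer: each neuron weighs every input, adds its bias, and fires."""
    return [
        sigmoid(sum(w * x for w, x in zip(row, inputs)) + b)
        for row, b in zip(weights, biases)
    ]

def network_forward(inputs, layers):
    """Pass the input through the layers in order, like a cascade."""
    activations = inputs
    for weights, biases in layers:
        activations = layer_forward(activations, weights, biases)
    return activations

# Hypothetical network: 3 inputs -> 2 hidden neurons -> 1 output neuron.
layers = [
    ([[0.5, -0.2, 0.1], [0.3, 0.8, -0.5]], [0.0, 0.1]),
    ([[1.2, -0.7]], [0.05]),
]
print(network_forward([1.0, 0.5, -1.0], layers))
```

Training would adjust the numbers in `layers` so that the outputs match known examples. Adding more layers lets later neurons respond to combinations of what earlier neurons detected.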
TensorFlow is now one of several open-source deep learning software libraries, and its performance currently lags behind some other libraries for certain tasks. However, it is designed to be easy to use, and it can easily be ported between different hardware. And Dean says his team is hard at work trying to improve its performance.
In the race to dominate machine learning and attract the best talent, however, other companies may release competing AI engines of their own.
December 8, 2015

Robot Demonstrates Self-Awareness

By admin,

photo credit: The robot on the right was able to pass a self-awareness test. RAIR Lab/YouTube
A king is seeking a new advisor, and to do so he invites three wise men to his castle. He tells them he will place a hat on each of their heads that will be either white or blue, and at least one of the hats will be blue. To become the advisor, a wise man must work out the color of the hat he is wearing without talking to the others. After a few minutes of sitting in silence, one of the wise men stands up and guesses correctly.
This riddle (you can read the solution here) is a famous test of logic and self-awareness, and a group of researchers have now recreated a similar test in robots to prove the ability of artificial intelligence to be self-aware – within, of course, limitations.
Three humanoid Nao robots were programmed to think that two of them had been given a “dumbing pill” that prevented them from speaking. All of them were asked “which pill did you receive?” but as two of them were mute, only one was able to answer, saying: “I don’t know.” It then works out that, as it can talk, it must not have been given the pill, so it changes its answer to: “Sorry, I know now. I was able to prove that I was not given a dumbing pill.”
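The chain of inference in the test can be mimicked with a toy sketch. This is not the RAIR Lab's software, which the article does not describe. The `Robot` class and its methods are hypothetical and only mirror the reasoning steps reported above.

```python
class Robot:
    """Toy model of a robot that does not know whether it has been muted."""

    def __init__(self, name, muted):
        self.name = name
        self._muted = muted        # hidden fact; the reasoning never reads it
        self.heard_own_voice = False

    def say(self, text):
        """Try to speak; a muted robot produces nothing and hears nothing."""
        if self._muted:
            return None
        self.heard_own_voice = True
        return text

    def answer_pill_question(self):
        """Return everything the robot manages to say in response."""
        spoken = []
        first = self.say("I don't know.")
        if first is not None:
            spoken.append(first)
        # Hearing its own voice is the evidence that rules out the pill.
        if self.heard_own_voice:
            spoken.append(self.say(
                "Sorry, I know now. I was able to prove that "
                "I was not given a dumbing pill."
            ))
        return spoken

robots = [Robot("A", muted=True), Robot("B", muted=True), Robot("C", muted=False)]
for robot in robots:
    print(robot.name, robot.answer_pill_question())
```

Only the unmuted robot gets past its first answer. It updates its belief because it observed the result of its own action, not because it was told its state.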

Results of the test, carried out by the Rensselaer Artificial Intelligence and Reasoning (RAIR) Laboratory, will be presented in a paper at RO-MAN 2015 later this year. Selmer Bringsjord from the Rensselaer Polytechnic Institute, one of the test’s administrators, told Vice that it showed that a “logical and a mathematical correlate to self-consciousness” was possible, suggesting that robots can be designed in such a way that their actions and decisions resemble a degree of self-awareness.

Before you start preparing for an onslaught of Terminator-style killer robots, though, it should be noted that this test was obviously rather limited. Nonetheless, it suggests that self-awareness is something that can be programmed, and may open up new avenues for artificial intelligence. Just being able to understand the question and hear their own voice to solve the puzzle is an important skill for robots to demonstrate.

“There are myriad additional steps that need to ultimately be taken,” the researchers write in their paper, “but one step at a time is the only way forward.”


by Jonathan O’Callaghan
July 17, 2015

It’s No Myth: Robots and Artificial Intelligence Will Erase Jobs in Nearly Every Industry

By admin,

With the unemployment rate falling to 5.3 percent, the lowest in seven years, policy makers are heaving a sigh of relief. Indeed, with the technology boom in progress, there is a lot to be optimistic about.

  • Manufacturing will be returning to U.S. shores with robots doing the job of Chinese workers; 
  • American carmakers will be mass-producing self-driving electric vehicles; 
  • technology companies will develop medical devices that greatly improve health and longevity; 
  • we will have unlimited clean energy and 3D print our daily needs. 

The cost of all of these things will plummet and make it possible to provide for the basic needs of every human being.

I am talking about technology advances that are happening now, which will bear fruit in the 2020s.
But policy makers will have a big new problem to deal with: the disappearance of human jobs. Not only will there be fewer jobs for people doing manual work, the jobs of knowledge workers will also be replaced by computers. Almost every industry and profession will be impacted and this will create a new set of social problems — because most people can’t adapt to such dramatic change.
If we can develop the economic structures necessary to distribute the prosperity we are creating, most people will no longer have to work to sustain themselves. They will be free to pursue other creative endeavors. The problem, however, is that without jobs, they will not have the dignity, social engagement, and sense of fulfillment that comes from work. The life, liberty and pursuit of happiness that the constitution entitles us to won’t be through labor, it will have to be through other means.
It is imperative that we understand the changes that are happening and find ways to cushion the impacts.
The technology elite who are leading this revolution will reassure you that there is nothing to worry about because we will create new jobs just as we did in previous centuries when the economy transitioned from agrarian to industrial to knowledge-based. Tech mogul Marc Andreessen has called the notion of a jobless future a “Luddite fallacy,” referring to past fears that machines would take human jobs away. Those fears turned out to be unfounded because we created newer and better jobs and were much better off.
True, we are living better lives. But what is missing from these arguments is the timeframe over which the transitions occurred. The industrial revolution unfolded over centuries. Today’s technology revolutions are happening within years. We will surely create a few intellectually-challenging jobs, but we won’t be able to retrain the workers who lose today’s jobs. They will experience the same unemployment and despair that their forefathers did. It is they who we need to worry about.
The first large wave of unemployment will be caused by self-driving cars. These will provide tremendous benefit by eliminating traffic accidents and congestion, making commuting time more productive, and reducing energy usage. But they will eliminate the jobs of millions of taxi and truck drivers and delivery people. Fully-automated robotic cars are no longer in the realm of science fiction; you can see Google’s cars on the streets of Mountain View, Calif. There are also self-driving trucks on our highways and self-driving tractors on farms. Uber just hired away dozens of engineers from Carnegie Mellon University to build its own robotic cars. It will surely start replacing its human drivers as soon as its technology is ready, later in this decade. As Uber CEO Travis Kalanick reportedly said in an interview, “The reason Uber could be expensive is you’re paying for the other dude in the car. When there is no other dude in the car, the cost of taking an Uber anywhere is cheaper. Even on a road trip.”
The dude in the driver’s seat will go away.

Manufacturing will be the next industry to be transformed. Robots have, for many years, been able to perform surgery, milk cows, do military reconnaissance and combat, and assemble goods. But they weren’t dexterous enough to do the type of work that humans do in installing circuit boards. The latest generation of industrial robots by ABB of Switzerland and Rethink Robotics of Boston can do this, however. ABB’s robot, Yumi, can even thread a needle. It costs only $40,000.

China, fearing the demise of its industry, is setting up fully-automated robotic factories in the hope that by becoming more price-competitive, it can continue to be the manufacturing capital of the world. But its advantage only holds up as long as the supply chains are in China and shipping raw materials and finished goods over the oceans remains cost-effective. Don’t forget that our robots are as productive as theirs are; they too don’t join labor unions (yet) and will work around the clock without complaining. Supply chains will surely shift and the trickle of returning manufacturing will become a flood.

But there will be few jobs for humans once the new, local factories are built.
With advances in artificial intelligence, any job that requires the analysis of information can be done better by computers. This includes the jobs of physicians, lawyers, accountants, and stock brokers. We will still need some humans to interact with the ones who prefer human contact, but the grunt work will disappear. The machines will need very few humans to help them.
This jobless future will surely create social problems — but it may be an opportunity for humanity to uplift itself. Why do we need to work 40, 50, or 60 hours a week, after all? Just as we were better off leaving the long and hard agrarian and factory jobs behind, we may be better off without the mindless work at the office. What if we could be working 10 or 15 hours per week from anywhere we want and have the remaining time for leisure, social work, or attainment of knowledge?
Yes, there will be a booming tourism and recreation industry and new jobs will be created in these — for some people.
There are as many things to be excited about as to fear. If we are smart enough to develop technologies that solve the problems of disease, hunger, energy, and education, we can — and surely will — develop solutions to our social problems. But we need to start by understanding where we are headed and prepare for the changes. We need to get beyond the claims of a Luddite fallacy — to a discussion about the new future.
ORIGINAL: Singularity Hub

ON JUL 07, 2015

Wadhwa is a fellow at Rock Center for Corporate Governance at Stanford University, director of research at Center for Entrepreneurship and Research Commercialization at Duke, and distinguished fellow at Singularity University. His past appointments include Harvard Law School, University of California Berkeley, and Emory University. Follow him on Twitter @wadhwa.

IBM Watson Language Translation and Speech Services – General Availability

By admin,

As part of the Watson development platform’s continued expansion, IBM is today introducing the latest set of cognitive services to move into General Availability (GA), which will drive new Watson-powered applications. They include the GA release of IBM Watson Language Translation (a merger of Language Identification and Machine Translation), IBM Speech to Text, and IBM Text to Speech.

These cognitive speech and language services are open to anyone, enabling application developers and IBM’s growing ecosystem to develop and commercialize new cognitive computing solutions that can do the following:
  • Translate news, patents, or conversational documents across several languages (Language Translation)
  • Produce transcripts from speech in multi-media files or conversational streams, capturing vast information for a myriad of business uses. This Watson cognitive service also benefits from a recent IBM conversational speech transcription breakthrough to advance the accuracy of speech recognition (Speech to Text)
  • Make their web, mobile, and Internet of Things applications speak with a consistent voice across all Representational State Transfer (REST)-compatible platforms (Text to Speech)
Organizations are already building applications with these services, which IBM opened up in beta over the past year on the Watson Developer Cloud on IBM Bluemix. Developers have used these APIs to build prototype applications in only two days at IBM hack-a-thons, demonstrating the versatility and ease of use of the services.
Supported Capabilities
We have made several updates since the beta releases, inspired by feedback from our user community.
Language Translation now supports:
  • Language Identification – identifies the language of the textual input if it is one of the 62 supported languages
  • The News domain – targeted at news articles and transcripts, it translates English to and from French, Spanish, Portuguese or Arabic
  • The Conversational domain – targeted at conversational colloquialisms, it translates English to and from French, Spanish, Portuguese, or Arabic
  • The Patent domain – targeted at technical and legal terminology, it translates Spanish, Portuguese, Chinese, or Korean to English
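As a rough illustration of the domain list above, the supported directions can be written down as a small lookup. This is only a sketch: the function name `is_supported` and the use of plain language names as strings are assumptions made for this example, not part of the Watson API, which the announcement does not describe at this level.

```python
# Language pairs per domain, transcribed from the list above.
# Assumption: languages are plain names here; a real API call would
# likely use language codes, which the announcement does not specify.
_TO_AND_FROM_ENGLISH = {"French", "Spanish", "Portuguese", "Arabic"}
_PATENT_SOURCES = {"Spanish", "Portuguese", "Chinese", "Korean"}


def is_supported(domain, source, target):
    """Return True if the announcement lists this domain and direction."""
    if domain in ("news", "conversational"):
        # These two domains translate English to and from the same four languages.
        return (source == "English" and target in _TO_AND_FROM_ENGLISH) or (
            target == "English" and source in _TO_AND_FROM_ENGLISH
        )
    if domain == "patent":
        # The Patent domain is listed only as translating into English.
        return target == "English" and source in _PATENT_SOURCES
    return False
```

Under this reading, `is_supported("patent", "Korean", "English")` holds, while the reverse direction does not, because the Patent domain is described as one-way.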
Speech to Text now supports:
  • New wideband and narrowband telephony language support – U.S. English, Spanish, and Japanese
  • Broader vocabulary coverage, and improved accuracy for U.S. English
Text to Speech now supports:
  • U.S. English, U.K. English, Spanish, French, Italian, and German
  • A subset of SSML (Speech Synthesis Markup Language) for U.S. English, U.K. English, French, and German (see the documentation for more details)
  • Improved programming support for applications stored outside of Bluemix
Pricing and Freemium Tiers
Trial Bluemix accounts remain free. Please register to get free instant access to a 30-day trial without a credit card. Use of the Speech to Text, Text to Speech, and Language Translation services is free during this trial period.
After the trial period, pricing for Language Translation will be:
  • $0.02 per thousand characters. The first million characters per month are free.
  • An add-on charge of $3.00 per thousand characters for usage of the Patent model in Language Translation.
After the trial period, pricing for Speech to Text will be:
  • $0.02 per minute. The first thousand minutes per month are free.
  • An add-on charge of $0.02 per minute for usage of narrowband (telephony) models. The first thousand minutes per month are free.
After the trial period, pricing for Text to Speech will be:
  • $0.02 per thousand characters. The first million characters per month are free.
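To make the tiers above concrete, here is a minimal sketch of a monthly cost estimate in Python. The rates and free allowances are the ones quoted above; the function names are hypothetical, and three points are assumptions because the announcement does not settle them: usage is billed proportionally rather than rounded up to whole thousands, the Patent add-on applies to every Patent-model character (no free allowance is listed for it), and the narrowband add-on has its own free thousand minutes.

```python
from decimal import Decimal

# Rates and free monthly allowances as quoted in the announcement.
RATE_PER_1K_CHARS = Decimal("0.02")        # Language Translation, Text to Speech
PATENT_ADDON_PER_1K_CHARS = Decimal("3.00")
RATE_PER_MINUTE = Decimal("0.02")          # Speech to Text
NARROWBAND_ADDON_PER_MINUTE = Decimal("0.02")
FREE_CHARS = 1_000_000
FREE_MINUTES = 1_000


def _billable(used, free):
    """Usage above the free monthly allowance, never negative."""
    return max(used - free, 0)


def translation_cost(chars, patent_chars=0):
    """Monthly Language Translation cost in dollars.

    Assumption: the Patent add-on is charged on all Patent-model
    characters, since no free allowance is listed for it.
    """
    base = Decimal(_billable(chars, FREE_CHARS)) / 1000 * RATE_PER_1K_CHARS
    addon = Decimal(patent_chars) / 1000 * PATENT_ADDON_PER_1K_CHARS
    return base + addon


def speech_to_text_cost(minutes, narrowband_minutes=0):
    """Monthly Speech to Text cost in dollars, for whole minutes.

    Assumption: the narrowband add-on has its own free 1,000 minutes.
    """
    base = Decimal(_billable(minutes, FREE_MINUTES)) * RATE_PER_MINUTE
    addon = (Decimal(_billable(narrowband_minutes, FREE_MINUTES))
             * NARROWBAND_ADDON_PER_MINUTE)
    return base + addon


def text_to_speech_cost(chars):
    """Monthly Text to Speech cost in dollars."""
    return Decimal(_billable(chars, FREE_CHARS)) / 1000 * RATE_PER_1K_CHARS
```

For example, under these assumptions 3,000,000 translated characters in a month leave 2,000,000 billable characters, or $40; adding 10,000 Patent-model characters raises that by $30.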
Transition Plan
We look forward to continuing our partnership with the many clients, business partners, and creative developers that have built innovative applications using the beta version of the four services: Speech to Text, Text to Speech, Machine Translation and Language Identification. If you have used these beta services, please migrate your applications to use the GA services by August 10, 2015. After this date the beta plans for these services will no longer be available. For details about upgrading, see:
We’re eager to see the next round of cognitive applications based on the Speech and Translation Services. For questions, join the discussion in our Forum, or send an email to [email protected] with “Speech” or “Translation” in your inquiry.
IBM is placing the power of Watson in the hands of developers and an ecosystem of partners, entrepreneurs, tech enthusiasts and students with a growing platform of Watson services (APIs) to create an entirely new class of apps and businesses that make cognitive computing systems the new computing standard.
JULY 6, 2015