Google has been excelling in the AI sphere for a while, and its Assistant is proof of how far it has come.
Not only does it carry out most actions through voice recognition, but it also reads its responses aloud in a voice that comes ever so close to sounding as natural as a human's. From stiff and unnatural to smooth and lifelike voice generation, Google has come a long way.
A research paper published by Google this month (via Quartz) suggests that fully human-sounding machine speech may be closer than you might think. The paper describes a text-to-speech system called Tacotron 2, which the researchers claim can imitate a human voice with excellent accuracy.
Tacotron 2 is Google's second official generation of the technology, and it consists of two deep neural networks. The first network translates the text into a spectrogram, a visual way to represent audio frequencies over time. That spectrogram is then fed into WaveNet, a system from Alphabet's AI research lab DeepMind, which reads the chart and generates the corresponding audio.
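To make the idea of a spectrogram more concrete, here is a minimal Python sketch that builds one from a synthetic signal. It is only an illustration of "audio frequencies over time" and not Google's code: the sample rate, frame size, hop size, and every function name are assumptions chosen for the example, and it uses a plain linear-frequency transform rather than whatever representation the paper's networks actually work with.

```python
import cmath
import math

# Illustrative values only; none of these come from the paper.
SAMPLE_RATE = 8000  # samples per second
FRAME_SIZE = 64     # samples per analysis frame
HOP_SIZE = 32       # samples between the starts of consecutive frames


def make_tone(freq_hz, n_samples, sample_rate=SAMPLE_RATE):
    """Return a pure sine tone as a list of samples."""
    return [math.sin(2 * math.pi * freq_hz * n / sample_rate)
            for n in range(n_samples)]


def frame_magnitudes(frame):
    """Magnitude spectrum of one frame via a naive DFT with a Hann window."""
    n = len(frame)
    windowed = [x * 0.5 * (1 - math.cos(2 * math.pi * i / (n - 1)))
                for i, x in enumerate(frame)]
    mags = []
    for k in range(n // 2 + 1):  # keep only the non-negative frequencies
        acc = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                  for i, x in enumerate(windowed))
        mags.append(abs(acc))
    return mags


def spectrogram(signal, frame_size=FRAME_SIZE, hop_size=HOP_SIZE):
    """One magnitude spectrum per frame: rows are time, columns are frequency."""
    starts = range(0, len(signal) - frame_size + 1, hop_size)
    return [frame_magnitudes(signal[s:s + frame_size]) for s in starts]


def dominant_frequencies(spec, sample_rate=SAMPLE_RATE, frame_size=FRAME_SIZE):
    """Frequency in Hz of the strongest bin in each frame."""
    return [max(range(len(row)), key=row.__getitem__) * sample_rate / frame_size
            for row in spec]


# A 1000 Hz tone followed by a 2000 Hz tone.
signal = make_tone(1000, 256) + make_tone(2000, 256)
spec = spectrogram(signal)
peaks = dominant_frequencies(spec)
print(peaks)
```

The early frames peak at 1000 Hz and the late ones at 2000 Hz, which is the kind of time-versus-frequency picture a second stage such as WaveNet would have to turn back into a waveform.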
In the last section, Google provides side-by-side examples of a human voice and the AI-created one, with outstanding results.
Here is a sentence rendered in both the AI-generated voice and the human one:
“George Washington was the first President of the United States.”
The Google researchers also demonstrate that Tacotron 2 can handle hard-to-pronounce words and names, as well as adjust its delivery based on how the text is written. For instance, capitalized words are stressed, as a speaker would do to indicate that a specific word is an important part of a sentence.
The system does have its limits. For now, it is trained to mimic only one female voice; to speak like a male or a different female, Google would need to train the system again. And there is still a vast gap between an AI that can read aloud like a human and one that can converse like a human.