Google simulates brain networks to recognize speech and images

October 5, 2012

We consider the problem of building high-level, class-specific feature detectors from only unlabeled data. For example, is it possible to learn a face detector using only unlabeled images? (Credit: Google Research)

This summer Google set a new landmark in the field of artificial intelligence with software that learned how to recognize cats, people, and other things simply by watching YouTube videos (see “Self-Taught Software”).

That technology, modeled on how brain cells operate, is now being put to work making Google’s products smarter, with speech recognition being the first service to benefit, Technology Review reports.

Google’s learning software is based on simulating groups of connected brain cells that communicate and influence one another. When such a neural network, as it’s called, is exposed to data, the relationships between different neurons can change. That causes the network to develop the ability to react in certain ways to incoming data of a particular kind — and the network is said to have learned something.
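The idea that connections between simulated neurons shift in response to data can be sketched in a few lines. This is a minimal illustration with a single simulated neuron trained by a classic perceptron-style update rule, not Google's system; the AND-style example pattern and the learning rate are invented for the sketch.

```python
def train_neuron(samples, epochs=50, lr=0.1):
    """Perceptron-style learning: nudge the connection weights whenever
    the neuron's reaction to incoming data disagrees with the target."""
    w = [0.0, 0.0]  # connection weights, the "relationships" that change
    b = 0.0         # bias term
    for _ in range(epochs):
        for x, target in samples:
            out = 1 if w[0] * x[0] + w[1] * x[1] + b > 0 else 0
            err = target - out
            # Weights move only when the network reacts incorrectly.
            w[0] += lr * err * x[0]
            w[1] += lr * err * x[1]
            b += lr * err
    return w, b

# Expose the "network" to examples of a simple AND-like pattern.
data = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
w, b = train_neuron(data)

def predict(x):
    return 1 if w[0] * x[0] + w[1] * x[1] + b > 0 else 0
```

After training, the unit reacts correctly to all four inputs: it has "learned" the pattern purely from exposure to examples.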

Neural networks have been used for decades in machine-learning applications such as chess-playing software and face detection. Google’s engineers have found ways to put more computing power behind the approach than was previously possible, creating neural networks that can learn without human assistance and are robust enough to be used commercially, not just as research demonstrations.

The company’s neural networks decide for themselves which features of data to pay attention to, and which patterns matter, rather than having humans decide that, say, colors and particular shapes are of interest to software trying to identify objects.

Google is now using these neural networks to recognize speech more accurately, a technology increasingly important to Google’s smartphone operating system, Android, as well as the search app it makes available for Apple devices (see “Google’s Answer to Siri Thinks Ahead”). “We got between 20 and 25 percent improvement in terms of words that are wrong,” says Vincent Vanhoucke, a leader of Google’s speech-recognition efforts. “That means that many more people will have a perfect experience without errors.” The neural net so far works only on U.S. English, and Vanhoucke says similar improvements should be possible when it is introduced for other dialects and languages.

Other Google products will likely improve over time with help from the new learning software. The company’s image search tools, for example, could become better able to understand what’s in a photo without relying on surrounding text. And Google’s self-driving cars (see “Look, No Hands”) and mobile computer built into a pair of glasses (see “You Will Want Google’s Goggles”) could benefit from software better able to make sense of more real-world data.

The new technology grabbed headlines back in June of this year, when Google engineers published results of an experiment that threw 10 million images taken from YouTube videos at their simulated brain cells, running 16,000 processors across a thousand computers for 10 days without pause.

“Most people keep their model in a single machine, but we wanted to experiment with very large neural networks,” says Jeff Dean, an engineer helping lead the research at Google. “If you scale up both the size of the model and the amount of data you train it with, you can learn finer distinctions or more complex features.”

The neural networks that come out of that process are more flexible. “These models can typically take a lot more context,” says Dean, giving an example from the world of speech recognition. If, for example, Google’s system thought it heard someone say “I’m going to eat a lychee,” but the last word was slightly muffled, it could confirm its hunch based on past experience of phrases because “lychee” is a fruit and is used in the same context as “apple” or “orange.”
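Dean's lychee example can be caricatured in a few lines: score each candidate for a muffled word by how often it has appeared alongside the surrounding context. The tiny co-occurrence table and the `best_candidate` helper below are hypothetical, invented purely for illustration; real systems learn such statistics from enormous corpora rather than a hand-written dictionary.

```python
# Invented co-occurrence counts: how often each word has followed "eat"
# in some (hypothetical) past experience of phrases.
context_counts = {
    ("eat", "lychee"): 12,
    ("eat", "leaky"): 0,
    ("eat", "apple"): 40,
    ("eat", "orange"): 35,
}

def best_candidate(context_word, candidates):
    """Pick the candidate most consistent with the surrounding context."""
    return max(candidates,
               key=lambda w: context_counts.get((context_word, w), 0))

# A muffled final word: "lychee" vs. the acoustically similar "leaky".
guess = best_candidate("eat", ["leaky", "lychee"])
```

Because "lychee" has been seen after "eat" and "leaky" has not, context alone settles the ambiguity, which is the intuition behind Dean's remark about models that "take a lot more context."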

Dean says his team is also testing models that understand both images and text together. “You give it ‘porpoise’ and it gives you pictures of porpoises,” he says. “If you give it a picture of a porpoise, it gives you ‘porpoise’ as a word.”

A next step could be to have the same model learn the sounds of words as well. Being able to relate different forms of data like that could lead to speech recognition that gathers extra clues from video, for example, and it could boost the capabilities of Google’s self-driving cars by helping them understand their surroundings by combining the many streams of data they collect, from laser scans of nearby obstacles to information from the car’s engine.

Google’s work on making neural networks brings us a small step closer to one of the ultimate goals of AI — creating software that can match animal or perhaps even human intelligence, says Yoshua Bengio, a professor at the University of Montreal who works on similar machine-learning techniques. “This is the route toward making more general artificial intelligence — there’s no way you will get an intelligent machine if it can’t take in a large volume of knowledge about the world,” he says.

In fact, Google’s neural networks operate in ways similar to what neuroscientists know about the visual cortex in mammals, the part of the brain that processes visual information, says Bengio. “It turns out that the feature learning networks being used [by Google] are similar to the methods used by the brain that are able to discover objects that exist.”

However, he is quick to add that even Google’s neural networks are much smaller than the brain, and that they can’t perform many things necessary to intelligence, such as reasoning with information collected from the outside world.

Dean is also careful not to imply that the limited intelligences he’s building are close to matching any biological brain. But he can’t resist pointing out that if you pick the right contest, Google’s neural networks have humans beat.

“We are seeing better than human-level performance in some visual tasks,” he says, giving the example of labeling house numbers that appear in photos taken by Google’s Street View cars, a job that used to be farmed out to many humans.

“They’re starting to use neural nets to decide whether a patch [in an image] is a house number or not,” says Dean, and they turn out to perform better than humans. It’s a small victory — but one that highlights how far artificial neural nets are behind the ones in your head. “It’s probably that it’s not very exciting, and a computer never gets tired,” says Dean. It takes real intelligence to get bored.