How computers are helping solve information overload by learning to ‘understand’ text

February 23, 2010 by Amara D. Angelica

With tremendous volumes of information appearing online every day in social networks, websites, and blogs (mea culpa), the need to train computers to understand human language is now becoming critical, said Chris Manning, Stanford University associate professor of computer science and linguistics, at the American Association for the Advancement of Science (AAAS) conference in San Diego on Feb. 19.

“The problem of the age is information overload. The fundamental challenge … is how we can get computers to actually understand at least a reasonable amount of what they read.”

The Stanford Natural Language Processing Group is developing a fundamental set of tools to achieve that understanding by parsing sentences (breaking them down into their grammatical components). A key tool is probabilistic machine learning: the software reads a large number of sentences, analyzes their structure, compiles statistics about which verbs and nouns occur together, and keeps track of what the subject of each sentence is doing.

Based on those statistics, for instance, a computer might conclude that “horse” is likely to be the subject of a sentence and that “hay” is something that horses might eat.
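The idea can be sketched in a few lines of Python. This is a toy illustration, not the Stanford toolkit: it assumes each sentence has already been parsed into a hypothetical (subject, verb, object) triple, and simply tallies which nouns show up in which roles.

```python
from collections import Counter

# Hypothetical parser output: each sentence reduced to a
# (subject, verb, object) triple.
parses = [
    ("horse", "eats", "hay"),
    ("horse", "eats", "oats"),
    ("farmer", "feeds", "horse"),
    ("horse", "pulls", "cart"),
]

# Count how often each noun appears as a sentence subject.
subjects = Counter(subj for subj, _, _ in parses)

# Count what shows up as the object of "eats".
objects_of_eat = Counter(obj for subj, verb, obj in parses if verb == "eats")

print(subjects.most_common(1))   # → [('horse', 3)]
print(sorted(objects_of_eat))    # → ['hay', 'oats']
```

From such counts a system can infer that "horse" is a likely sentence subject and that "hay" is something horses eat; real systems do this over millions of sentences rather than four.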


Building on that level of understanding, Stanford researchers have created software to sort out ambiguities in language by taking whole sentences into account when deciding what each word means. For instance, “make up” can have at least three meanings: to reconcile after a spat, to concoct a story, or to apply cosmetics. The technical solution, called “joint inference,” is to look for other words in the sentence that are statistically shown to be relevant. If the word “argument” is there, the computer will lean toward “to reconcile.”
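A heavily simplified sketch of that idea in Python: each sense of "make up" is associated with a small set of cue words (the word lists here are invented for illustration, not Stanford's actual statistics), and the sense whose cues best match the rest of the sentence wins.

```python
# Hypothetical cue words for each sense of "make up" (an assumption
# for illustration; a real system learns these weights statistically).
SENSE_CUES = {
    "reconcile": {"argument", "fight", "apologize", "friends"},
    "concoct": {"story", "excuse", "lie", "alibi"},
    "apply cosmetics": {"face", "lipstick", "mirror", "mascara"},
}

def disambiguate(sentence: str) -> str:
    """Pick the sense whose cue words overlap most with the sentence."""
    words = set(sentence.lower().split())
    return max(SENSE_CUES, key=lambda sense: len(SENSE_CUES[sense] & words))

print(disambiguate("after the argument they decided to make up"))
# → reconcile
```

Real joint inference considers all the words' interpretations simultaneously rather than scoring one ambiguous phrase at a time, but the intuition is the same: surrounding words vote for a meaning.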

Another technology, called robust "textual inference," can read a passage of text and determine whether a conclusion about it is supported. That reading comprehension task is important because it’s similar to what people sometimes expect search engines to do: they’ll type in a conclusion (“hotels with free Wi-Fi”) and hope the engine leads them to text that supports it (“Free Wireless High-Speed Internet access in all rooms”).
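A crude way to approximate this in code is to check how much of the conclusion's vocabulary the passage covers. The sketch below is deliberately naive (a hand-picked stopword list and trailing-"s" stripping stand in for real linguistic normalization); actual textual-inference systems use far richer representations of meaning.

```python
def supports(passage: str, conclusion: str, threshold: float = 0.5) -> bool:
    """Return True if the passage covers enough of the conclusion's words."""
    stop = {"with", "in", "the", "a", "every"}  # tiny stopword list (assumption)

    def norm(text: str) -> set:
        # Lowercase, split hyphenated terms, crudely strip plural "s".
        return {w.rstrip("s") for w in text.lower().replace("-", " ").split()} - stop

    p, c = norm(passage), norm(conclusion)
    return len(c & p) / len(c) >= threshold

print(supports("The hotel offers free Wi-Fi in every room",
               "hotels with free Wi-Fi"))   # → True
print(supports("Continental breakfast served daily",
               "hotels with free Wi-Fi"))   # → False
```

Word overlap alone fails whenever support is phrased differently (“Wireless Internet access” vs. “Wi-Fi”), which is exactly the gap that statistical inference over parsed structure is meant to close.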

To get a feel for how sentence parsing works, enter a sentence here.
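For a sense of what a parser produces, a bracketed parse in Penn Treebank style for a simple sentence looks something like this (an illustrative example, not the demo's exact output):

```
(ROOT
  (S
    (NP (DT The) (NN horse))
    (VP (VBZ eats) (NP (NN hay)))
    (. .)))
```

Each label marks a grammatical unit: NP is a noun phrase, VP a verb phrase, and the tags on individual words (DT, NN, VBZ) give their parts of speech.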

To create your own natural language processing software, Stanford provides open-source statistical NLP software toolkits in Java for various major computational linguistics problems here.