Kurzweil responds: Don’t underestimate the Singularity

October 20, 2011 by Ray Kurzweil

Last week, Paul Allen and a colleague challenged the prediction that computers will soon exceed human intelligence. Now Ray Kurzweil, the leading proponent of the “Singularity,” offers a rebuttal. — Technology Review, Oct. 10, 2011.

Although Paul Allen paraphrases my 2005 book, The Singularity Is Near, in the title of his essay (cowritten with his colleague Mark Greaves), it appears that he has not actually read the book. His only citation is to an essay I wrote in 2001 (“The Law of Accelerating Returns“) and his article does not acknowledge or respond to arguments I actually make in the book.

When my 1999 book, The Age of Spiritual Machines, was published, and augmented a couple of years later by the 2001 essay, it generated several lines of criticism, such as Moore’s law will come to an end, hardware capability may be expanding exponentially but software is stuck in the mud, the brain is too complicated, there are capabilities in the brain that inherently cannot be replicated in software, and several others. I specifically wrote The Singularity Is Near to respond to those critiques.

I cannot say that Allen would necessarily be convinced by the arguments I make in the book, but at least he could have responded to what I actually wrote. Instead, he offers de novo arguments as if nothing has ever been written to respond to these issues. Allen’s descriptions of my own positions appear to be drawn from my 10-year-old essay. While I continue to stand by that essay, Allen does not summarize my positions correctly even from that essay.

Allen writes that “the Law of Accelerating Returns (LOAR). . . is not a physical law.” I would point out that most scientific laws are not physical laws, but result from the emergent properties of a large number of events at a finer level. A classical example is the laws of thermodynamics (LOT). If you look at the mathematics underlying the LOT, they model each particle as following a random walk. So by definition, we cannot predict where any particular particle will be at any future time. Yet the overall properties of the gas are highly predictable to a high degree of precision according to the laws of thermodynamics. So it is with the law of accelerating returns. Each technology project and contributor is unpredictable, yet the overall trajectory as quantified by basic measures of price-performance and capacity nonetheless follow remarkably predictable paths.

If computer technology were being pursued by only a handful of researchers, it would indeed be unpredictable. But it’s being pursued by a sufficiently dynamic system of competitive projects that a basic measure such as instructions per second per constant dollar follows a very smooth exponential path going back to the 1890 American census. I discuss the theoretical basis for the LOAR extensively in my book, but the strongest case is made by the extensive empirical evidence that I and others present.

Allen writes that “these ‘laws’ work until they don’t.” Here, Allen is confusing paradigms with the ongoing trajectory of a basic area of information technology. If we were examining the trend of creating ever-smaller vacuum tubes, the paradigm for improving computation in the 1950s, it’s true that this specific trend continued until it didn’t. But as the end of this particular paradigm became clear, research pressure grew for the next paradigm. The technology of transistors kept the underlying trend of the exponential growth of price-performance going, and that led to the fifth paradigm (Moore’s law) and the continual compression of features on integrated circuits. There have been regular predictions that Moore’s law will come to an end. The semiconductor industry’s roadmap titled projects seven-nanometer features by the early 2020s. At that point, key features will be the width of 35 carbon atoms, and it will be difficult to continue shrinking them. However, Intel and other chip makers are already taking the first steps toward the sixth paradigm, which is computing in three dimensions to continue exponential improvement in price performance. Intel projects that three-dimensional chips will be mainstream by the teen years. Already three-dimensional transistors and three-dimensional memory chips have been introduced.

This sixth paradigm will keep the LOAR going with regard to computer price performance to the point, later in this century, where a thousand dollars of computation will be trillions of times more powerful than the human brain1. And it appears that Allen and I are at least in agreement on what level of computation is required to functionally simulate the human brain2.

Allen then goes on to give the standard argument that software is not progressing in the same exponential manner of hardware. In The Singularity Is Near, I address this issue at length, citing different methods of measuring complexity and capability in software that demonstrate a similar exponential growth. One recent study (“Report to the President and Congress, Designing a Digital Future: Federally Funded Research and Development in Networking and Information Technology” by the President’s Council of Advisors on Science and Technology) states the following:

“Even more remarkable — and even less widely understood — is that in many areas, performance gains due to improvements in algorithms have vastly exceeded even the dramatic performance gains due to increased processor speed. The algorithms that we use today for speech recognition, for natural language translation, for chess playing, for logistics planning, have evolved remarkably in the past decade … Here is just one example, provided by Professor Martin Grötschel of Konrad-Zuse-Zentrum für Informationstechnik Berlin. Grötschel, an expert in optimization, observes that a benchmark production planning model solved using linear programming would have taken 82 years to solve in 1988, using the computers and the linear programming algorithms of the day. Fifteen years later—in 2003—this same model could be solved in roughly one minute, an improvement by a factor of roughly 43 million. Of this, a factor of roughly 1,000 was due to increased processor speed, whereas a factor of roughly 43,000 was due to improvements in algorithms! Grötschel also cites an algorithmic improvement of roughly 30,000 for mixed integer programming between 1991 and 2008. The design and analysis of algorithms, and the study of the inherent computational complexity of problems, are fundamental subfields of computer science.”

I cite many other examples like this in the book3.

Regarding AI, Allen is quick to dismiss IBM’s Watson as narrow, rigid, and brittle. I get the sense that Allen would dismiss any demonstration short of a valid passing of the Turing test. I would point out that Watson is not so narrow. It deals with a vast range of human knowledge and is capable of dealing with subtle forms of language, including puns, similes, and metaphors. It’s not perfect, but neither are humans, and it was good enough to get a higher score than the best two human Jeopardy! players put together.

Allen writes that Watson was put together by the scientists themselves, building each link of narrow knowledge in specific areas. Although some areas of Watson’s knowledge were programmed directly, according to IBM, Watson acquired most of its knowledge on its own by reading natural language documents such as encyclopedias. That represents its key strength. It not only is able to understand the convoluted language in Jeopardy! queries (answers in search of a question), but it acquired its knowledge by reading vast amounts of natural-language documents. IBM is now working with Nuance (a company I originally founded as Kurzweil Computer Products) to have Watson read tens of thousands of medical articles to create a medical diagnostician.

A word on the nature of Watson’s “understanding” is in order here. A lot has been written that Watson works through statistical knowledge rather than “true” understanding. Many readers interpret this to mean that Watson is merely gathering statistics on word sequences. The term “statistical information” in the case of Watson refers to distributed coefficients in self-organizing methods such as Markov models. One could just as easily refer to the distributed neurotransmitter concentrations in the human cortex as “statistical information.” Indeed, we resolve ambiguities in much the same way that Watson does by considering the likelihood of different interpretations of a phrase.

Allen writes: “Every structure [in the brain] has been precisely shaped by millions of years of evolution to do a particular thing, whatever it might be. It is not like a computer, with billions of identical transistors in regular memory arrays that are controlled by a CPU with a few different elements. In the brain, every individual structure and neural circuit has been individually refined by evolution and environmental factors.”

Allen’s statement that every structure and neural circuit is unique is simply impossible. That would mean that the design of the brain would require hundreds of trillions of bytes of information. Yet the design of the brain (like the rest of the body) is contained in the genome. And while the translation of the genome into a brain is not straightforward, the brain cannot have more design information than the genome. Note that epigenetic information (such as the peptides controlling gene expression) do not appreciably add to the amount of information in the genome. Experience and learning do add significantly to the amount of information, but the same can be said of AI systems. I show in The Singularity Is Near that after lossless compression (due to massive redundancy in the genome), the amount of design information in the genome is about 50 million bytes, roughly half of which pertains to the brainsup>4. That’s not simple, but it is a level of complexity we can deal with and represents less complexity than many software systems in the modern world.

How do we get on the order of 100 trillion connections in the brain from only tens of millions of bytes of design information? Obviously, the answer is through redundancy. There are on the order of a billion pattern-recognition mechanisms in the cortex. They are interconnected in intricate ways, but even in the connections there is massive redundancy. The cerebellum also has billions of repeated patterns of neurons. It is true that the massively repeated structures in the brain learn different items of information as we learn and gain experience, but the same thing is true of artificially intelligent systems such as Watson.

Dharmendra S. Modha, manager of cognitive computing for IBM Research, writes: “…neuroanatomists have not found a hopelessly tangled, arbitrarily connected network, completely idiosyncratic to the brain of each individual, but instead a great deal of repeating structure within an individual brain and a great deal of homology across species … The astonishing natural reconfigurability gives hope that the core algorithms of neurocomputation are independent of the specific sensory or motor modalities and that much of the observed variation in cortical structure across areas represents a refinement of a canonical circuit; it is indeed this canonical circuit we wish to reverse engineer.”

Allen articulates what I describe in my book as the “scientist’s pessimism.” Scientists working on the next generation are invariably struggling with that next set of challenges, so if someone describes what the technology will look like in 10 generations, their eyes glaze over. One of the pioneers of integrated circuits was describing to me recently the struggles to go from 10 micron (10,000-nanometer) feature sizes to five-micron (5,000 nanometers) features over 30 years ago. They were cautiously confident of this goal, but when people predicted that someday we would actually have circuitry with feature sizes under one micron (1,000 nanometers), most of the scientists struggling to get to five microns thought that was too wild to contemplate. Objections were made on the fragility of circuitry at that level of precision, thermal effects, and so on. Well, today, Intel is starting to use chips with 22-nanometer gate lengths.

We saw the same pessimism with the genome project. Halfway through the 15-year project, only 1 percent of the genome had been collected, and critics were proposing basic limits on how quickly the genome could be sequenced without destroying the delicate genetic structures. But the exponential growth in both capacity and price performance continued (both roughly doubling every year), and the project was finished seven years later. The project to reverse-engineer the human brain is making similar progress. It is only recently, for example, that we have reached a threshold with noninvasive scanning techniques that we can see individual interneuronal connections forming and firing in real time.

Allen’s “complexity brake” confuses the forest with the trees. If you want to understand, model, simulate, and re-create a pancreas, you don’t need to re-create or simulate every organelle in every pancreatic Islet cell. You would want, instead, to fully understand one Islet cell, then abstract its basic functionality, and then extend that to a large group of such cells. This algorithm is well understood with regard to Islet cells. There are now artificial pancreases that utilize this functional model being tested. Although there is certainly far more intricacy and variation in the brain than in the massively repeated Islet cells of the pancreas, there is nonetheless massive repetition of functions.

Allen mischaracterizes my proposal to learn about the brain from scanning the brain to understand its fine structure. It is not my proposal to simulate an entire brain “bottom up” without understanding the information processing functions. We do need to understand in detail how individual types of neurons work, and then gather information about how functional modules are connected. The functional methods that are derived from this type of analysis can then guide the development of intelligent systems. Basically, we are looking for biologically inspired methods that can accelerate work in AI, much of which has progressed without significant insight as to how the brain performs similar functions. From my own work in speech recognition, I know that our work was greatly accelerated when we gained insights as to how the brain prepares and transforms auditory information.

The way that these massively redundant structures in the brain differentiate is through learning and experience. The current state of the art in AI does, however, enable systems to also learn from their own experience. The Google self-driving cars (which have driven over 140,000 miles through California cities and towns) learn from their own driving experience as well as from Google cars driven by human drivers. As I mentioned, Watson learned most of its knowledge by reading on its own.

It is true that Watson is not quite at human levels in its ability to understand human language (if it were, we would be at the Turing test level now), yet it was able to defeat the best humans. This is because of the inherent speed and reliability of memory that computers have. So when a computer does reach human levels, which I believe will happen by the end of the 2020s, it will be able to go out on the Web and read billions of pages as well as have experiences in online virtual worlds. Combining human-level pattern recognition with the inherent speed and accuracy of computers will be very powerful. But this is not an alien invasion of intelligence machines—we create these tools to make ourselves smarter. I think Allen will agree with me that this is what is unique about the human species: we build these tools to extend our own reach.

1Chapter 2, The Singularity is Near by Ray Kurzweil, Viking, 2005

2See Endnote 2 in “The Singularity Isn’t Near” by Paul G. Allen and Mark Greaves

3Chapter 9, The Singularity is Near

4Chapter 4, The Singularity is Near