The 1990s saw the successful application of new machine learning methods and the rebirth of an older one, artificial neural networks. The key protagonist in this remarkable rebirth was Geoffrey Hinton, the great-great-grandson of George Boole, the inventor of Boolean algebra, which Claude Shannon first applied to switching circuits and which others later made the foundation of all modern computers.
Geoffrey Hinton
After studying physiology, physics, and philosophy at Cambridge University, Hinton graduated in 1970 with a degree in experimental psychology, convinced that none of these disciplines had yet done much to explain human thought. During a brief career detour into carpentry, Hinton was inspired by Donald Hebb’s book on learning and neural networks to develop his own speculations about how the brain worked. In 1972, he decided to pursue a PhD at the University of Edinburgh’s School of Artificial Intelligence to test his theories.
Unfortunately, the high level of discord among the School’s senior members had become known to the Science Research Council, its main sponsor. The Council invited Sir James Lighthill to review the field of AI, and his report, published in early 1973, concluded that “In no part of the field have the discoveries made so far produced the major impact that was then promised.”
The result was the “AI Winter” of the 1970s, just as Hinton was starting his academic career. To make matters worse, his advisor suddenly switched from connectionist AI to symbolic AI. As with other AI researchers at the time, the catalyst for the switch was Minsky and Papert’s book (see part 3 in this series). Hinton, however, fervently believed that “If you think it’s a really good idea and other people tell you it’s complete nonsense then you know you are really onto something.” He felt he knew how the brain worked and how machines could mimic it.
When he graduated in 1978, there were few university positions for someone with a PhD in AI, especially one who had written his dissertation on neural networks. Luckily for him, one group of researchers at the University of California, San Diego was working on artificial neural networks, and it offered him a postdoc.
Backpropagation breakthrough
The San Diego “parallel distributed processing” (PDP) group, its new label for connectionism, was based in the psychology department. It was created by academics responding to a 1979 call to arms in the pages of Scientific American by Francis Crick (co-discoverer of the structure of DNA), who urged the larger scientific community to at least make an attempt at understanding how the brain worked.
Hinton worked in San Diego with psychologist and mathematician David Rumelhart on advancing the state of artificial neural networks. With members of “the underground network” of connectionists at other universities, he also worked on advancing another type of neural network, the Boltzmann machine, which could generate data (e.g., images) and compare it to the data it analyzed.
Luck struck again for Hinton, and in 1981, the leading stronghold of symbolic AI, Carnegie Mellon University, decided to hedge its bets and hired him as an assistant professor in its computer science department. He told Allen Newell, the head of the department at the time, that he did not know any computer science, but Newell answered: “That’s okay. We have people here who do.”
The most important breakthrough in this new and improved stage in the evolution of artificial neural networks came in 1986, when David Rumelhart, Geoffrey Hinton, and Ronald Williams published a pair of landmark papers popularizing “backpropagation” and showing its positive impact on the performance of neural networks. The term refers to the phase in which the algorithm propagates the errors in the network’s guesses backward through its neurons, starting with those directly connected to the outputs and moving toward the inputs, adjusting each connection in proportion to its share of the error. This allowed networks with intermediate “hidden” neurons between the input and output layers to learn efficiently, overcoming the limitations noted by Minsky and Papert in 1969.
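In modern notation, the core idea fits in a few lines of Python. Below is a minimal sketch, not the authors’ original formulation: a network with one hidden layer trained by backpropagation and gradient descent on the XOR function, the canonical pattern a single-layer perceptron cannot learn. The layer sizes, learning rate, and iteration count are illustrative assumptions.

```python
import numpy as np

# A network with 2 inputs, 4 hidden neurons, and 1 output, trained on XOR.
# (Layer sizes, learning rate, and iteration count are illustrative choices.)
rng = np.random.default_rng(0)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)  # inputs
y = np.array([[0], [1], [1], [0]], dtype=float)              # XOR targets

W1, b1 = rng.normal(size=(2, 4)), np.zeros((1, 4))  # input -> hidden
W2, b2 = rng.normal(size=(4, 1)), np.zeros((1, 1))  # hidden -> output

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

lr = 1.0
for _ in range(10000):
    # Forward pass: compute the network's guesses.
    h = sigmoid(X @ W1 + b1)
    out = sigmoid(h @ W2 + b2)

    # Backward pass: propagate the output error back toward the inputs.
    err_out = (out - y) * out * (1 - out)     # error at the output layer
    err_hid = (err_out @ W2.T) * h * (1 - h)  # error pushed back to the hidden layer

    # Gradient descent: adjust each connection in proportion to its share of the error.
    W2 -= lr * h.T @ err_out
    b2 -= lr * err_out.sum(axis=0, keepdims=True)
    W1 -= lr * X.T @ err_hid
    b1 -= lr * err_hid.sum(axis=0, keepdims=True)

print(sigmoid(sigmoid(X @ W1 + b1) @ W2 + b2).round(2))  # typically close to 0, 1, 1, 0
```

The hidden layer is exactly the structure a single perceptron lacks; once backpropagation supplies it with error signals, the network can learn XOR and other patterns that are not linearly separable.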
Connectionism was once again the next big thing.
In 1988, R. Colin Johnson and Chappell Brown published Cognizers: Neural Networks and Machines That Think, proclaiming that neural networks “can actually learn to recognize objects and understand speech just like the human brain and, best of all, they won’t need the rules, programming, or high-priced knowledge-engineering services that conventional artificial intelligence systems require… Cognizers could very well revolutionize our society and will inevitably lead to a new understanding of our own cognition.”
Also in 1988, Minsky and Papert decided to publish a new edition of Perceptrons, possibly as a reaction to the newfound enthusiasm for connectionist AI, and to the PDP researchers’ statement that their “pessimism about learning in multilayer machines was misplaced.” While stressing that both “connectionist learning” and “symbolic reasoning” are “partial and manifestly useful views of a reality of which science is still far from a comprehensive understanding,” Minsky and Papert declared that “little of significance has changed since 1969.”
In their view, there was no progress because of “the lack of adequate basic theories… no one had been able to explain why [Perceptrons] were able to recognize certain kinds of patterns and not others.” There was no progress because “the spirit of connectionism” goes “against the grain of analytical rigor.”
The following year (1989), Yann LeCun joined AT&T Bell Labs after completing a postdoc with Hinton. LeCun and other Bell Labs researchers successfully applied a backpropagation algorithm to a multi-layer convolutional neural network to recognize handwritten ZIP codes. Convolutional neural networks automatically learn which properties or features of the data matter for the task at hand, making them far less dependent on human feature engineering than the other image classification algorithms of the time.
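As a rough modern illustration of that idea, here is a minimal sketch of a small convolutional network written in PyTorch (a present-day library, not the tooling available to LeCun’s group). The class name and layer sizes are illustrative assumptions, loosely inspired by early LeNet-style designs rather than the original architecture; the point is that the convolutional filters are learned from the training data, not hand-engineered.

```python
import torch
import torch.nn as nn

# A small convolutional network for 28x28 grayscale digit images.
# The convolutional layers learn their own filters (edge- and stroke-like
# detectors, etc.) from data; nothing here is hand-engineered.
class SmallConvNet(nn.Module):
    def __init__(self, num_classes: int = 10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 6, kernel_size=5),   # 1x28x28 -> 6x24x24
            nn.Tanh(),
            nn.AvgPool2d(2),                  # -> 6x12x12
            nn.Conv2d(6, 16, kernel_size=5),  # -> 16x8x8
            nn.Tanh(),
            nn.AvgPool2d(2),                  # -> 16x4x4
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(16 * 4 * 4, 120),
            nn.Tanh(),
            nn.Linear(120, num_classes),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.classifier(self.features(x))

# Usage: a batch of 8 random 28x28 images produces one score per digit class.
model = SmallConvNet()
scores = model(torch.randn(8, 1, 28, 28))
print(scores.shape)  # torch.Size([8, 10])
```

During training on digit images, the first convolutional layer typically ends up learning edge- and stroke-like detectors on its own, which is the sense in which such networks depend less on human feature selection.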
LeCun also pioneered another type of neural network (“graph transformer networks”) that could recognize printed and handwritten text. In the early 1990s, it was used in a widely deployed system to read the numbers written on checks. Automated check clearing was an important application, as millions of checks were processed daily. The technology was licensed to specialist providers of banking systems such as NCR (National Cash Register), and at one point it was reading more than 10% of all the checks written in the U.S.
Still on the fringes
While these were the first successful business applications of artificial neural networks, the lack of adequate, cost-effective computing resources hindered their widespread adoption and economic impact. Given the hardware limitations of the time, it took about three days to train the Bell Labs network to recognize ZIP codes. In the academic world, artificial neural networks were still shunned by most researchers, partly because of the continuing influence of Minsky & Co. and partly because of the rise of other machine learning methods.
Hinton moved to the University of Toronto in 1987 because he did not want his research funded by the Pentagon. There was not much support for his kind of AI anyway: until about 2007, funding for connectionist AI research came almost exclusively from the Canadian Institute for Advanced Research. When connectionist AI triumphed in the 2010s, the popular press called its three key figures (Hinton, LeCun, and Yoshua Bengio of the University of Montreal) “the Canadian Mafia.”
Connectionist AI was further eclipsed in 1992 with the emergence of a new machine learning method, “support-vector machines” (SVMs), which proved very effective on small datasets. Already exiled from the artificial intelligence community, connectionists once again found themselves on the fringes of the machine learning community.
In the 2000s, “neural” was a bad word, and very few published papers mentioned neural networks. In one of his papers, LeCun changed “convolutional neural networks” to “convolutional networks.” Hinton submitted a paper to the annual Neural Information Processing Systems (NIPS) conference, established in the 1980s for researchers investigating biological and artificial networks. His paper was rejected because, the organizers said, they had already accepted a paper on neural networks and thought it would be unseemly to accept two in the same year.
At the 2006 conference celebrating the 50th anniversary of the Dartmouth Workshop (see part 1 in this series), Terry Sejnowski, a professor at the Salk Institute who had previously collaborated with Hinton, stood up and asked Marvin Minsky, “Are you the devil?”, accusing him of halting the progress of artificial neural networks with his book. Minsky explained the limitations of neural networks and pointed out that they had never done what they were supposed to do. Sejnowski asked again: “Are you the devil?” Exasperated, Minsky finally answered: “Yes, I am.”
When their papers were rejected from the 2007 NIPS conference, a small band of connectionists (Hinton, LeCun, Bengio, and others) organized an offshoot meeting, transporting participants to a different location to discuss an approach that proponents of the then-dominant machine learning and AI methods still considered archaic and alchemistic.
But at that offshoot meeting, Hinton rebranded their efforts as “deep learning,” a brilliant marketing move similar to the one performed by McCarthy (with “artificial intelligence”) half a century before him.
The next installment of this brief overview of the history of the two major AI paradigms will describe the triumph of GPU-supported deep learning, rebranded after its early successes as “AI.”