Inventing Machine Learning
AI and statistical analysis
Arthur Samuel went on TV in 1956 to demonstrate how the IBM 701 mainframe computer played checkers. He was interviewed on a live morning news program, sitting remotely at the 701, while Will Rogers Jr. hosted from the TV studio with a checkers expert who played against the computer for about an hour. Three years later, in 1959, Samuel published “Some Studies in Machine Learning Using the Game of Checkers” in the IBM Journal of Research and Development, coining the term “machine learning.” He defined it as the “programming of a digital computer to behave in a way which, if done by human beings or animals, would be described as involving the process of learning.”
A few months after Samuel’s TV appearance, ten computer scientists convened in Dartmouth, NH, for a workshop on artificial intelligence, defined a year earlier by John McCarthy in the proposal for the workshop as “making a machine behave in ways that would be called intelligent if a human were so behaving.”
In some circles of the emerging discipline of computer science, there was no doubt about the human-like nature of the machines they were creating. Already in 1949, computer pioneer Edmund Berkeley wrote in Giant Brains, or Machines That Think: “Recently there has been a good deal of news about strange giant machines that can handle information with vast speed and skill... These machines are similar to what a brain would be if it were made of hardware and wire instead of flesh and nerves… A machine can handle information; it can calculate, conclude, and choose; it can perform reasonable operations with information. A machine, therefore, can think.”
Maurice Wilkes, a prominent developer of one of those giant brains, retorted in 1953: “Berkeley's definition of what is meant by a thinking machine appears to be so wide as to miss the essential point of interest in the question, ‘Can machines think?’” Wilkes attributed this not-very-rigorous human thinking to “a desire to believe that a machine can be something more than a machine.” In the same issue of the Proceedings of the I.R.E. that included Wilkes’ article, Samuel published “Computing Bit by Bit or Digital Computers Made Easy.” Reacting to what he called “the fuzzy sensationalism of the popular press regarding the ability of existing digital computers to think,” he wrote: “The digital computer can and does relieve man of much of the burdensome detail of numerical calculations and of related logical operations, but perhaps it is more a matter of definition than fact as to whether this constitutes thinking.”
Samuel’s polite but clear position led Marvin Minsky in 1961 to single him out, according to Eric Weiss in a 1992 article about Samuel in the IEEE Annals of the History of Computing, “as one of the few leaders in the field of artificial intelligence who believed computers could not think and probably never would.” Indeed, he pursued his lifelong hobby of developing checkers-playing computer programs and a professional interest in machine learning not out of a desire to play God but because of his career's specific trajectory.
After working for 18 years at Bell Telephone Laboratories and becoming an internationally recognized authority on microwave tubes, he decided to move on at age 45. He was certain, says Weiss in his review of Samuel’s life and work, that “vacuum tubes soon will be replaced by something else.” The University of Illinois called, asking him to revitalize its EE graduate research program.
In 1948, the project to build the University’s first computer ran out of money. Samuel thought (as he recalled in an unpublished autobiography cited by Weiss) that “it ought to be dead easy to program a computer to play checkers” and that if their program could beat a checkers world champion, the resulting attention would generate the required funds.
The next year, Samuel started his 17-year tenure with IBM, working as a “senior engineer” on the team developing the IBM 701, IBM’s first mass-produced scientific computer. The chief architect of the entire IBM 700 series was Nathaniel Rochester, later one of the participants in the Dartmouth AI workshop. Rochester was trying to decide the word length and order structure of the IBM 701, and Samuel decided to rewrite his checkers-playing program using the order structure that Rochester was proposing. In his autobiography, Samuel recalled that “I was a bit fearful that everyone in IBM would consider [a] checker-playing program too trivial a matter, so I decided that I would concentrate on the learning aspects of the program. Thus, more or less by accident, I became one of the first people to do any serious programing for the IBM 701 and certainly one of the very first to work in the general field later to become known as ‘artificial intelligence.’ In fact, I became so intrigued with this general problem of writing a program that would appear to exhibit intelligence that it was to occupy my thoughts almost every free moment during the entire duration of my employment by IBM and indeed for some years beyond.”
In those early days of computing, however, IBM did not want to fan the widespread fears that man was losing out to machines, “so the company did not talk about artificial intelligence publicly,” observed Samuel later. Salesmen were not supposed to scare customers with speculation about future computer accomplishments. So IBM, among other activities aimed at dispelling the notion that computers were smarter than humans, sponsored the movie Desk Set, featuring a “methods engineer” (Spencer Tracy) who installs the fictional and ominous-looking “electronic brain” EMERAC, and a corporate librarian (Katharine Hepburn) telling her anxious colleagues in the research department: “They can’t build a machine to do our job—there are too many cross-references in this place.” By the movie's end, she wins a match with the computer and the engineer’s heart.
In his 1959 paper, Samuel described his approach to machine learning as particularly suited for very specific tasks, in distinction to the “Neural-Net approach,” which he thought could lead to the development of general-purpose learning machines. “The computer plays by looking ahead a few moves and by evaluating the resulting board positions much as a human player might do,” wrote Samuel. The program made the move that optimized the value of a scoring function based on the position of the board at any given time, taking into account elements such as the number of pieces on each side. Samuel designed various learning methods by which the program improved, including having the program play thousands of games against itself.
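The look-ahead procedure Samuel describes is essentially what later became known as depth-limited minimax search with a handcrafted evaluation function. A minimal Python sketch of the idea (the `toy_moves` generator below is a hypothetical stand-in for a real checkers move generator, not Samuel’s program):

```python
# Sketch of Samuel's look-ahead idea: search a few moves deep, score the
# resulting positions with a simple function such as the piece difference,
# and pick the move leading to the best guaranteed score.

def evaluate(position):
    """Toy scoring function: material advantage for the first player."""
    my_pieces, opponent_pieces = position
    return my_pieces - opponent_pieces

def minimax(position, depth, maximizing, moves):
    """Depth-limited minimax. `moves(position)` yields successor positions."""
    successors = list(moves(position))
    if depth == 0 or not successors:
        return evaluate(position)
    if maximizing:
        return max(minimax(s, depth - 1, False, moves) for s in successors)
    return min(minimax(s, depth - 1, True, moves) for s in successors)

def best_move(position, depth, moves):
    """Choose the successor position whose minimax value is highest."""
    return max(moves(position),
               key=lambda s: minimax(s, depth - 1, False, moves))

# Hypothetical move generator: each turn offers a capturing move or a
# quiet move. A real program would generate legal checkers moves here.
def toy_moves(position):
    mine, theirs = position
    if theirs > 0:
        yield (mine, theirs - 1)   # capturing move
    yield (mine, theirs)           # quiet move

print(best_move((12, 12), depth=3, moves=toy_moves))  # → (12, 11)
```

Samuel’s learning methods then adjusted the weights of the terms inside the scoring function based on experience, which is what distinguished his program from plain game-tree search.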
His approach to machine learning “still would work pretty well as a description of what’s known as ‘reinforcement learning,’ one of the basket of machine-learning techniques that has revitalized the field of artificial intelligence in recent years,” wrote Alexis Madrigal in “How Checkers Was Solved,” a 2017 survey of checkers-playing computer programs. “One of the men who wrote the book Reinforcement Learning, Rich Sutton, called Samuel’s research the ‘earliest’ work that’s ‘now viewed as directly relevant’ to the current AI enterprise.”
Reinforcement learning was indeed relevant to “current AI” in 2017. The year before, Google’s DeepMind combined artificial neural networks with reinforcement learning to beat Go master Lee Sedol in a five-game match. This was followed by AlphaZero, which learned to defeat world champions in three games (chess, shogi, and Go) using only the rules of each game and the policies it learned from extensive self-play.
Despite this success, some AI researchers still dismissed reinforcement learning as a viable practical method. In his Turing Award lecture in 2019, while lamenting the maltreatment of researchers of artificial neural networks such as himself, Geoffrey Hinton said: “There are two kinds of learning algorithms—actually three, but the third kind doesn’t work very well. That is called reinforcement learning. There is a wonderful reductio ad absurdum of reinforcement learning. It is called DeepMind.” With improved engineering, including from China’s DeepSeek, reinforcement learning became more practical and easier to deploy. Sutton and Andrew Barto, his co-author of Reinforcement Learning, won the Turing Award in 2025.
This recent debate over reinforcement learning is a minor episode in the tumultuous history of “machine learning” and “artificial intelligence,” in which both terms underwent significant metamorphoses. Machine learning gradually moved away from Samuel’s approach and came to be based mainly on statistical analysis. As the core of the neural network approach, statistical analysis became the major opposition to Symbolic AI, the dominant approach to artificial intelligence. In both cases, however, these new developments were led by computer scientists and programmers, not by statisticians.
For example, with the invention and successful application of “backpropagation” to overcome the limitations of simple neural networks, AI as sophisticated statistical analysis was again on the ascendance in the 1990s. In “Neural Networks and Statistical Models,” Warren Sarle explained in 1994 to his worried and confused fellow statisticians that the ominous-sounding artificial neural networks “are nothing more than nonlinear regression and discriminant models that can be implemented with standard statistical software… like many statistical methods, [artificial neural networks] are capable of processing vast amounts of data and making predictions that are sometimes surprisingly accurate; this does not make them ‘intelligent’ in the usual sense of the word. Artificial neural networks ‘learn’ in much the same way that many statistical algorithms do estimation, but usually much more slowly than statistical algorithms. If artificial neural networks are intelligent, then many statistical methods must also be considered intelligent.”
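Sarle’s equivalence can be seen in miniature: a single linear “neuron” trained by gradient descent on squared error recovers exactly the ordinary least-squares regression fit. A short illustrative sketch (the data and training parameters are my own, not Sarle’s):

```python
# Sarle's point in miniature: a "neural network" consisting of one linear
# unit, trained by gradient descent on squared error, redoes ordinary
# least-squares regression.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = X @ np.array([1.5, -2.0]) + 0.5 + rng.normal(scale=0.1, size=200)

# Classical statistics: solve the least-squares problem directly.
Xb = np.column_stack([X, np.ones(len(X))])        # add intercept column
w_ols, *_ = np.linalg.lstsq(Xb, y, rcond=None)

# "Neural engineering": one linear neuron, squared loss, gradient descent.
w_net = np.zeros(3)
for _ in range(5000):
    grad = 2 * Xb.T @ (Xb @ w_net - y) / len(y)   # gradient of mean squared error
    w_net -= 0.1 * grad

print(np.allclose(w_ols, w_net, atol=1e-4))       # the two "learners" agree
```

Both procedures minimize the same convex squared-error objective, so they converge to the same weights; the network is slower, which is exactly Sarle’s complaint.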
Sarle provided his colleagues with a handy dictionary translating the terms used by “neural engineers” to the language of statisticians (e.g., “features” are “variables”). In anticipation of today’s “data science” (a more recent assault on traditional statistical analysis led by computer programmers) and predictions of algorithms replacing statisticians (and even scientists), Sarle reassured his fellow statisticians that no “black box” can substitute for human intelligence: “Neural engineers want their networks to be black boxes requiring no human intervention—data in, predictions out. The marketing hype claims that neural networks can be used with no experience and automatically learn whatever is required; this, of course, is nonsense. Doing a simple linear regression requires a nontrivial amount of statistical expertise.”
In a footnote to his mention of neural networks in his 1959 paper, Samuel cited Warren S. McCulloch who “has compared the digital computer to the nervous system of a flatworm,” and declared: “To extend this comparison to the situation under discussion would be unfair to the worm since its nervous system is actually quite highly organized as compared to [the most advanced artificial neural networks of the day].” In 2019, Facebook’s top AI researcher and Turing Award-winner Yann LeCun declared that “Our best AI systems have less common sense than a house cat.” In the sixty years since Samuel first published his seminal machine learning work, artificial intelligence has advanced from being not as bright as a flatworm to having less common sense than a house cat.