Fascinating birth of AI, first chatbots and the power of US Department of Defense (History of LLMs #2)
Discover the pivotal role of cognitive sciences in the rebirth of NLP and the journey from machine translation to artificial intelligence
Introduction
In the first episode of our series on the history of LLMs, we covered the meteoric rise and subsequent crash of machine translation (MT), the pioneering subfield of natural language processing (NLP). The rebirth of NLP in the 1960s came from an unexpected direction – the cognitive sciences. The brightest minds gathered at fateful conferences and proposed a fresh set of ideas, including the inception and definition of the term artificial intelligence (AI) and Chomsky's groundbreaking contributions to linguistic theory.
Those gatherings were followed by the opening of new research centers with greater computing power and memory capacity, and by the creation of high-level programming languages. Of course, none of this would have been possible without funding from the Department of Defense, a major investor in NLP research.
Let’s dive in and explore the abundance of AI and computer science research from the late 1950s to around 1970. The second episode of the History of LLMs is here for you →
Fateful Conferences
Dartmouth Summer Conference
Though we mentioned the Dartmouth Summer Conference (1956) in the previous episode, it deserves a closer look: it was the tipping point at which AI was established as a field and the main research directions and disagreements were laid out. Its participants became the main movers behind the development of AI and of everything that allowed people to communicate with computers.
John McCarthy, Marvin Minsky | Claude Shannon and Nathaniel Rochester
And don't forget that the term "artificial intelligence," which we use so actively today, was coined by McCarthy in the proposal for this very conference. Authored by John McCarthy, Marvin Minsky, Nathaniel Rochester, and Claude Shannon, the proposal said: "We propose that a 2 month, 10 man study of artificial intelligence be carried out during the summer of 1956 at Dartmouth College in Hanover, New Hampshire. [...] An attempt will be made to find how to make machines use language, form abstractions and concepts, solve kinds of problems now reserved for humans, and improve themselves."
As McCarthy himself later recalled:
I invented the term "artificial intelligence" when we were trying to get money for a summer study, and I had a previous bad experience. In 1952, when Claude Shannon and I decided to collect a batch of studies that we hoped would contribute to launching this field, Shannon thought that "artificial intelligence" was too flashy a term and might attract unfavorable notice. So we agreed to call it "Automata studies." I was terribly disappointed when the papers we received were about Automata, and very few of them had anything to do with the goal that, at least, I was interested in. So I decided not to fly any false flags anymore, but to say that this is a study aimed at the long-term goal of achieving human-level intelligence.
Some of the participants of the Dartmouth Summer Conference
The conference gathered some of the brightest minds of the time, including Marvin Minsky, John McCarthy, Claude Shannon, Herbert Simon, Allen Newell, and others. As we'll see, these names will appear again and again, heavily influencing the development of natural language processing (NLP).
Symposium on Information Theory
Another important event that we have to mention is the "Symposium on Information Theory," organized by Claude Shannon. This gathering sparked the emergence of cognitive science, in what is often referred to as the Cognitive Revolution. A look at the papers and programs presented at the event gives an impression of what this symposium meant for history:
George Miller's paper "The Magical Number Seven, Plus or Minus Two" established seven as the approximate number of items beyond which human short-term memory typically begins to fail. The computer program Logic Theorist was written by Allen Newell, Herbert A. Simon, and Cliff Shaw of the RAND Corporation. It was the first program deliberately engineered to perform automated reasoning and has been described as "the first artificial intelligence program."
Noam Chomsky's "Three Models for the Description of Language" is worth a separate part of this article. The paper formally showed that some ideas extensively used in earlier language systems – specifically, stochastic finite-state grammars and n-order statistical approximations – were inadequate for describing natural language. Chomsky also proposed his classification of grammars, as the title of the paper suggests.
As Miller himself later recalled:
I went away from the Symposium with a strong conviction, more intuitive than rational, that human experimental psychology, theoretical linguistics and computer simulation of rational cognitive processes were all pieces of a larger whole, and that the future would see progressive elaboration and coordination of their shared concerns.
The Intersection of Defense, Technology, and Linguistics
The SAGE Project: A Catalyst for Computing Innovation and Social Science Advancements
Nothing could ever be done without the Department of Defense. A revolutionary air defense system, the Semi-Automatic Ground Environment (SAGE), was conceived and developed in the post-World War II era, born of the palpable fear of nuclear warfare during the Cold War. As the United States and the Soviet Union amassed formidable nuclear arsenals, there was an urgent need for advanced defense systems to protect against potential air attacks. Initiated in the early 1950s and managed by MIT's Lincoln Laboratory, the SAGE project was the technological answer to this threat. Remarkably, this system was one of the first real-time, large-scale computer systems, marking a significant breakthrough in computing history. Despite its initial military application, SAGE's innovative computing technologies had a far-reaching impact, accelerating the broader field of information technology and setting precedents for future computing systems.
Lincoln Laboratory
The implementation of Project SAGE, which had unprecedented memory and storage requirements, attracted substantial funding for computer hardware. This influx of funding also led to the establishment of "hard" social science departments characterized by a strong emphasis on quantitative methods. This included departments in linguistics, led by Noam Chomsky, as well as psychology. Additionally, the Institute's pure science facilities were strengthened and expanded as a result of this significant financial support.
Chomsky's groundbreaking contributions to linguistic theory
One of the most important contributions to the study of language was the theory of formal languages introduced by Noam Chomsky, which developed as a mathematical study rather than a purely linguistic one and went on to strongly influence computer science.
Claude Shannon and Warren Weaver had proposed using the theory of stochastic processes to model natural languages. From the viewpoint of linguistic theory, the most significant mathematical model was the finite-state machine built on the chains introduced by Andrey Markov, known as the Markov model. Other researchers took up this idea with great enthusiasm, as we described in the previous edition of this series. As time and practice showed, the approach was far from ideal – but there was no formal proof of the insufficiency of these models.
Chomsky's paper presented at the symposium, "Three Models for the Description of Language," provided exactly that proof. In it, Chomsky showed that no finite-state Markov process can serve as a grammar of English. In simple words, he proved his intuition that the grammar of a natural language cannot be comprehensively modeled as a stochastic process. Furthermore, he argued that n-order statistical approximations, previously considered a good tool for describing natural language, must be rejected. In his famous example, the sentence "Colorless green ideas sleep furiously" is just as statistically improbable as "Furiously sleep ideas green colorless," yet any speaker of English recognizes the former as grammatically correct and the latter as not – and Chomsky argued that the same should be expected of any adequate model.
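To make the limitation concrete, here is a minimal modern sketch in Python – an illustration on a made-up toy corpus, not a reconstruction of any period system. A bigram model (a 2nd-order statistical approximation) assigns the same zero probability to Chomsky's grammatical sentence and to its scrambled reversal, so word statistics alone cannot tell them apart.

```python
from collections import defaultdict

# Tiny, invented training corpus for illustration only.
corpus = [
    "green ideas are new",
    "people sleep furiously at night",
    "colorless liquids are common",
]

# Count bigrams (pairs of adjacent words), including sentence boundaries.
counts = defaultdict(lambda: defaultdict(int))
for sentence in corpus:
    words = ["<s>"] + sentence.split() + ["</s>"]
    for prev, word in zip(words, words[1:]):
        counts[prev][word] += 1

def bigram_probability(sentence):
    """Probability of a sentence under the bigram model (0 if any bigram is unseen)."""
    words = ["<s>"] + sentence.split() + ["</s>"]
    prob = 1.0
    for prev, word in zip(words, words[1:]):
        total = sum(counts[prev].values())
        prob *= counts[prev][word] / total if total else 0.0
    return prob

grammatical = "colorless green ideas sleep furiously"
scrambled = "furiously sleep ideas green colorless"
# Both come out with probability 0: the statistics cannot separate
# the grammatical sentence from the word-salad version.
print(bigram_probability(grammatical), bigram_probability(scrambled))
```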
The year after the symposium, another landmark event was the 1957 publication of Noam Chomsky's Syntactic Structures, an influential work that elaborated on his teacher Zellig Harris's model of transformational generative grammar.
According to the authors of the book Psycholinguistics: Language, Mind and World, Chomsky's presentation is recognized as one of the most significant studies of the 20th century. As David Lightfoot writes in the introduction to the second edition of Syntactic Structures, the book was the snowball that began the avalanche of the modern "cognitive revolution." It is only 118 pages long and grew out of lecture notes from Chomsky's course for MIT undergraduates.
The Early Evolution of Programming Languages
While Chomsky was working on human language, the moment also came to create special languages for computers. This early research was pioneered by institutions like the Massachusetts Institute of Technology (MIT), Carnegie Mellon University (CMU), and Stanford.
CMU researchers created influential programming languages, including ALGOL, which Alan Perlis helped design, and the Information Processing Language (IPL) by Allen Newell, Cliff Shaw, and Herbert A. Simon. Their collaboration is a story in itself. Allen Newell joined Prof. Herbert A. Simon's research team as a Ph.D. student in 1955. Just before the Dartmouth Summer Conference, Simon conceived a "thinking machine" – enacting a mental process by breaking it down into its simplest steps – and later that year they developed the aforementioned Logic Theorist. These languages were critical for building the first models for understanding natural language.
Logo of the LISP programming language | Cover of "The Fortran Automatic Coding System," the first book about FORTRAN
At MIT, Marvin Minsky and John McCarthy developed the LISP programming language and founded the MIT Artificial Intelligence Laboratory, contributing to AI's growth. Though Marvin Minsky is mentioned less often these days, he was one of the most influential researchers in the field. He allegedly suggested the terms behind HAL's acronym for Stanley Kubrick's 2001: A Space Odyssey and was an adviser on the film set.
The father of the term "artificial intelligence" also created the language that would sit at the core of AI research for decades – a contribution Paul Graham would later write about at length.
IBM's John Backus led the development of FORTRAN (formula translation), a breakthrough algorithmic language that made it convenient to write subprograms for common mathematical operations and to build libraries of them. The creation of FORTRAN marked a significant stage in the development of programming languages. Previously, programs were written in machine or assembly language, which required the programmer to express instructions in binary or hexadecimal form. FORTRAN enabled the rapid writing of programs that ran nearly as efficiently as programs laboriously hand-coded in machine language.
The swift evolution of NLP research
Alongside the theoretical developments, many prototype systems were built to demonstrate the effectiveness of particular principles. The concept of translation as primitive word substitution gave way to language understanding, and NLP was revived mainly in the form of two big research branches, both related to language understanding: first, of written language; second, of spoken language.
Some of the earliest work in AI used networks or circuits of connected units to simulate intelligent behavior; these approaches were called connectionist. But in the late 1950s, most of them were abandoned as researchers turned to symbolic reasoning, following the success of programs like the Logic Theorist and the General Problem Solver.
Understanding written language
The years following the conferences were marked by extensive research and the emergence of new ideas, particularly in the field of language understanding.
According to The Handbook of Artificial Intelligence, all the models can be divided into four ideological groups:
Early models were severely restricted in terms of input and domain.
The text-based approach was to store a representation of the text itself in the database, using a variety of clever indexing schemes to retrieve material containing specific words or phrases.
The limited logic-based paradigm tried to deal with answers to questions that were not stored explicitly in the database of a given model.
Finally, a knowledge-based approach was created to capture the relationship between individual sentences and the structure of the story as a whole.
Early models
The earliest natural language programs sought to achieve only limited results in specific, constrained domains. These programs, like Green's BASEBALL, Lindsay's SAD-SAM, Bobrow's STUDENT, and Weizenbaum's ELIZA, used ad hoc data structures to store facts about a limited domain.
Input sentences were restricted to simple declarative and interrogative forms and were scanned for predeclared keywords or patterns that indicated known objects and relationships. By ignoring many of the complexities of language, these early systems could sometimes achieve impressive results in answering questions.
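As a rough illustration of this keyword-and-pattern style, here is a minimal Python sketch in the spirit of ELIZA. The rules below are invented for the example; the real ELIZA script (DOCTOR) was far richer and also reflected pronouns, turning "my" into "your" and so on.

```python
import re
import random

# Invented, heavily simplified keyword rules: (pattern, canned response templates).
rules = [
    (re.compile(r"\bI need (.+)", re.IGNORECASE),
     ["Why do you need {0}?", "Would it really help you to get {0}?"]),
    (re.compile(r"\bI am (.+)", re.IGNORECASE),
     ["How long have you been {0}?", "Why do you think you are {0}?"]),
    (re.compile(r"\bmy (mother|father)\b", re.IGNORECASE),
     ["Tell me more about your family."]),
]

def respond(sentence: str) -> str:
    """Scan the input for known keywords/patterns and fill a response template."""
    for pattern, responses in rules:
        match = pattern.search(sentence)
        if match:
            return random.choice(responses).format(*match.groups())
    return "Please tell me more."  # fallback when no keyword matches

print(respond("I am feeling sad about my work"))
# e.g. "Why do you think you are feeling sad about my work?"
```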
Introduction from the BASEBALL paper | Description of ELIZA from the original paper
BASEBALL (1961): The BASEBALL question-answering system, developed by Bert Green and colleagues at MIT's Lincoln Laboratory, was an information retrieval program written in the IPL-V programming language. It focused on American League games from a single year and processed user questions that adhered to specific criteria.
SAD-SAM (1963): Created by Robert Lindsay at the Carnegie Institute of Technology, SAD-SAM utilized the IPL-V list-processing language. It accepted English sentences, built a database, and provided answers based on stored facts using a Basic English vocabulary.
SLIP language and ELIZA (1963): Joseph Weizenbaum developed the SLIP language, which later served as the programming language for ELIZA. ELIZA, developed at MIT in 1966, was a chatbot program simulating conversations between a patient and a psychotherapist.
STUDENT (1968): Developed by Daniel Bobrow at MIT, STUDENT was a pattern-matching natural language program designed to solve high-school-level algebra problems.
Text-based approach
Another early approach to NLP, the text-based approach, was to store a representation of the text itself in the database, using a variety of clever indexing schemes to retrieve material containing specific words or phrases. Though more general than their predecessors, these programs still failed to notice even obvious implications of the sentences in the database.
PROTOSYNTHEX-1 (1966): Designed by Simmons, PROTOSYNTHEX-1 was written in LISP and could build a conceptual dictionary that associated each English word with syntactic information, definitional material, and references to the contexts in which it had been used to define other words. The resulting structure served as a powerful vehicle for research on the logic of question answering.
Semantic Memory by Quillian (1968): Quillian's work on semantic networks, developed during the SYNTHEX project at the System Development Corporation, was one of the earliest in AI.
Limited logic-based approach
To approach the problem of how to characterize and use the meaning of sentences, a third group of programs was developed during the mid-1960s. In these limited-logic systems, the information in the database was stored in a formal notation, and mechanisms were provided for translating input sentences into this internal form. The overall goal was to perform inferences on the database and find answers to questions that were not stored there explicitly.
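Here is a minimal Python sketch of the kind of inference these limited-logic systems aimed at; the facts and relation names below are invented, not taken from SIR or any other system mentioned here. The answer to "Is John a person?" is never stored explicitly, but follows by chaining membership and inclusion facts.

```python
# Invented toy knowledge base: facts stored in a formal notation,
# not as English sentences.
is_a = {"john": "boy"}                            # set membership
subset_of = {"boy": "human", "human": "person"}   # set inclusion

def is_member(individual: str, category: str) -> bool:
    """Answer a question not stored explicitly, by chaining inclusion links."""
    current = is_a.get(individual)
    while current is not None:
        if current == category:
            return True
        current = subset_of.get(current)
    return False

# "Is John a person?" -> True, inferred from: john IS-A boy,
# boy SUBSET-OF human, human SUBSET-OF person.
print(is_member("john", "person"))
```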
Abstract of the original SIR paper | An underlying scheme of the CONVERSE system
SIR (1964): Written by Bertram Raphael as part of his MIT thesis research, SIR (Semantic Information Retrieval) used LISP and introduced a generalized model and a formal logical system called SIR1.
"English for Computer" and "DEACON" (1966): Thompson presented papers on "English for Computer" and "DEACON" at the AFIPS Fall Joint Computing Conference, exploring the relationship between English and programming languages.
Kellogg's CONVERSE (1968): Kellogg presented an early experimental system called CONVERSE, which focused on natural language compilers for online data management.
Quillian's TLC (1969): Quillian developed the Teachable Language Comprehender (TLC), which aimed to comprehend English text.
Knowledge-based approach
Most work in natural language understanding before 1973 involved parsing individual sentences in isolation, yet it was clear that the context provided by the structure of a story facilitates sentence comprehension. Researchers therefore started to incorporate knowledge representation schemes into their programs – representations like logic, procedural semantics, semantic networks, and frames.
Illustration of how SHRDLU works with the “Pick up a big red block” command
SHRDLU (1971): Developed by Terry Winograd at MIT, SHRDLU was a program designed to understand natural language and engage in conversations about the BLOCKS world.
LUNAR (1972): Developed by William Woods at BBN, LUNAR was an experimental information retrieval system that let scientists query a database of lunar rock samples in everyday English.
Minsky's FRAMES (1974): Marvin Minsky proposed the concept of frames as a data structure to represent stereotyped situations, facilitating common-sense thinking in reasoning, language, memory, and perception. It’s one of the methods for knowledge representation.
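To give a feel for what a frame looks like as a data structure, here is a minimal Python sketch with invented slot names; it is not Minsky's own notation, only an illustration of the core idea of slots with default fillers that stand in for unstated, common-sense assumptions.

```python
from dataclasses import dataclass, field

@dataclass
class Frame:
    """A stereotyped situation: named slots with optional default fillers."""
    name: str
    slots: dict = field(default_factory=dict)      # slot -> explicit filler
    defaults: dict = field(default_factory=dict)   # slot -> default filler

    def get(self, slot: str):
        # An explicitly filled slot wins; otherwise fall back to the default.
        return self.slots.get(slot, self.defaults.get(slot))

# Invented "birthday party" frame, in the spirit of frame-based representation.
party = Frame(
    name="birthday_party",
    defaults={"food": "cake", "activity": "games", "gift": "required"},
)
party.slots["location"] = "backyard"

print(party.get("food"))      # "cake"     -- default assumption, never stated
print(party.get("location"))  # "backyard" -- explicitly filled slot
```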
Understanding spoken language
The development here was not as active and remained quite modest. In 1952, Bell Laboratories introduced "Audrey," the Automatic Digit Recognition machine, capable of recognizing spoken digits with an impressive 90% accuracy – but only when they were spoken by its inventor. While it was initially designed to assist toll operators, its high cost, limited ability to recognize different voices, and giant size made it impractical for widespread use.
1952 Bell Labs Audrey. Not shown is the six-foot-high rack of supporting electronics.
Only ten years later, in 1962, IBM showcased Shoebox, a system that could recognize and differentiate between 16 words. Despite the improvement, users still had to speak slowly and pause for the machine to accurately capture their speech.
Things got serious in 1971, when the U.S. Department of Defense's Advanced Research Projects Agency (DARPA), a major sponsor of AI research, funded a five-year program in speech recognition research aimed at a breakthrough in understanding connected speech.
In the early 1970s, the Hidden Markov Modeling (HMM) approach to speech & voice recognition was shared with several DARPA contractors, including IBM. A complex mathematical pattern-matching strategy, HMM played a crucial role and was eventually adopted by all the leading speech & voice recognition companies, including Dragon Systems, IBM, Philips, AT&T, and others.
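For a concrete sense of the idea, here is a minimal Python sketch of the forward algorithm on a toy two-state HMM with invented states and probabilities. Real speech systems worked with far larger models over acoustic features, but the underlying computation – scoring how likely an observed sequence is under a hidden-state model – is the same.

```python
# Toy HMM with invented numbers: two hidden states, two observable symbols.
states = ["vowel", "consonant"]
start_prob = {"vowel": 0.6, "consonant": 0.4}
trans_prob = {
    "vowel":     {"vowel": 0.3, "consonant": 0.7},
    "consonant": {"vowel": 0.6, "consonant": 0.4},
}
emit_prob = {
    "vowel":     {"a": 0.8, "t": 0.2},
    "consonant": {"a": 0.1, "t": 0.9},
}

def sequence_likelihood(observations):
    """Forward algorithm: total probability of the observed sequence,
    summing over every possible hidden-state path."""
    forward = {s: start_prob[s] * emit_prob[s][observations[0]] for s in states}
    for obs in observations[1:]:
        forward = {
            s: sum(forward[prev] * trans_prob[prev][s] for prev in states)
               * emit_prob[s][obs]
            for s in states
        }
    return sum(forward.values())

# Recognition compares such likelihoods across candidate word models;
# here we just score a single observation sequence.
print(sequence_likelihood(["a", "t", "a"]))
```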
Conclusion
The failure of MT in the late 1960s was disappointing but didn't stop the development of NLP, now fueled by the cognitive sciences and by funding from the Department of Defense. This period witnessed the establishment of AI as a field, the development of programming languages like LISP and FORTRAN, and the emergence of different approaches to language understanding. However, the unfulfilled expectations set during this time eventually led to budget cuts and halted research – the periods that became known as AI winters, a phenomenon that changed the narrative around AI for decades. That's why we are working on a bonus episode dedicated to AI winters and what was happening to "language and machines" during those cold times. Stay tuned!
History of LLMs by Turing Post: to be continued…
If you liked this issue, subscribe to receive the third episode of the History of LLMs straight to your inbox. Oh, and please share this article with your friends and colleagues. Because... to fundamentally push the research frontier forward, one needs to thoroughly understand what has been attempted in history and why current models exist in their present forms.