The History of LLMs Series – Recap

Uncovering the origins of contemporary large language models and what lies behind ChatGPT's success

We have recently concluded our series of articles on the history of Large Language Models (LLMs). Throughout the series, we investigated the origins of contemporary LLMs and the factors that contributed to their creation. Today, we present this article as a comprehensive recap, consolidating all editions of the series in one convenient place so you can catch up on this captivating story.

As we will see, over nearly a century, the field of LLMs has undergone significant metamorphoses, transforming from a discipline initially lacking computational resources and a solid theoretical foundation into a vibrant tapestry of diverse methods and approaches. Fast forward to the present: LLMs have ignited a wave of exaggerated fears and even prompted entire countries to design new policies around them.

Let’s take a moment to review how it all happened!

Note from the editor: If you want to support our work, please consider upgrading to the Premium tier or suggesting a sponsorship at [email protected]. We appreciate you!

Episode 1 — from the 1930s to the 1960s. We discover why LLMs, a top-notch technique in natural language processing (NLP), have their roots in the 1930s, a time when computers as we know them today did not yet exist. It was the era of mechanical translation, the pioneering subfield of NLP back then. Researchers were making the first attempts to describe natural language in a way that could be processed by a primitive punched-card machine with almost no memory. As a result, computational linguistics emerged as a distinct discipline, driven by the quest to bridge the gap between human language and machine processing. The big dreams and aspirations of that era ended in 1964, succumbing to the first AI winter, an often overlooked chapter in the annals of AI history.

Episode 2 — from 1960 to 1970. The rebirth of AI after the first winter came from an unexpected direction: cognitive science. This discipline, whose name was coined at the time, blended information theory, psychology, neuroscience, philosophy, linguistics, and the study of artificial intelligence, and it brought fresh ideas to the NLP field:

  • Chomsky’s groundbreaking contributions to linguistic theory

  • The first program to perform automated reasoning

  • Improved computational power and the first higher-level programming languages

During this period, researchers focused on understanding written language and expanded their scope to include spoken language.

Bonus episode — AI winters. In this episode, we gathered the most comprehensive timeline of AI winters you won't find anywhere else. Although AI professionals usually think they know everything about AI winters, we received many comments on how insightful our coverage was.

From our subscribers:

“Very neat stuff! Thanks for sharing this, and all the hard work you’re doing right now.”

“What a great article! I had no idea they have been trying for so long to get here.”

Episode 3 — from the 1970s to 2018. We follow the emergence and evolution of neural networks, the main heroes of the second AI winter. Other game-changing discoveries of the 1990s, from Hidden Markov Models and Statistical Language Models to Bayesian Networks and sophisticated neural architectures like Long Short-Term Memory, added to the foundation of LLMs. We talked with Tomas Mikolov, the creator of Word2Vec, about how LLMs became possible. He told us:

“The current LLMs are a direct descendant of RNNLM (Recurrent Neural Network Language Modeling) which started all of this, but not many people are aware of it.”

The period from the 2000s to 2018 brought us neural language models, word embeddings, and the famous attention mechanism and Transformer architecture. The "pre-training and fine-tuning" learning paradigm emerged thanks to large datasets and available computational power. It offered powerful pre-trained models that could be customized and fine-tuned for various applications.

Episode 4 — from 2018 to the present. This is the time when we finally meet LLMs. After experimenting with pre-trained language models, researchers quickly discovered that scaling them up allows them to generalize to previously unseen tasks. These new, giant models came to be called LLMs, or foundation models. Since 2018, there has been an ongoing trend to push the boundaries further by developing ever larger models. But this practice appears to be exhausting itself: the CEO of OpenAI suggested that this approach “had reached a point of diminishing returns.”

Bonus episode — LLMs leading the 🤗 Open LLM leaderboard. We explored how the leading open-source LLMs (LLaMA, Falcon, and Llama 2) and their chatbot counterparts (Falcon-40B-Instruct, Llama 2-Chat, and FreeWilly 2) were created.

Bonus episode — practical advice from experts on leveraging open-source LLMs. We asked five practitioners from Hugging Face, Lightning AI, Microsoft Semantic Kernel, and the Vicuna team to share their practical insights about implementing open-source LLMs. We summarized their advice on how to effectively use existing models, fine-tune and deploy them, and avoid common mistakes and obstacles.

Episode 5 — from the present to the future. LLMs, along with other generative AI models, have sparked frenzied interest from the public and the media, produced exaggerated fears about AI technology, and attracted huge funding. This made us ask ourselves: are we in the middle of an AI bubble that could burst at any second and plunge us into another AI winter? In this episode, we try to answer that question.

How did we do?

We are looking forward to your feedback - and we mean it :) Leave a comment!


What an intense series!

Stay tuned for our next fascinating series, aimed at ML and AI practitioners! Make sure you are subscribed, and share this historical series with everyone who can benefit from it.
