Token 1.5: From Chain-of-Thoughts to Skeleton-of-Thoughts, and everything in between
How to distinguish all the CoT-inspired concepts and use them for your projects
Introduction
The groundbreaking paper by Google Brain at NeurIPS 2022 introduced the world to Chain-of-Thought Prompting (CoT). It changed a lot in prompting, and it didn’t stop there: the paper kicked off a whole new area of study, giving birth to various "chain" spin-offs and related research.
Just to give you an impression of the impact: a search for the keyword “chain-of-thought” pulls up 461 papers on Semantic Scholar and 374 on arXiv. That’s a lot of papers! But we're not here to give you an exhaustive list you can find on any research platform. We aim to explore the papers with new ideas that have sprung from the original CoT research, map its influence, and decode the novelty and foundational principles of its successors.
In chronological order, we unfold the Chain-of-thought Lineage, explaining the following terms:
Chain-of-thought prompting (recapping the fundamentals)
Self-consistency
Zero-Shot Chain-of-Thought (Zero-shot-CoT)
Automatic-Chain-of-Thought (Auto-CoT)
Program-of-Thoughts Prompting (PoT)
Multimodal Chain-of-Thought Reasoning (Multimodal-CoT)
Tree-of-Thoughts (ToT)
Graph-of-Thoughts (GoT)
Algorithm-of-Thoughts (AoT)
Skeleton-of-Thought (SoT)
No more confusion around CoT. Please use this up-to-date "dictionary" as a reliable reference point for understanding the complexities of this evolving field.
Recapping the fundamentals
Before diving into the nuanced world of chain-of-thought prompting – a specialized form of basic prompting – it's essential to revisit the foundational terminology in the prompting ecosystem.
Zero-shot prompting
The term "zero-shot prompting" derives from the concept of zero-shot learning*.
*Zero-shot learning is a model's ability to complete a task without having received or used any training examples.
When we apply this intuition to prompting, it means that our prompt contains neither additional information for the model nor any examples. In other words, the model can only use the knowledge it acquired during training to produce the output.
Task: Named Entity Recognition
Prompt: "Identify the name of the person in the sentence: 'Steve Jobs founded Apple.'"
Response from Model: Steve Jobs
In this brief example, the language model identifies the name "Steve Jobs" based on the prompt, without requiring any previous examples for named entity recognition. This effectively demonstrates the power of zero-shot prompting in action.
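To make this concrete, here is a minimal sketch of the same task in code, assuming the openai Python package (v1-style client) and an illustrative model name; any chat-completion API would work the same way.

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# Zero-shot: the prompt states the task but contains no worked examples.
prompt = "Identify the name of the person in the sentence: 'Steve Jobs founded Apple.'"

response = client.chat.completions.create(
    model="gpt-4o-mini",  # illustrative model name
    messages=[{"role": "user", "content": prompt}],
)
print(response.choices[0].message.content)  # expected: "Steve Jobs"
```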
Zero-shot prompting has contributed to the widespread adoption of LLMs, but sometimes it’s just not enough for the desired outcome. Adding a few examples for the model can help improve its output, and this is what we call few-shot prompting.
Few-shot prompting
Similar to its zero-shot counterpart, few-shot prompting also finds its roots in a similarly named learning approach: few-shot learning*.
*Few-shot learning is the process in which a pre-trained model is given only a few examples to learn a new, previously unseen category of data.
Few-shot prompting can be used as a technique to enable in-context learning*, where we provide demonstrations in the prompt to steer the model toward better performance. The demonstrations serve as conditioning for subsequent examples where we would like the model to generate a response.
*In-context learning (ICL) is a specific method of prompt engineering where demonstrations of the task are provided to the model as part of the prompt (in natural language).
Task: Sentiment Analysis
Prompt:
"The sun is shining." - Positive
"I lost my keys." - Negative
"How is the statement 'The movie was a hit' classified?"
Response from Model: Positive
In this example, the first two statements serve as demonstrations to guide the model's behavior. The model then classifies the third statement as "Positive," using the preceding examples for contextual understanding.
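Sketched in the same assumed setup as before (openai v1 client, illustrative model name), the sentiment example becomes a prompt whose demonstrations precede the query:

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# Few-shot: two labeled demonstrations steer the model toward the
# desired label format via in-context learning.
prompt = (
    '"The sun is shining." - Positive\n'
    '"I lost my keys." - Negative\n'
    '"The movie was a hit." -'
)

response = client.chat.completions.create(
    model="gpt-4o-mini",  # illustrative model name
    messages=[{"role": "user", "content": prompt}],
)
print(response.choices[0].message.content)  # expected: "Positive"
```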
Having established these foundational techniques, we now arrive at the intriguing world of chain-of-thought prompting. Let's talk about it.
Chain-of-thought prompting
For tasks demanding intricate reasoning and sequential thinking, merely providing a few examples proves insufficient. To address this, researchers proposed a new technique called chain-of-thought prompting.
This technique modifies standard few-shot prompting: each demonstration pairs a problem with its solution and spells out the intermediate reasoning steps that lead to that solution. Consider this example from the original paper:
Image Credit: CoT Original Paper
The authors showed how complex reasoning abilities emerge naturally in sufficiently large LMs via chain-of-thought prompting. Providing a series of intermediate reasoning steps for a given task significantly improves the ability of LLMs to perform complex reasoning.
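In the same assumed setup as the earlier sketches, a chain-of-thought prompt embeds a demonstration whose answer spells out each reasoning step, here using the arithmetic example from the original paper:

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# The demonstration's answer walks through intermediate reasoning steps,
# prompting the model to reason the same way on the new question.
prompt = (
    "Q: Roger has 5 tennis balls. He buys 2 more cans of tennis balls. "
    "Each can has 3 tennis balls. How many tennis balls does he have now?\n"
    "A: Roger started with 5 balls. 2 cans of 3 tennis balls each is "
    "6 tennis balls. 5 + 6 = 11. The answer is 11.\n"
    "Q: The cafeteria had 23 apples. If they used 20 to make lunch and "
    "bought 6 more, how many apples do they have?\n"
    "A:"
)

response = client.chat.completions.create(
    model="gpt-4o-mini",  # illustrative model name
    messages=[{"role": "user", "content": prompt}],
)
print(response.choices[0].message.content)
# expected: step-by-step reasoning ending in "The answer is 9."
```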
Crucially, chain-of-thought prompting is an emergent ability tied to model scale, as identified by the authors in the paper “Emergent Abilities of Large Language Models.” Chain-of-thought prompting does not positively impact performance for small models and only yields performance gains when used with models of ∼100B parameters.
Chain-of-thought Lineage
While various prompting techniques exist, this review focuses on those intimately connected with chain-of-thought prompting. To ensure coherence, we'll proceed in chronological order to trace the evolution of ideas.