Turing Post
Topic 16: What is Whiteboard-of-Thought?

Explore how MLLMs can visually "think" step-by-step

Alyona Vert.
October 23, 2024

Step-by-step thinking is one of the most effective ways to improve the accuracy of Large Language Models (LLMs), and it also makes their reasoning process more transparent. In our previous articles, we discussed Chain-of-Thought and Chain(s)-of-Knowledge, techniques that focus on reasoning in text. However, words and symbols aren't enough for every kind of task. Some problems require visual thinking, much like the way humans picture things in their minds while reasoning. To bring this kind of visual thinking to models, researchers proposed the Whiteboard-of-Thought prompting technique. It lets models create simple drawings, so they can handle problems that require visual thinking better than they can with words alone. Today, we will dive into this fascinating idea and explore its capabilities.

In today’s episode, we will cover:

Limitations of LLMs in visual reasoning
Here comes Whiteboard-of-Thought (WoT)
How does WoT work?
Is WoT prompting really good?
Advantages of WoT
Not without limitations
Conclusion
Bonus: Resources

Limitations of LLMs in visual reasoning

LLMs are very good at logic-heavy tasks, such as math or symbolic reasoning, especially when they write out their intermediate steps in text, a technique known as Chain-of-Thought. However, they struggle with problems that require visual thinking, even though many of them have been trained on diverse types of data, including images.

Even top models like GPT-4 can make mistakes on such tasks because they don't reason about visual details. For example, when asked for “a lowercase letter that looks like a circle with a line coming down on the right side,” they might answer “b” instead of “q.”

Let’s look at how we humans solve such visual tasks. We naturally imagine pictures in our minds or draw things out to help us understand the problem. So why not apply the same way of thinking to AI models?

Here comes Whiteboard-of-Thought (WoT)

“Our key idea is that visual reasoning tasks demand visuals.”
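To make the letter puzzle above concrete, here is a toy, stdlib-only sketch of the "whiteboard" step. It is our own illustration under simplifying assumptions, not the WoT implementation: there is no model in the loop, and instead of a real drawing library we rasterize the described shape onto a small text grid. The function name `draw_on_whiteboard`, the grid size, and the circle's center and radius are all hypothetical choices.

```python
import math


def draw_on_whiteboard(width=11, height=11):
    """Hypothetical 'whiteboard': draw a circle with a vertical line
    on its right side that extends downward, as rows of ASCII text."""
    grid = [[" "] * width for _ in range(height)]
    cx, cy, r = 4, 3, 3  # assumed circle center (column, row) and radius

    # Draw the circle by sampling points along its circumference.
    for degree in range(360):
        angle = math.radians(degree)
        x = round(cx + r * math.cos(angle))
        y = round(cy + r * math.sin(angle))
        grid[y][x] = "#"

    # Draw the vertical line on the circle's right edge, running down
    # from the circle's middle to the bottom of the grid.
    for y in range(cy, height):
        grid[y][cx + r] = "#"

    # Return the rows as strings, trimming trailing blanks.
    return ["".join(row).rstrip() for row in grid]


if __name__ == "__main__":
    for row in draw_on_whiteboard():
        print(row)
```

Once drawn, the answer is easy to read off: the printed grid shows a loop with a stroke descending on its right, which looks like “q”, whereas “b” has its stroke rising on the left. Describing the same shape in words alone leaves that spatial detail implicit.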

The rest of this article, with a detailed explanation and a library of relevant resources, is available to our Premium users only.

Thank you for reading 🩶
