FOD#19: The Convergence of Reasoning and Action in AI
...to elevate AI from a simple data processor to a sophisticated decision-making tool
Some of the linked articles might be behind a paywall. If you are a paid subscriber, let us know and we will send you a PDF.
The Convergence of Reasoning and Action in AI
As AI models evolve, the pressure is on to move beyond merely interpreting data toward reasoning and actionable decision-making. While large language models (LLMs) exhibit remarkable prowess in tasks like arithmetic and symbolic reasoning, they often falter when translating these skills into direct, environment-specific actions. Against this backdrop, recent advancements like Google Research's ReAct and startups like Imbue offer compelling narratives. Could there be a synergy waiting to be exploited?
The Problem of Reasoning in AI
LLMs, renowned for their aptitude in 'chain-of-thought' prompting and problem decomposition, still often make logical and arithmetic errors. Program-Aided Language models (PAL) offer a partial solution by generating programs for reasoning tasks and offloading their execution to programmatic runtimes. These models excel at arithmetic and procedural reasoning but struggle to turn reasoning into actionable steps, especially in real-world environments.
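To make the PAL idea concrete, here is a minimal sketch under stated assumptions: the `fake_llm_generate` function below is a hypothetical stand-in for a real model call, but the core mechanic is faithful to the approach – the LLM writes a small program, and a programmatic runtime (here, plain Python) executes it, so the runtime rather than the model does the arithmetic.

```python
# Minimal sketch of the Program-Aided Language model (PAL) idea:
# the LLM emits a program for the reasoning steps, and a runtime
# executes it to produce the final answer.

def fake_llm_generate(problem: str) -> str:
    """Hypothetical stand-in for an LLM call: returns Python code
    solving the word problem. A real PAL setup would prompt a model
    with few-shot examples of problem -> program pairs."""
    return (
        "initial_balls = 5\n"
        "bought = 2 * 3\n"
        "answer = initial_balls + bought\n"
    )

def pal_solve(problem: str) -> int:
    code = fake_llm_generate(problem)
    scope = {}
    exec(code, scope)        # offload execution to the Python runtime
    return scope["answer"]   # the program, not the LLM, does the arithmetic

print(pal_solve("Roger has 5 tennis balls. He buys 2 cans of 3 balls each."))
# 11
```

The key design choice is that any arithmetic mistake can now only come from the generated program's logic, not from the model trying to compute numbers token by token.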
Google's ReAct: Bridging the Gap
Enter Google Research's ReAct – an approach that marries reasoning and acting. It aims to address the shortcomings of existing LLMs by allowing them to generate both verbal reasoning traces and text actions. This integration can foster a more dynamic and effective decision-making mechanism. By combining the two, ReAct not only promotes logical consistency but also allows for feedback loops that enrich the internal state of the AI, facilitating future decision-making tasks.
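The loop itself can be sketched in a few lines. Below is a hypothetical, heavily simplified illustration – the scripted `fake_llm` and the toy `lookup` tool are our stand-ins, not Google's implementation – showing the core ReAct pattern: the model alternates Thought and Action steps, and each action's result is fed back as an Observation.

```python
# Hypothetical sketch of a ReAct-style loop: the model interleaves
# free-text "Thought" steps with "Action" steps, and each action's
# result is appended as an "Observation" that informs the next step.

def fake_llm(transcript: str) -> str:
    """Stand-in for the LLM; scripted replies for illustration only."""
    if "Observation:" not in transcript:
        return "Thought: I need the capital of France.\nAction: lookup[capital of France]"
    return "Thought: I have the answer.\nAction: finish[Paris]"

def run_tool(query: str) -> str:
    """Toy environment: a single lookup tool backed by a dict."""
    kb = {"capital of France": "Paris"}
    return kb.get(query, "unknown")

def react(question: str, max_steps: int = 5) -> str:
    transcript = f"Question: {question}\n"
    for _ in range(max_steps):
        step = fake_llm(transcript)
        transcript += step + "\n"
        action = step.split("Action: ")[-1]
        if action.startswith("finish["):
            return action[len("finish["):-1]   # final answer
        query = action[len("lookup["):-1]
        # The observation enriches the transcript (the agent's state).
        transcript += f"Observation: {run_tool(query)}\n"
    return "no answer"

print(react("What is the capital of France?"))  # Paris
```

The feedback loop is the whole point: because the observation is appended to the transcript, the next "thought" is conditioned on what the action actually returned, not on what the model guessed it would return.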
The Billion-Dollar Company Centering on Reasoning
Reasoning is still very tough, which might be why other GenAI companies do not focus on it. Imbue, which just raised a $200 million Series B round, wants to be a unique player in this unfolding drama. With a valuation of $1 billion, the company is dedicated to fostering reasoning-centric AI agents. For companies like Imbue, this is more than just an opportunity; it's a clear call to align their specialized efforts with broader advancements like ReAct. This could usher in a new age of AI – one that not only reasons but acts intelligently in diverse, real-world contexts.
Imbue’s Core Philosophy: to place reasoning at the forefront, optimizing AI agents for decision-making, adaptability, and information gathering.
Explainability: As a business differentiator, Imbue emphasizes transparency and accountability, allowing their AI agents to elucidate their reasoning.
Applications: Targeting enterprise applications like coding, Imbue provides a practical testing ground for models that integrate reasoning and action.
Flexibility: With a business model that could adapt to both consumer and third-party applications, Imbue signals a future where AI is democratized and personalized.
This all might be just PR talk, but as we know from the OpenAI story, brilliantly told in Wired, to achieve what you want, it’s utterly important to know what you want.
Conclusion
The true power of AI will be unlocked when reasoning and acting capabilities are seamlessly integrated. Emerging technologies like Google's ReAct offer a blueprint for this, and companies like Imbue provide the market focus to make it a reality. This synergy has the potential to elevate AI from a simple data processor to a sophisticated decision-making tool.
Open Source is On Fire
Falcon is soaring even higher. The Technology Innovation Institute (TII) in UAE recently unveiled Falcon 180B, setting a new benchmark in the realm of foundation models. With 180 billion parameters, Falcon 180B was trained on a staggering 3.5 trillion tokens using 4096 GPUs over 7M GPU hours. This behemoth is 2.5 times larger than Llama2 and employed 4 times the computational resources. It excels in tasks like reasoning and coding, topping the open LLM leaderboards. This new release represents a significant escalation from its prior versions, which had 1B, 7B, and 40B parameters. Falcon 180B not only outpaces GPT-3.5 in multiple benchmarks but also reinforces the surging open-source trend in foundation models. The open-source community, propelled initially by Stable Diffusion and later by Llama and Falcon, is narrowing the performance gap with commercial models like GPT-4, making future outperformance by open-source alternatives plausible.
Another open-source model was released last week: Persimmon-8B, a permissively licensed language model with fewer than 10 billion parameters, released under an Apache license for maximum flexibility.
Connected in Time
TIME just unveiled its TIME100 Most Influential People in AI. But the most fascinating thing about it is this:
Conclusion: stay connected, preferably with Andrew Ng.
News from The Usual Suspects
Nvidia
Nvidia offers early access to its TensorRT-LLM. If you are part of their Developer Program, you should try it out; even just reading the article is worthwhile. You'll learn about their use of inference performance optimization techniques such as tensor parallelism, in-flight batching, the new FP8 quantization format, and Hopper Transformer Engine support.
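To give a feel for why in-flight (continuous) batching helps, here is a toy scheduler sketch – pure Python, not the TensorRT-LLM API, and the function names are ours. Requests of different lengths share a fixed number of batch slots, and a finished request's slot is refilled immediately instead of waiting for the whole batch to drain.

```python
# Toy sketch of in-flight (continuous) batching. Not the TensorRT-LLM
# API: just an illustration of the scheduling idea, where a finished
# request is replaced at the next decoding step.

from collections import deque

def inflight_batching(request_lengths, slots=2):
    """Simulate decoding; returns (total steps, completion order)."""
    queue = deque(enumerate(request_lengths))   # (id, tokens remaining)
    active, steps, finished = [], 0, []
    while queue or active:
        # Refill any free slots from the queue at every step.
        while queue and len(active) < slots:
            active.append(list(queue.popleft()))
        steps += 1                      # one decoding step for the batch
        for req in active:
            req[1] -= 1                 # each active request emits a token
        done = [r for r in active if r[1] == 0]
        finished += [r[0] for r in done]  # freed slots reusable next step
        active = [r for r in active if r[1] > 0]
    return steps, finished

steps, order = inflight_batching([3, 1, 2], slots=2)
print(steps, order)  # 3 [1, 0, 2]
```

With static batching, the same three requests in two slots would take 5 steps (the short request's slot sits idle until its batch-mate finishes); the in-flight version completes in 3.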
Analytics India Magazine caught up with Jensen Huang when he was in India. Huang said that by the end of next year, India will have AI supercomputers that are an order of magnitude faster (i.e., 50 to 100 times faster): “We are going to bring out the fastest computers in the world. These computers are not even in production [so far]. India will be one of the first countries in the world [to get them].”
Anthropic
They just introduced Claude Pro for $20/month. I personally use the paid version of ChatGPT, and though I like Claude, the free version seems enough. What about you?
What is your favorite? Please send a comment clarifying how and what you use!
Midjourney
Midjourney is challenged by Ideogram, a new startup founded by ex-Google Brain researchers and backed by $16.5 million in seed funding. It aims to disrupt the AI image generation market with its focus on reliable text generation within images. So far, the results have been terrifying, and… no text appeared on the picture:
IBM
IBM is about to launch its Granite series models, leveraging the "Decoder" architecture foundational to LLMs. Targeted at enterprise NLP tasks such as summarization, content generation, and insight extraction, IBM aims to set a new standard in transparency by disclosing both the data sources and the data processing methodologies employed for the Granite series. The series is expected to be available in Q3 2023.
“Manhattan Project” Concerns
Interconnects writes about the challenges and paradoxes facing AI research, particularly critiquing the analogy of AI development to the Manhattan Project. Unlike the atomic bomb, AI's goals are undefined and its risks stem from uncertainty and emergent behavior, complicating safety metrics. The article argues that surveillance and regulation for AI differ fundamentally from nuclear materials, emphasizing the role of trust and communication. It also explores the weakened influence of scientists in political discourse and the transformation of research dissemination via social media algorithms and corporate monopolies. In summary, the article highlights the unique complexities in AI research governance, contrasting it with historical examples to debunk oversimplifications.
TWITTER LIBRARY
For your kids
Other news, categorized for your convenience:
Reinforcement Learning and Human Feedback
Reinforcement learning from human feedback (RLHF) vs. RL from AI feedback (RLAIF). This paper compares RLHF and RLAIF techniques for aligning language models to human preferences. Results indicate that both methods yield similar improvements in human evaluations, specifically for summarization tasks →read more
Memory Efficiency in RLHF with Proximal Policy Optimization (PPO). This study dives into the computational overhead of using PPO in RLHF, introducing Hydra-RLHF as a memory-efficient solution that maintains performance. Hydra-RLHF dynamically adjusts LoRA settings during training to save memory →read more
Multi-modal Language Models
CM3Leon: A Retrieval-augmented, Token-based Multi-modal Language Model. The paper introduces CM3Leon, a model capable of both text-to-image and image-to-text generation. It utilizes a recipe adapted from text-only language models and demonstrates high performance and controllability in various tasks →read more
Efficient Models for Computer Vision
Sparse Mixture-of-Experts Models (MoEs) in Vision Transformers (ViTs). This work explores the application of sparse MoEs in Vision Transformers to make them more suitable for resource-constrained environments. It proposes a mobile-friendly design and shows performance gains compared to dense ViTs →read more
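The routing idea behind sparse MoEs can be shown in a few lines. The sketch below is a hypothetical toy (the names `route_top1` and `moe_layer` and the two toy experts are ours, not the paper's design): a router scores each token, and only the single highest-scoring expert runs per token, so compute grows with the number of tokens rather than the number of experts.

```python
# Toy sketch of sparse Mixture-of-Experts (MoE) top-1 routing.
# Illustrative only: real ViT-MoE layers use learned routers and
# expert MLPs over high-dimensional token embeddings.

def route_top1(tokens, router_weights):
    """Pick one expert per token via the highest router logit."""
    assignments = []
    for tok in tokens:
        logits = [sum(t * w for t, w in zip(tok, wrow))
                  for wrow in router_weights]
        assignments.append(max(range(len(logits)), key=logits.__getitem__))
    return assignments

def moe_layer(tokens, router_weights, experts):
    """Apply only the selected expert to each token (sparse dispatch)."""
    assignments = route_top1(tokens, router_weights)
    return [experts[e](tok) for tok, e in zip(tokens, assignments)]

# Two toy "experts": one doubles the token, one negates it.
experts = [lambda t: [2 * x for x in t], lambda t: [-x for x in t]]
# Expert 0 keys on dimension 0, expert 1 on dimension 1.
router_weights = [[1.0, 0.0], [0.0, 1.0]]

out = moe_layer([[3.0, 0.1], [0.2, 5.0]], router_weights, experts)
print(out)  # [[6.0, 0.2], [-0.2, -5.0]]
```

Because each token touches only one expert's parameters, adding more experts increases model capacity without increasing per-token compute – the property that makes MoEs attractive for resource-constrained vision deployments.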
AI Safety and Verification
Provably Safe AI: Using Mathematical Proof for AI Safety. The paper by Max Tegmark argues for the use of mathematical proof as a mechanism for ensuring AI safety. It calls for hardware, software, and social systems to carry proofs of formal safety specifications and discusses automated theorem proving's role →read more
In other newsletters
ChinaAI discusses China's implementation gap between large language models (LLMs) developed in labs and their adoption by businesses. Certainly, the best newsletter about China.
Deep dive into the 21 benchmarks that are used to evaluate LLMs by Why Try AI.
Understanding and Using Supervised Fine-Tuning (SFT) for Language Models by Deep (Learning) Focus
LLM Training: RLHF and Its Alternatives by Sebastian Raschka
We are reading
Interview with Ivan Zhang, Cohere’s CTO
Interview with Walter Isaacson, Elon Musk’s latest biographer and Isaacson’s article about Musk in Time.
The profile of OpenAI by Wired’s Steven Levy is exhaustingly long but certainly worth reading, demonstrating their unshakable faith in AGI (and/or Superintelligence). Also, Reid Hoffman seems to be the one who made it happen for them financially.
Sharing is caring – please forward Turing Post to your friends and colleagues, or use your personalized link to earn points in the referral program (to be announced soon). 🤍
Another week with fascinating innovations! We call this overview “Froth on the Daydream” – or simply, FOD. It’s a reference to the surrealistic and experimental novel by Boris Vian – after all, AI is experimental and feels quite surrealistic, and a lot of writing on this topic is just a froth on the daydream.
How was today's FOD? Please give us some constructive feedback!