FOD#19: The Convergence of Reasoning and Action in AI
...to elevate AI from a simple data processor to a sophisticated decision-making tool
Some of the linked articles might be behind a paywall. If you are a paid subscriber, let us know and we will send you a PDF.
The Convergence of Reasoning and Action in AI
As AI models evolve, the pressure is on to move beyond merely interpreting data toward reasoning and actionable decision-making. While large language models (LLMs) exhibit remarkable prowess in tasks like arithmetic and symbolic reasoning, they often falter when translating these skills into direct, environment-specific actions. Against this backdrop, recent advancements like Google Research's ReAct and startups like Imbue offer compelling narratives. Could there be a synergy waiting to be exploited?
The Problem of Reasoning in AI
LLMs, renowned for their aptitude in 'chain-of-thought' prompting and problem decomposition, still often make logical and arithmetic errors. Program-Aided Language models (PAL) offer a partial solution by generating programs for reasoning tasks and offloading their execution to programmatic runtimes. These models excel at arithmetic and procedural reasoning but struggle to turn reasoning into actionable steps, especially in real-world environments.
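To make the PAL idea concrete, here is a minimal sketch under stated assumptions: the `fake_llm_generate` function below is a hypothetical stand-in for a real model call, but the core mechanic is faithful to the approach – the LLM writes a small program, and a programmatic runtime (here, plain Python) executes it, so the runtime rather than the model does the arithmetic.

```python
# Minimal sketch of the Program-Aided Language model (PAL) idea:
# the LLM emits a program for the reasoning steps, and a runtime
# executes it to produce the final answer.

def fake_llm_generate(problem: str) -> str:
    """Hypothetical stand-in for an LLM call: returns Python code
    solving the word problem. A real PAL setup would prompt a model
    with few-shot examples of problem -> program pairs."""
    return (
        "initial_balls = 5\n"
        "bought = 2 * 3\n"
        "answer = initial_balls + bought\n"
    )

def pal_solve(problem: str) -> int:
    code = fake_llm_generate(problem)
    scope = {}
    exec(code, scope)        # offload execution to the Python runtime
    return scope["answer"]   # the program, not the LLM, does the arithmetic

print(pal_solve("Roger has 5 tennis balls. He buys 2 cans of 3 balls each."))
# 11
```

The key design choice is that any arithmetic mistake can now only come from the generated program's logic, not from the model trying to compute numbers token by token.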
Google's ReAct: Bridging the Gap
Enter Google Research's ReAct – an approach that marries reasoning and acting. It aims to address the shortcomings of existing LLMs by allowing them to generate both verbal reasoning traces and text actions. This integration can foster a more dynamic and effective decision-making mechanism. By combining the two, ReAct not only promotes logical consistency but also allows for feedback loops that enrich the internal state of the AI, facilitating future decision-making tasks.
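The loop itself can be sketched in a few lines. Below is a hypothetical, heavily simplified illustration – the scripted `fake_llm` and the toy `lookup` tool are our stand-ins, not Google's implementation – showing the core ReAct pattern: the model alternates Thought and Action steps, and each action's result is fed back as an Observation.

```python
# Hypothetical sketch of a ReAct-style loop: the model interleaves
# free-text "Thought" steps with "Action" steps, and each action's
# result is appended as an "Observation" that informs the next step.

def fake_llm(transcript: str) -> str:
    """Stand-in for the LLM; scripted replies for illustration only."""
    if "Observation:" not in transcript:
        return "Thought: I need the capital of France.\nAction: lookup[capital of France]"
    return "Thought: I have the answer.\nAction: finish[Paris]"

def run_tool(query: str) -> str:
    """Toy environment: a single lookup tool backed by a dict."""
    kb = {"capital of France": "Paris"}
    return kb.get(query, "unknown")

def react(question: str, max_steps: int = 5) -> str:
    transcript = f"Question: {question}\n"
    for _ in range(max_steps):
        step = fake_llm(transcript)
        transcript += step + "\n"
        action = step.split("Action: ")[-1]
        if action.startswith("finish["):
            return action[len("finish["):-1]   # final answer
        query = action[len("lookup["):-1]
        # The observation enriches the transcript (the agent's state).
        transcript += f"Observation: {run_tool(query)}\n"
    return "no answer"

print(react("What is the capital of France?"))  # Paris
```

The feedback loop is the whole point: because the observation is appended to the transcript, the next "thought" is conditioned on what the action actually returned, not on what the model guessed it would return.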
The Billion-Dollar Company Centering on Reasoning
Reasoning is still very tough, which might be why other GenAI companies do not focus on it. Imbue, which just raised a $200 million Series B round, wants to be a unique player in this unfolding drama. With a valuation of $1 billion, the company is dedicated to fostering reasoning-centric AI agents. For companies like Imbue, this is more than just an opportunity; it's a clear call to align their specialized efforts with broader advancements like ReAct. This could usher in a new age of AI – one that not only reasons but acts intelligently in diverse, real-world contexts.
Imbue’s Core Philosophy: to place reasoning at the forefront, optimizing AI agents for decision-making, adaptability, and information gathering.
Explainability: As a business differentiator, Imbue emphasizes transparency and accountability, allowing their AI agents to elucidate their reasoning.
Applications: Targeting enterprise applications like coding, Imbue provides a practical testing ground for models that integrate reasoning and action.
Flexibility: With a business model that could adapt to both consumer and third-party applications, Imbue signals a future where AI is democratized and personalized.
This all might be just PR talk, but as we know from the OpenAI story, brilliantly told in Wired, to achieve what you want, it’s utterly important to know what you want.
Conclusion
The true power of AI will be unlocked when reasoning and acting capabilities are seamlessly integrated. Emerging technologies like Google's ReAct offer a blueprint for this, and companies like Imbue provide the market focus to make it a reality. This synergy has the potential to elevate AI from a simple data processor to a sophisticated decision-making tool.
Open Source is On Fire
Falcon is soaring even higher. The Technology Innovation Institute (TII) in UAE recently unveiled Falcon 180B, setting a new benchmark in the realm of foundation models. With 180 billion parameters, Falcon 180B was trained on a staggering 3.5 trillion tokens using 4096 GPUs over 7M GPU hours. This behemoth is 2.5 times larger than Llama2 and employed 4 times the computational resources. It excels in tasks like reasoning and coding, topping the open LLM leaderboards. This new release represents a significant escalation from its prior versions, which had 1B, 7B, and 40B parameters. Falcon 180B not only outpaces GPT-3.5 in multiple benchmarks but also reinforces the surging open-source trend in foundation models. The open-source community, propelled initially by Stable Diffusion and later by Llama and Falcon, is narrowing the performance gap with commercial models like GPT-4, making future outperformance by open-source alternatives plausible.
Another open-source model was released last week: Persimmon-8B, a permissively licensed language model with fewer than 10 billion parameters, released under an Apache license for maximum flexibility.
Connected in Time
TIME just unveiled its TIME100 Most Influential People in AI. But the most fascinating thing about it is this:
Conclusion: stay connected, preferably with Andrew Ng.
News from The Usual Suspects
Nvidia
Nvidia offers early access to its TensorRT-LLM. If you are part of their Developer Program, you should try it out; even just reading the article is worthwhile. You'll learn about their use of inference performance optimization techniques such as tensor parallelism, in-flight batching, the new FP8 quantization format, and Hopper Transformer Engine support.
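To give a feel for why in-flight (continuous) batching helps, here is a toy scheduler sketch – pure Python, not the TensorRT-LLM API, and the function names are ours. Requests of different lengths share a fixed number of batch slots, and a finished request's slot is refilled immediately instead of waiting for the whole batch to drain.

```python
# Toy sketch of in-flight (continuous) batching. Not the TensorRT-LLM
# API: just an illustration of the scheduling idea, where a finished
# request is replaced at the next decoding step.

from collections import deque

def inflight_batching(request_lengths, slots=2):
    """Simulate decoding; returns (total steps, completion order)."""
    queue = deque(enumerate(request_lengths))   # (id, tokens remaining)
    active, steps, finished = [], 0, []
    while queue or active:
        # Refill any free slots from the queue at every step.
        while queue and len(active) < slots:
            active.append(list(queue.popleft()))
        steps += 1                      # one decoding step for the batch
        for req in active:
            req[1] -= 1                 # each active request emits a token
        done = [r for r in active if r[1] == 0]
        finished += [r[0] for r in done]  # freed slots reusable next step
        active = [r for r in active if r[1] > 0]
    return steps, finished

steps, order = inflight_batching([3, 1, 2], slots=2)
print(steps, order)  # 3 [1, 0, 2]
```

With static batching, the same three requests in two slots would take 5 steps (the short request's slot sits idle until its batch-mate finishes); the in-flight version completes in 3.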
Analytics India Magazine caught up with Jensen Huang when he was in India. Huang said that by the end of next year, India will have AI supercomputers that are an order of magnitude faster (i.e., 50 to 100 times faster): “We are going to bring out the fastest computers in the world. These computers are not even in production [so far]. India will be one of the first countries in the world [to get them].”
Anthropic
They just introduced Claude Pro for $20/month. I personally use the paid version of ChatGPT, and though I like Claude, the free version seems enough. What about you?
What is your favorite? Please send a comment clarifying how and what you use!
Midjourney
Midjourney is challenged by Ideogram, a new startup founded by ex-Google Brain researchers and backed by $16.5 million in seed funding. It aims to disrupt the AI image generation market with its focus on reliable text generation within images. So far, the results have been terrifying, and… no text appeared on the picture:
IBM
IBM is about to launch its Granite series models, leveraging the "Decoder" architecture foundational to LLMs. Targeted at enterprise NLP tasks such as summarization, content generation, and insight extraction, IBM aims to set a new standard in transparency by disclosing both the data sources and the data processing methodologies employed for the Granite series. The series is expected to be available in Q3 2023.
“Manhattan Project” Concerns
Interconnects writes about the challenges and paradoxes facing AI research, particularly critiquing the analogy of AI development to the Manhattan Project. Unlike the atomic bomb, AI's goals are undefined and its risks stem from uncertainty and emergent behavior, complicating safety metrics. The article argues that surveillance and regulation for AI differ fundamentally from nuclear materials, emphasizing the role of trust and communication. It also explores the weakened influence of scientists in political discourse and the transformation of research dissemination via social media algorithms and corporate monopolies. In summary, the article highlights the unique complexities in AI research governance, contrasting it with historical examples to debunk oversimplifications.
TWITTER LIBRARY
For your kids
Other news, categorized for your convenience:
Reinforcement Learning and Human Feedback
Reinforcement learning from human feedback (RLHF) vs. RL from AI feedback (RLAIF). This paper compares RLHF and RLAIF techniques for aligning language models to human preferences. Results indicate that both methods yield similar improvements in human evaluations, specifically for summarization tasks →read more
Memory Efficiency in RLHF with Proximal Policy Optimization (PPO). This study dives into the computational overhead of using PPO in RLHF, introducing Hydra-RLHF as a memory-efficient solution that maintains performance. Hydra-RLHF dynamically adjusts LoRA settings during training to save memory →read more
Multi-modal Language Models
CM3Leon: A Retrieval-augmented, Token-based Multi-modal Language Model. The paper introduces CM3Leon, a model capable of both text-to-image and image-to-text generation. It utilizes a recipe adapted from text-only language models and demonstrates high performance and controllability in various tasks →read more
Efficient Models for Computer Vision
Sparse Mixture-of-Experts Models (MoEs) in Vision Transformers (ViTs). This work explores the application of sparse MoEs in Vision Transformers to make them more suitable for resource-constrained environments. It proposes a mobile-friendly design and shows performance gains compared to dense ViTs →read more
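The routing idea behind sparse MoEs can be shown in a few lines. The sketch below is a hypothetical toy (the names `route_top1` and `moe_layer` and the two toy experts are ours, not the paper's design): a router scores each token, and only the single highest-scoring expert runs per token, so compute grows with the number of tokens rather than the number of experts.

```python
# Toy sketch of sparse Mixture-of-Experts (MoE) top-1 routing.
# Illustrative only: real ViT-MoE layers use learned routers and
# expert MLPs over high-dimensional token embeddings.

def route_top1(tokens, router_weights):
    """Pick one expert per token via the highest router logit."""
    assignments = []
    for tok in tokens:
        logits = [sum(t * w for t, w in zip(tok, wrow))
                  for wrow in router_weights]
        assignments.append(max(range(len(logits)), key=logits.__getitem__))
    return assignments

def moe_layer(tokens, router_weights, experts):
    """Apply only the selected expert to each token (sparse dispatch)."""
    assignments = route_top1(tokens, router_weights)
    return [experts[e](tok) for tok, e in zip(tokens, assignments)]

# Two toy "experts": one doubles the token, one negates it.
experts = [lambda t: [2 * x for x in t], lambda t: [-x for x in t]]
# Expert 0 keys on dimension 0, expert 1 on dimension 1.
router_weights = [[1.0, 0.0], [0.0, 1.0]]

out = moe_layer([[3.0, 0.1], [0.2, 5.0]], router_weights, experts)
print(out)  # [[6.0, 0.2], [-0.2, -5.0]]
```

Because each token touches only one expert's parameters, adding more experts increases model capacity without increasing per-token compute – the property that makes MoEs attractive for resource-constrained vision deployments.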
AI Safety and Verification
Provably Safe AI: Using Mathematical Proof for AI Safety. The paper by Max Tegmark argues for the use of mathematical proof as a mechanism for ensuring AI safety. It calls for hardware, software, and social systems to carry proofs of formal safety specifications and discusses automated theorem proving's role →read more
In other newsletters
ChinaAI discusses China's implementation gap between large language models (LLMs) developed in labs and their adoption by businesses. Certainly, the best newsletter about China.
Deep dive into the 21 benchmarks that are used to evaluate LLMs by Why Try AI.
Understanding and Using Supervised Fine-Tuning (SFT) for Language Models by Deep (Learning) Focus
LLM Training: RLHF and Its Alternatives by Sebastian Raschka
We are reading
Interview with Ivan Zhang, Cohere’s CTO
Interview with Walter Isaacson, Elon Musk’s latest biographer and Isaacson’s article about Musk in Time.
The profile of OpenAI by Wired’s Steven Levy is exhaustingly long but certainly worth reading, demonstrating their unshakable faith in AGI (and/or Superintelligence). Also, Reid Hoffman seems to be the one who made it happen for them financially.
Sharing is caring – please forward Turing Post to your friends and colleagues, or use your personalized link to earn points in the referral program (to be announced soon). 🤍
Another week with fascinating innovations! We call this overview “Froth on the Daydream” – or simply, FOD. It’s a reference to the surrealistic and experimental novel by Boris Vian – after all, AI is experimental and feels quite surrealistic, and a lot of writing on this topic is just a froth on the daydream.
How was today's FOD? Please give us some constructive feedback!