FOD#74: Sparks of AGI – OpenAI’s plans to get there
We discuss Sébastien Bubeck’s move to OpenAI, what is going on with SLMs, and offer you a collection of interesting articles, relevant news, and research papers. Dive in!
This Week in Turing Post:
Wednesday, AI 101, Models: Mistral family
Friday, AI Unicorns: Perplexity
If you like Turing Post, consider upgrading, exploring this smarter way to research from our partners, or sharing this digest with a friend. It helps!
Main topic: Small language models are on the rise
Two things sparked my curiosity last week: the surge in papers and announcements related to small language models (SLMs) and Sébastien Bubeck’s recent move to OpenAI.
Bubeck is notable for (at least) two achievements:
He co-authored the 155-page research paper, Sparks of Artificial General Intelligence: Early Experiments with GPT-4, published on April 13, 2023.
He was instrumental in developing Microsoft’s Phi series of SLMs, efficient AI models optimized for edge devices like smartphones and laptops. The first Phi model was introduced in the paper Textbooks Are All You Need, which made waves and sparked big ideas on efficient AI – quality data, less compute, same power!
In his interview with Turing Post, Bubeck explained their intuition behind the approach they took in Textbooks Are All You Need:
"Following the Sparks of AGI paper, we realized that to 'understand' what’s happening in LLMs, we had to try building our own. We had no experience training large transformers and limited data to begin with. Recognizing how hard it could be to evaluate any LLM we trained (given the maze of academic benchmarks), we decided to narrow the scope: coding was our target because of an existing large dataset (The Stack), a simple evaluation metric (OpenAI’s HumanEval), and prior evidence that ~1B parameter networks could handle this task reasonably. With only a few dozen GPUs, we aimed for a high HumanEval score using an SLM and restricted data. Filtering The Stack for 'educational content' (as identified by GPT-4) and creating 'synthetic textbooks' to diversify the data were crucial. After a month, we reached 50% on HumanEval and declared success. Then came the question: could this approach extend beyond coding? That’s when we tackled common-sense reasoning with phi-1.5 and general cognitive ability with phi-2, eventually reaching phi-3!"
Not long ago, it was confirmed that OpenAI is collaborating with designer Jony Ive to develop an AI-powered hardware device aimed at a less socially intrusive computing experience than current smartphones. This project aligns perfectly with Bubeck's vision of integrating AI models into everyday devices!
In the same interview, Bubeck told us:
"I can’t wait for SLMs like Phi-3 to be embedded everywhere. We’re already seeing this with Phi Silica, a derivative of Phi-3-mini, built specifically for the Copilot+ PCs announced on May 20, just before Build 2024. Windows will be the first platform to feature an in-box, state-of-the-art SLM, optimized for the NPU, by the end of this year. Eventually, I’d love to ask my watch to perform actions while I’m running or have an SLM on my phone while I hike, answering questions about what I’m seeing. The applications are endless."
Given Bubeck's background and OpenAI's recent hardware initiatives, it’s reasonable to assume that OpenAI views SLMs as a crucial part of its strategy toward achieving AGI – or at least a major component. Bubeck’s focus at OpenAI will likely center on:
Developing Efficient AI Models for Hardware Integration: Drawing on his SLM expertise, Bubeck may work on compact AI models optimized for OpenAI's new hardware, ensuring peak performance on devices with limited resources.
Enhancing On-Device AI Capabilities: He could contribute to advancing AI features that function directly on consumer devices, decreasing reliance on cloud computing and improving user privacy.
Collaborating on Custom AI Chip Development: With OpenAI’s partnerships with Broadcom and TSMC to develop custom AI chips, Bubeck's insight could help create models tailored for these chips, boosting both efficiency and performance.
OpenAI has no plans to slow down. Last week it launched SearchGPT, an AI-powered search engine that integrates real-time web information with conversational capabilities and positions OpenAI as a direct competitor to established search platforms like Google. With Bubeck on board, bringing his expertise in SLMs (and Sparks of AGI), OpenAI is casting an even wider net and getting its hands on the hottest topics.
Other companies accelerating their SLM game:
Meta AI open-sourced MobileLLM, a foundation model optimized for on-device scenarios.
Hugging Face introduced SmolLM2, “pushing the state-of-the-art performances of LLMs under 2B parameters with three sizes: 135M, 360M and 1.7B parameters.”
Infosys unveiled Infosys Topaz BankingSLM and Infosys Topaz ITOpsSLM. These SLMs are designed to assist businesses in adopting and scaling AI solutions tailored to banking and IT operations.
Moondream, a platform focused on SLMs, emerged from stealth mode, securing $4.5 million in new funding. This investment underscores the growing interest in developing lightweight AI solutions.
It’s also worth noting that Qualcomm's CEO Cristiano Amon said he wants to "break the paradigm of the app construct," signaling a shift from traditional apps to AI agents on your devices. And what could be more efficient for this than SLMs?
To wrap up on SLMs for today, check out this survey of small language models from thirteen reputable universities and AI labs. They offer a taxonomy that provides a structured approach to understanding and evaluating SLMs, focusing on:
How models are optimized (through architectural design, training efficiency, and compression – see the short quantization sketch after this list).
Which constraints are prioritized (e.g., compute, memory, energy) based on the intended application environment and deployment needs.
Image Source: The Survey of SLMs
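To make the compression axis of that taxonomy concrete, here is a minimal sketch of post-training dynamic quantization in PyTorch. It is only an illustration: the SmolLM2 model id is borrowed from the news above (treat it as an assumption), and production SLM deployments typically use more aggressive, hardware-specific low-bit schemes.

```python
# Sketch: shrink a small language model via post-training dynamic quantization.
# Illustrative only -- real SLM deployments usually rely on hardware-specific
# low-bit schemes, but the trade-off (precision for memory) is the same idea.
import os
import tempfile

import torch
from transformers import AutoModelForCausalLM

model_id = "HuggingFaceTB/SmolLM2-135M"  # assumed repo id; any small causal LM works
model = AutoModelForCausalLM.from_pretrained(model_id)

# Quantize the weights of all Linear layers to int8; activations stay in float.
quantized = torch.ao.quantization.quantize_dynamic(
    model, {torch.nn.Linear}, dtype=torch.qint8
)

def checkpoint_mb(m: torch.nn.Module) -> float:
    """Serialized size of a module's state_dict, in megabytes."""
    with tempfile.NamedTemporaryFile(suffix=".pt", delete=False) as f:
        path = f.name
    torch.save(m.state_dict(), path)
    mb = os.path.getsize(path) / 1e6
    os.remove(path)
    return mb

print(f"fp32: {checkpoint_mb(model):.0f} MB -> int8 linears: {checkpoint_mb(quantized):.0f} MB")
```

Architectural design and training efficiency, the other two axes, are harder to show in a few lines, but the memory difference printed here illustrates why compression gets so much attention for on-device use.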
Twitter library
Weekly recommendation from an AI practitioner 👍🏼
Patchwork – an open-source toolset for merging and transforming datasets. Think of it as your essential toolkit for tidying up data chaos, with flexible, modular utilities designed for swift data integration across projects.
💎 We also recommend our partner SciSpace – your research buddy.
SciSpace has everything you need – over 280 million papers at your fingertips, easier lit reviews, and even AI that chats with your PDFs to break things down for you. Trusted by 3M+ active users!
News from The Usual Suspects ©
Breakthroughs? →
Osmo Labs Digitized Smell
“A fresh summer plum was the first fruit and scent to be fully digitized and reprinted with no human intervention.” →Alex Wiltschko’s Twitter
Decart & Etched introduced Oasis: A Fully AI-Generated Game World
It’s the first fully playable, real-time, open-world AI game, revolutionizing gaming with AI-generated experiences →Oasis’s GitHub
More news →
Waymo | EMMA: End-to-end Multimodal Driving Model, using Google’s Gemini, combines sensor and language data, excelling in trajectory prediction and object detection. Short-term memory and lack of LiDAR remain limitations →Waymo’s blog
Amazon | AWS Pits Q Developer Against Copilot. Why might it be good? It’s backed by Claude 3.5, a usual choice among programmers →Amazon’s blog
Google
Gemini’s “Grounding with Google Search” lets apps use live data, improving factual accuracy and trustworthiness →Google’s blog
Big Sleep, Google’s AI tool, found critical flaws in SQLite, showcasing AI’s potential to detect complex vulnerabilities in software →Project Zero’s blog
Google’s new “Learn About” tool turns any search query into a structured, interactive learning experience, powered by Gemini →Learn About playground
Policy
A16z & Microsoft: Teaming Up for AI’s Future. They rally for a balanced AI landscape where both startups and giants can thrive. Their pitch: open-source AI, shared data pools, and policies to empower U.S. innovation – garage dreamers and corporate titans alike →Microsoft’s blog
Anthropic’s Case for Targeted AI Regulation. Anthropic urges swift, focused AI regulation to curb potential risks. Their proposal: adaptive safety policies to keep up with model advancements, aiming for minimal red tape and maximum protection →Anthropic’s blog
AGI, again. Your thoughts:
NVIDIA introduced HOVER (Versatile Neural Whole-Body Controller for Humanoid Robots) →their GitHub
We are reading
The Present Future: AI's Impact Long Before Superintelligence by Ethan Mollick
Why I build open language models – an important read from Nathan Lambert
Leave a review!
The freshest research papers, categorized for your convenience
Our TOP
Relaxed Recursive Transformers: Effective Parameter Sharing with Layer-wise LoRA →read the paper
@GoogleDeepMind, @GoogleAI and @kaist_ai introduce new methods to turn large LLMs into smaller models:
- Recursive Transformers that reuse layers multiple times
- Relaxed Recursive Transformers with LoRA
- Continuous Depth-wise Batching for speeding up processing
— TuringPost (@TheTuringPost), Oct 30, 2024
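To give a feel for the layer-reuse idea in the tweet above, here is a minimal, illustrative PyTorch sketch: one shared block is looped several times, and each loop iteration gets its own low-rank (LoRA-style) correction so the weight tying is "relaxed". The module layout, dimensions, and names are our assumptions, not the paper's code.

```python
# Sketch of the core idea behind Relaxed Recursive Transformers: a shared layer
# is looped n_loops times, and each loop gets its own low-rank correction so the
# tied weights are not fully identical. Illustrative only, not the paper's code.
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """A shared Linear plus a per-loop low-rank update (x @ A @ B)."""
    def __init__(self, shared: nn.Linear, rank: int = 8):
        super().__init__()
        self.shared = shared  # same weights reused across loop iterations
        self.A = nn.Parameter(torch.zeros(shared.in_features, rank))
        self.B = nn.Parameter(torch.randn(rank, shared.out_features) * 0.01)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.shared(x) + x @ self.A @ self.B  # delta starts at zero

class RecursiveBlock(nn.Module):
    """One shared feed-forward layer reused n_loops times with per-loop LoRA."""
    def __init__(self, d_model: int = 256, n_loops: int = 3, rank: int = 8):
        super().__init__()
        shared = nn.Linear(d_model, d_model)  # the tied parameters
        self.loops = nn.ModuleList(
            [LoRALinear(shared, rank) for _ in range(n_loops)]
        )
        self.act = nn.GELU()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        for layer in self.loops:  # same base weights, different LoRA each pass
            x = x + self.act(layer(x))
        return x

x = torch.randn(2, 16, 256)        # (batch, sequence, d_model)
print(RecursiveBlock()(x).shape)   # torch.Size([2, 16, 256])
```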
Investigating the Role of Prompting and External Tools in Hallucination Rates of Large Language Models
Researchers at Stellenbosch University examined strategies for reducing hallucinations in LLMs through prompt engineering and external tool integration. Testing approaches like self-consistency (SC) and Chain-of-Thought (CoT) on math and trivia tasks, they found SC best reduced hallucinations in reasoning tasks. Meanwhile, simpler prompts and avoiding tool complexity were more effective overall. Tool-using agents like ReAct increased hallucination rates, especially in less powerful LLMs, highlighting tool integration challenges →read the paper
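For readers who want the mechanics, here is a minimal sketch of self-consistency: sample several reasoning chains at nonzero temperature and majority-vote on the final answer. The `generate` callable and the "Answer:" convention are hypothetical stand-ins, not the paper's setup.

```python
# Sketch of self-consistency (SC): sample k reasoning chains and majority-vote
# the final answers. `generate` is a hypothetical stand-in for any LLM call
# that returns text ending in "Answer: <value>".
from collections import Counter
from typing import Callable

def extract_answer(completion: str) -> str:
    """Pull the final answer out of a chain-of-thought completion."""
    return completion.rsplit("Answer:", 1)[-1].strip()

def self_consistency(
    prompt: str,
    generate: Callable[[str, float], str],  # (prompt, temperature) -> completion
    k: int = 10,
    temperature: float = 0.7,
) -> str:
    """Return the most common final answer across k sampled chains."""
    answers = [extract_answer(generate(prompt, temperature)) for _ in range(k)]
    return Counter(answers).most_common(1)[0][0]
```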
Mind Your Step (By Step): CoT Can Reduce Performance on Tasks Where Thinking Makes Humans Worse
Princeton researchers identify tasks where chain-of-thought (CoT) reasoning degrades LLM performance. Testing across implicit statistical learning, visual recognition, and exception-based classification, CoT reduced accuracy by up to 36%. These reductions mirror human performance issues in similar tasks, linking specific cognitive constraints in humans to LLMs. However, CoT did not impair tasks like spatial reasoning or memory-intensive selection, highlighting cases where human-model constraints diverge →read the paper
Measuring Memorization Through Probabilistic Discoverable Extraction
Researchers from Google DeepMind and Boston University propose a probabilistic method to better measure memorization in LLMs. Current methods, focused on single-attempt extraction via greedy sampling, underestimate memorization. This study introduces the "(𝑛, 𝑝)-discoverable extraction" metric, capturing the probability of extracting memorized data across multiple attempts and sampling schemes →read the paper
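The intuition behind the metric can be captured in a few lines: if a single sampled generation reproduces a memorized sequence with probability p_single, the chance of recovering it at least once in n attempts is 1 - (1 - p_single)^n. Below is a minimal sketch of that calculation (our illustration, not the paper's code).

```python
# Sketch of the intuition behind probabilistic (n, p)-discoverable extraction:
# if one sampled generation reproduces a memorized sequence with probability
# p_single, the chance of extracting it at least once in n attempts is
# 1 - (1 - p_single)^n. Illustrative only, not the paper's code.
def extraction_probability(p_single: float, n: int) -> float:
    """Probability of at least one successful extraction in n independent samples."""
    return 1.0 - (1.0 - p_single) ** n

def is_np_discoverable(p_single: float, n: int, p: float) -> bool:
    """A sequence counts as (n, p)-discoverable if that probability reaches p."""
    return extraction_probability(p_single, n) >= p

# A sequence that single-shot greedy extraction would likely miss (p_single = 2%)
# becomes likely to surface given enough attempts:
for n in (1, 10, 100):
    print(n, round(extraction_probability(0.02, n), 3))  # 0.02, 0.183, 0.867
```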
Robotics & Embodied AI
Advancing Embodied AI Through Touch And Dexterity provides tools for enhanced touch perception and human-robot interaction.
A Large Recurrent Action Model: xLSTM Enables Fast Inference For Robotics Tasks introduces xLSTM for efficient, real-time robotics actions.
Language Model (LLM) Capabilities & Reasoning
Counting Ability Of Large Language Models And Impact Of Tokenization explores tokenization’s effect on LLM counting abilities.
On Memorization Of Large Language Models In Logical Reasoning examines memorization vs. reasoning in LLM logical tasks.
What Happened In LLM Layers When Trained For Fast Vs. Slow Thinking studies gradient stability in LLMs trained for detailed thinking.
Language Models Can Self-Lengthen To Generate Long Texts introduces a method for extending LLM responses.
Optimization & Preference Tuning
Hybrid Preferences: Learning To Route Instances For Human Vs. AI Feedback balances human and AI feedback for improved preference tasks.
LongReward: Improving Long-Context Large Language Models With AI Feedback uses AI feedback to enhance long-context LLMs.
Accelerating Direct Preference Optimization With Prefix Sharing reduces training redundancy in preference optimization.
Memory Efficiency & Model Compression
BITSTACK: Fine-Grained Size Control For Compressed Large Language Models provides dynamic memory compression for LLMs.
NeuZip: Memory-Efficient Training And Inference With Dynamic Compression lowers memory use in training and inference for neural networks.
Agents & Multi-Agent Systems
AgentStore: Scalable Integration Of Heterogeneous Agents integrates multiple agents for dynamic task automation.
OS-ATLAS: A Foundation Action Model For Generalist GUI Agents builds a GUI agent model for universal navigation.
Surveys & Methodological Studies
Document Parsing Unveiled: Techniques And Prospects reviews document parsing for structured data extraction.
Neural Fields In Robotics: A Survey covers neural fields’ applications in 3D robotics.
Teaching Embodied RL Agents: Informativeness And Diversity Of Language studies language feedback’s impact on RL agent learning.
Personalization Of Large Language Models: A Survey details techniques for LLM personalization.
Survey Of UI Design And Interaction In Generative AI explores user interaction designs for generative AI.
Training & Post-Training Optimization
COAT: Compressing Optimizer States And Activation For FP8 Training introduces FP8 training for memory-efficient optimization.
EoRA: Training-Free Compensation For Compressed LLM compensates for errors in compressed LLMs.
SocialGPT: Prompting LLMs For Social Relation Reasoning combines vision and language models for social recognition.
AutoKaggle: A Multi-Agent Framework For Autonomous Data Science automates data science competition tasks with multiple agents.
Security & Vulnerability
Stealing User Prompts From Mixture Of Experts identifies a security flaw in MoE models.
Retrieval & Dense Retrieval Optimization
Zero-Shot Dense Retrieval With Embeddings From Relevance Feedback enhances zero-shot retrieval accuracy.
RARe: Retrieval Augmented Retrieval With In-Context Examples uses examples to boost retrieval performance.
Beyond Text: Optimizing RAG With Multimodal Inputs tests multimodal RAG in industrial settings.
FACT: Examining Iterative Context Rewriting For Multi-Fact Retrieval improves multi-fact retrieval through iterative context updates.
Transformers & Tokenization Innovation
TOKENFORMER: Rethinking Transformer Scaling With Tokenized Parameters proposes token-parameter attention for efficient scaling.
Zipfian Whitening enhances word embeddings by adjusting for word frequency.
Please send this newsletter to your colleagues if it can help them enhance their understanding of AI and stay ahead of the curve. You will get a 1-month subscription!