FOD#74: Sparks of AGI – OpenAI’s plans to get there
We discuss Sébastien Bubeck’s move to OpenAI, what is going on with SLMs, and offer you a collection of interesting articles, relevant news, and research papers. Dive in!
This Week in Turing Post:
Wednesday, AI 101, Models: Mistral family
Friday, AI Unicorns: Perplexity
If you like Turing Post, consider upgrading, exploring this smarter way to research from our partners, or sharing this digest with a friend. It helps!
Main topic: Small language models are on the rise
Two things sparked my curiosity last week: the surge in papers and announcements related to small language models (SLMs) and Sébastien Bubeck’s recent move to OpenAI.
Bubeck is notable for (at least) two achievements:
He co-authored the 155-page research paper, Sparks of Artificial General Intelligence: Early Experiments with GPT-4, published on April 13, 2023.
He was instrumental in developing Microsoft’s Phi series of SLMs, efficient AI models optimized for edge devices like smartphones and laptops. The first Phi model was introduced in the paper Textbooks Are All You Need, which made waves and sparked big ideas on efficient AI – quality data, less compute, same power!
In his interview with Turing Post, Bubeck explained their intuition behind the approach they took in Textbooks Are All You Need:
"Following the Sparks of AGI paper, we realized that to 'understand' what’s happening in LLMs, we had to try building our own. We had no experience training large transformers and limited data to begin with. Recognizing how hard it could be to evaluate any LLM we trained (given the maze of academic benchmarks), we decided to narrow the scope: coding was our target because of an existing large dataset (The Stack), a simple evaluation metric (OpenAI’s HumanEval), and prior evidence that ~1B parameter networks could handle this task reasonably. With only a few dozen GPUs, we aimed for a high HumanEval score using an SLM and restricted data. Filtering The Stack for 'educational content' (as identified by GPT-4) and creating 'synthetic textbooks' to diversify the data were crucial. After a month, we reached 50% on HumanEval and declared success. Then came the question: could this approach extend beyond coding? That’s when we tackled common-sense reasoning with phi-1.5 and general cognitive ability with phi-2, eventually reaching phi-3!"
Not long ago, it was confirmed that OpenAI is collaborating with designer Jony Ive to develop an AI-powered hardware device aimed at a less socially intrusive computing experience than current smartphones. This project aligns perfectly with Bubeck's vision of integrating AI models into everyday devices!
In the same interview, Bubeck told us:
"I can’t wait for SLMs like Phi-3 to be embedded everywhere. We’re already seeing this with Phi Silica, a derivative of Phi-3-mini, built specifically for the Copilot+ PCs announced on May 20, just before Build 2024. Windows will be the first platform to feature an in-box, state-of-the-art SLM, optimized for the NPU, by the end of this year. Eventually, I’d love to ask my watch to perform actions while I’m running or have an SLM on my phone while I hike, answering questions about what I’m seeing. The applications are endless."
Given Bubeck's background and OpenAI's recent hardware initiatives, it’s reasonable to assume that OpenAI views SLMs as a crucial part of its strategy toward achieving AGI – or at least a major component. Bubeck’s focus at OpenAI will likely center on:
Developing Efficient AI Models for Hardware Integration: Drawing on his SLM expertise, Bubeck may work on compact AI models optimized for OpenAI's new hardware, ensuring peak performance on devices with limited resources.
Enhancing On-Device AI Capabilities: He could contribute to advancing AI features that function directly on consumer devices, decreasing reliance on cloud computing and improving user privacy.
Collaborating on Custom AI Chip Development: With OpenAI’s partnerships with Broadcom and TSMC to develop custom AI chips, Bubeck's insight could help create models tailored for these chips, boosting both efficiency and performance.
OpenAI has no plans to slow down. Last week it launched SearchGPT, an AI-powered search engine that integrates real-time web information with conversational capabilities and positions OpenAI as a direct competitor to established search platforms like Google. With Bubeck on board, bringing his expertise in SLMs (and Sparks of AGI), OpenAI is casting an even wider net and getting its hands on the hottest topics.
Other companies accelerating their SLM game:
Meta AI open-sourced MobileLLM, a foundation model optimized for on-device scenarios.
Hugging Face introduced SmolLM2, “pushing the state-of-the-art performances of LLMs under 2B parameters with three sizes: 135M, 360M and 1.7B parameters.”
Infosys unveiled Infosys Topaz BankingSLM and Infosys Topaz ITOpsSLM. These SLMs are designed to assist businesses in adopting and scaling AI solutions tailored to banking and IT operations.
Moondream, a platform focused on SLMs, emerged from stealth mode, securing $4.5 million in new funding. This investment underscores the growing interest in developing lightweight AI solutions.
It’s also worth noting that Qualcomm's CEO Cristiano Amon said he wants to "break the paradigm of the app construct," signaling a shift from traditional apps to AI agents on your devices. And what could be more efficient for this than SLMs?
To wrap up on SLMs for today, check out this survey of small language models from thirteen reputable universities and AI labs. They offer a taxonomy that provides a structured approach to understanding and evaluating SLMs, focusing on:
How models are optimized (through architectural design, training efficiency, and compression – see the short quantization sketch after this list).
Which constraints are prioritized (e.g., compute, memory, energy) based on the intended application environment and deployment needs.
Image Source: The Survey of SLMs
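To make the compression axis of that taxonomy concrete, here is a minimal sketch of post-training dynamic quantization in PyTorch. It is only an illustration: the SmolLM2 model id is borrowed from the news above (treat it as an assumption), and production SLM deployments typically use more aggressive, hardware-specific low-bit schemes.

```python
# Sketch: shrink a small language model via post-training dynamic quantization.
# Illustrative only -- real SLM deployments usually rely on hardware-specific
# low-bit schemes, but the trade-off (precision for memory) is the same idea.
import os
import tempfile

import torch
from transformers import AutoModelForCausalLM

model_id = "HuggingFaceTB/SmolLM2-135M"  # assumed repo id; any small causal LM works
model = AutoModelForCausalLM.from_pretrained(model_id)

# Quantize the weights of all Linear layers to int8; activations stay in float.
quantized = torch.ao.quantization.quantize_dynamic(
    model, {torch.nn.Linear}, dtype=torch.qint8
)

def checkpoint_mb(m: torch.nn.Module) -> float:
    """Serialized size of a module's state_dict, in megabytes."""
    with tempfile.NamedTemporaryFile(suffix=".pt", delete=False) as f:
        path = f.name
    torch.save(m.state_dict(), path)
    mb = os.path.getsize(path) / 1e6
    os.remove(path)
    return mb

print(f"fp32: {checkpoint_mb(model):.0f} MB -> int8 linears: {checkpoint_mb(quantized):.0f} MB")
```

Architectural design and training efficiency, the other two axes, are harder to show in a few lines, but the memory difference printed here illustrates why compression gets so much attention for on-device use.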
Twitter library
Weekly recommendation from an AI practitioner 👍🏼
Patchwork – an open-source toolset for merging and transforming datasets. Think of it as your essential toolkit for tidying up data chaos, with flexible, modular utilities designed for swift data integration across projects.
💎 We also recommend our partner SciSpace – your research buddy.
SciSpace has everything you need – over 280 million papers at your fingertips, easier lit reviews, and even AI that chats with your PDFs to break things down for you. Trusted by 3M+ active users!
News from The Usual Suspects ©
Breakthroughs? →
Osmo Labs Digitized Smell
“A fresh summer plum was the first fruit and scent to be fully digitized and reprinted with no human intervention.” →Alex Wiltschko’s Twitter
Decart & Etched introduced Oasis: A Fully AI-Generated Game World
It’s the first fully playable, real-time, open-world AI game, revolutionizing gaming with AI-generated experiences →Oasis’s GitHub
More news →
Waymo | EMMA: End-to-end Multimodal Driving Model, using Google’s Gemini, combines sensor and language data, excelling in trajectory prediction and object detection. Short-term memory and lack of LiDAR remain limitations →Waymo’s blog
Amazon | AWS Pits Q Developer Against Copilot. Why might it be good? It’s backed by Claude 3.5, a usual choice among programmers →Amazon’s blog
Google
Gemini’s “Grounding with Google Search” lets apps use live data, improving factual accuracy and trustworthiness →Google’s blog
Big Sleep, Google’s AI tool, found critical flaws in SQLite, showcasing AI’s potential to detect complex vulnerabilities in software →Project Zero’s blog
Google’s new “Learn About” tool turns any search query into a structured, interactive learning experience, powered by Gemini →Learn About playground
Policy
A16z & Microsoft: Teaming Up for AI’s Future. They rally for a balanced AI landscape where both startups and giants can thrive. Their pitch: open-source AI, shared data pools, and policies to empower U.S. innovation – garage dreamers and corporate titans alike →Microsoft’s blog
Anthropic’s Case for Targeted AI Regulation. Anthropic urges swift, focused AI regulation to curb potential risks. Their proposal: adaptive safety policies to keep up with model advancements, aiming for minimal red tape and maximum protection →Anthropic’s blog
AGI, again. Your thoughts:
NVIDIA introduced HOVER (Versatile Neural Whole-Body Controller for Humanoid Robots) →their GitHub
We are reading
The Present Future: AI's Impact Long Before Superintelligence by Ethan Mollick
Why I build open language models – an important read from Nathan Lambert
Leave a review!
The freshest research papers, categorized for your convenience
Our TOP
Relaxed Recursive Transformers: Effective Parameter Sharing with Layer-wise LoRA →read the paper
@GoogleDeepMind, @GoogleAI and @kaist_ai introduce new methods to turn large LLMs into smaller models:
- Recursive Transformers that reuse layers multiple times
- Relaxed Recursive Transformers with LoRA
- Continuous Depth-wise Batching for speeding up processing
— TuringPost (@TheTuringPost), Oct 30, 2024
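To give a feel for the layer-reuse idea in the tweet above, here is a minimal, illustrative PyTorch sketch: one shared block is looped several times, and each loop iteration gets its own low-rank (LoRA-style) correction so the weight tying is "relaxed". The module layout, dimensions, and names are our assumptions, not the paper's code.

```python
# Sketch of the core idea behind Relaxed Recursive Transformers: a shared layer
# is looped n_loops times, and each loop gets its own low-rank correction so the
# tied weights are not fully identical. Illustrative only, not the paper's code.
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """A shared Linear plus a per-loop low-rank update (x @ A @ B)."""
    def __init__(self, shared: nn.Linear, rank: int = 8):
        super().__init__()
        self.shared = shared  # same weights reused across loop iterations
        self.A = nn.Parameter(torch.zeros(shared.in_features, rank))
        self.B = nn.Parameter(torch.randn(rank, shared.out_features) * 0.01)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.shared(x) + x @ self.A @ self.B  # delta starts at zero

class RecursiveBlock(nn.Module):
    """One shared feed-forward layer reused n_loops times with per-loop LoRA."""
    def __init__(self, d_model: int = 256, n_loops: int = 3, rank: int = 8):
        super().__init__()
        shared = nn.Linear(d_model, d_model)  # the tied parameters
        self.loops = nn.ModuleList(
            [LoRALinear(shared, rank) for _ in range(n_loops)]
        )
        self.act = nn.GELU()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        for layer in self.loops:  # same base weights, different LoRA each pass
            x = x + self.act(layer(x))
        return x

x = torch.randn(2, 16, 256)        # (batch, sequence, d_model)
print(RecursiveBlock()(x).shape)   # torch.Size([2, 16, 256])
```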
Investigating the Role of Prompting and External Tools in Hallucination Rates of Large Language Models
Researchers at Stellenbosch University examined strategies for reducing hallucinations in LLMs through prompt engineering and external tool integration. Testing approaches like self-consistency (SC) and Chain-of-Thought (CoT) on math and trivia tasks, they found SC best reduced hallucinations in reasoning tasks. Meanwhile, simpler prompts and avoiding tool complexity were more effective overall. Tool-using agents like ReAct increased hallucination rates, especially in less powerful LLMs, highlighting tool integration challenges →read the paper
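For readers who want the mechanics, here is a minimal sketch of self-consistency: sample several reasoning chains at nonzero temperature and majority-vote on the final answer. The `generate` callable and the "Answer:" convention are hypothetical stand-ins, not the paper's setup.

```python
# Sketch of self-consistency (SC): sample k reasoning chains and majority-vote
# the final answers. `generate` is a hypothetical stand-in for any LLM call
# that returns text ending in "Answer: <value>".
from collections import Counter
from typing import Callable

def extract_answer(completion: str) -> str:
    """Pull the final answer out of a chain-of-thought completion."""
    return completion.rsplit("Answer:", 1)[-1].strip()

def self_consistency(
    prompt: str,
    generate: Callable[[str, float], str],  # (prompt, temperature) -> completion
    k: int = 10,
    temperature: float = 0.7,
) -> str:
    """Return the most common final answer across k sampled chains."""
    answers = [extract_answer(generate(prompt, temperature)) for _ in range(k)]
    return Counter(answers).most_common(1)[0][0]
```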
Mind Your Step (By Step): CoT Can Reduce Performance on Tasks Where Thinking Makes Humans Worse
Princeton researchers identify tasks where chain-of-thought (CoT) reasoning degrades LLM performance. Testing across implicit statistical learning, visual recognition, and exception-based classification, CoT reduced accuracy by up to 36%. These reductions mirror human performance issues in similar tasks, linking specific cognitive constraints in humans to LLMs. However, CoT did not impair tasks like spatial reasoning or memory-intensive selection, highlighting cases where human-model constraints diverge →read the paper
Measuring Memorization Through Probabilistic Discoverable Extraction
Researchers from Google DeepMind and Boston University propose a probabilistic method to better measure memorization in LLMs. Current methods, focused on single-attempt extraction via greedy sampling, underestimate memorization. This study introduces the "(𝑛, 𝑝)-discoverable extraction" metric, capturing the probability of extracting memorized data across multiple attempts and sampling schemes →read the paper
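The intuition behind the metric can be captured in a few lines: if a single sampled generation reproduces a memorized sequence with probability p_single, the chance of recovering it at least once in n attempts is 1 - (1 - p_single)^n. Below is a minimal sketch of that calculation (our illustration, not the paper's code).

```python
# Sketch of the intuition behind probabilistic (n, p)-discoverable extraction:
# if one sampled generation reproduces a memorized sequence with probability
# p_single, the chance of extracting it at least once in n attempts is
# 1 - (1 - p_single)^n. Illustrative only, not the paper's code.
def extraction_probability(p_single: float, n: int) -> float:
    """Probability of at least one successful extraction in n independent samples."""
    return 1.0 - (1.0 - p_single) ** n

def is_np_discoverable(p_single: float, n: int, p: float) -> bool:
    """A sequence counts as (n, p)-discoverable if that probability reaches p."""
    return extraction_probability(p_single, n) >= p

# A sequence that single-shot greedy extraction would likely miss (p_single = 2%)
# becomes likely to surface given enough attempts:
for n in (1, 10, 100):
    print(n, round(extraction_probability(0.02, n), 3))  # 0.02, 0.183, 0.867
```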
Robotics & Embodied AI
Advancing Embodied AI Through Touch And Dexterity provides tools for enhanced touch perception and human-robot interaction.
A Large Recurrent Action Model: xLSTM Enables Fast Inference For Robotics Tasks introduces xLSTM for efficient, real-time robotics actions.
Language Model (LLM) Capabilities & Reasoning
Counting Ability Of Large Language Models And Impact Of Tokenization explores tokenization’s effect on LLM counting abilities.
On Memorization Of Large Language Models In Logical Reasoning examines memorization vs. reasoning in LLM logical tasks.
What Happened In LLM Layers When Trained For Fast Vs. Slow Thinking studies gradient stability in LLMs trained for detailed thinking.
Language Models Can Self-Lengthen To Generate Long Texts introduces a method for extending LLM responses.
Optimization & Preference Tuning
Hybrid Preferences: Learning To Route Instances For Human Vs. AI Feedback balances human and AI feedback for improved preference tasks.
LongReward: Improving Long-Context Large Language Models With AI Feedback uses AI feedback to enhance long-context LLMs.
Accelerating Direct Preference Optimization With Prefix Sharing reduces training redundancy in preference optimization.
Memory Efficiency & Model Compression
BITSTACK: Fine-Grained Size Control For Compressed Large Language Models provides dynamic memory compression for LLMs.
NeuZip: Memory-Efficient Training And Inference With Dynamic Compression lowers memory use in training and inference for neural networks.
Agents & Multi-Agent Systems
AgentStore: Scalable Integration Of Heterogeneous Agents integrates multiple agents for dynamic task automation.
OS-ATLAS: A Foundation Action Model For Generalist GUI Agents builds a GUI agent model for universal navigation.
Surveys & Methodological Studies
Document Parsing Unveiled: Techniques And Prospects reviews document parsing for structured data extraction.
Neural Fields In Robotics: A Survey covers neural fields’ applications in 3D robotics.
Teaching Embodied RL Agents: Informativeness And Diversity Of Language studies language feedback’s impact on RL agent learning.
Personalization Of Large Language Models: A Survey details techniques for LLM personalization.
Survey Of UI Design And Interaction In Generative AI explores user interaction designs for generative AI.
Training & Post-Training Optimization
COAT: Compressing Optimizer States And Activation For FP8 Training introduces FP8 training for memory-efficient optimization.
EoRA: Training-Free Compensation For Compressed LLM compensates for errors in compressed LLMs.
SocialGPT: Prompting LLMs For Social Relation Reasoning combines vision and language models for social recognition.
AutoKaggle: A Multi-Agent Framework For Autonomous Data Science automates data science competition tasks with multiple agents.
Security & Vulnerability
Stealing User Prompts From Mixture Of Experts identifies a security flaw in MoE models.
Retrieval & Dense Retrieval Optimization
Zero-Shot Dense Retrieval With Embeddings From Relevance Feedback enhances zero-shot retrieval accuracy.
RARe: Retrieval Augmented Retrieval With In-Context Examples uses examples to boost retrieval performance.
Beyond Text: Optimizing RAG With Multimodal Inputs tests multimodal RAG in industrial settings.
FACT: Examining Iterative Context Rewriting For Multi-Fact Retrieval improves multi-fact retrieval through iterative context updates.
Transformers & Tokenization Innovation
TOKENFORMER: Rethinking Transformer Scaling With Tokenized Parameters proposes token-parameter attention for efficient scaling.
Zipfian Whitening enhances word embeddings by adjusting for word frequency.
Please send this newsletter to your colleagues if it can help them enhance their understanding of AI and stay ahead of the curve. You will get a 1-month subscription!