
FOD#67: o in o1 – the first star in Orion constellation

plus a little dictionary of terms and a collection of interesting reads on the topic

This Week in Turing Post:

  • Wednesday, AI 101: we introduce new bite-size concept cards

  • Friday, a surprise project: we unpack generational stereotypes in four major image generation models.

The main topic

I don’t know why no one has noticed that the “o” in OpenAI’s newest model, o1, likely stands for Orion – the long-rumored model. My chain of thought (haha) is this: during the developers’ AMA, the OpenAI team mentioned that o1 is a “model,” not a “system.” Meanwhile, Sam Altman tweeted a cryptic message:

"I love being home in the Midwest. The night sky is so beautiful. Excited for the winter constellations to rise soon; they are so great."

Of course, Orion is a winter constellation in the Northern Hemisphere.

This suggests that OpenAI is working on a larger system – a constellation – where o1 is just one star. Maybe all of this is inconsequential, but I’m fascinated by the metaphor. Constellations have been used for navigation and storytelling since the dawn of time, and I believe OpenAI is mapping out its own constellation of AI models, each playing a distinct role in a broader, interconnected system. The company also seems intent on building its own narrative and navigating the discourse around it.

What fascinates me most is the possibility that each “star” in this constellation, like o1, represents not just an individual model but a piece of a larger, more integrated framework. These models might be designed to collaborate, enhancing their reasoning and decision-making capacities as a unified whole. It’s a compelling vision – a new kind of AI ecosystem, where each component is aligned with a purpose, like stars aligning to form a pattern in the sky.

I think there's something poetic about this, don’t you?

But back to the ground: in his thorough and insightful deep dive into o1, Nathan Lambert aptly noted, “This is a soft segue into real language model agents.” The reason lies in the unique capabilities o1 brings to the table. By combining reinforcement learning (RL), search-based reasoning, and chain-of-thought (CoT) mechanisms, o1 represents a significant step forward. These foundational elements are critical for developing more advanced, autonomous AI systems, making o1 not just another language model but one piece of a bigger picture: the future of intelligent agents. And as a single piece, it doesn’t have to be perfect.

I’ll provide a list of links worth reading about o1, but before that, it might be useful to be equipped with a little dictionary. First, let’s clarify the chain: Q* → Strawberry → o1.

  • Reinforcement Learning (RL): The technique used to train o1, where the model improves by getting feedback (rewards) based on its actions or reasoning. RL allows o1 to try different approaches, learn from mistakes, and continuously improve.

  • RL-Based Search Algorithm: Refers to the reinforcement learning mechanism that helps o1 search over reasoning spaces to solve problems more efficiently.

  • Chain-of-Thought (CoT) Reasoning: The process where the model breaks down complex tasks into smaller steps and systematically works through them, similar to how humans solve intricate problems step-by-step. This results in more accurate conclusions.

  • Inference-Time Scaling: In most models, the heavy computational work happens during training, but with o1, the real action happens during inference. As the complexity of a task increases, o1 spends more time thinking, scaling its computations dynamically as it generates responses.

  • Test-Time Compute Scaling: A new approach where the model dedicates more computational resources while it is actively solving tasks, leading to improved reasoning at the cost of increased compute. This scaling happens in real time during the problem-solving process (a toy sketch of this idea follows the list).

  • Self-Play Reinforcement Learning: A method where the model generates its own training experience – for example, by attempting problems and checking its own solutions – much like AlphaGo mastered Go by playing games against itself. In o1, this approach helps improve decision-making in real-world tasks.

  • Hidden Reasoning Tokens: These are the internal, unseen steps o1 takes while reasoning through a problem. OpenAI has chosen not to make these visible, citing safety concerns and competitive advantages, which adds a layer of mystery to the reasoning process.

  • AIME and ARC Benchmarks: These are tests used to measure o1's problem-solving and reasoning performance, particularly in mathematics and science. OpenAI claims that o1 surpasses GPT-4 in these domains.
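
To make the inference-time and test-time scaling entries concrete, here is a minimal, hypothetical Python sketch that uses self-consistency – sample several reasoning chains, then majority-vote over the final answers – as a stand-in for whatever search o1 actually runs (OpenAI hasn’t published those details). `toy_reasoner`, `solve_with_test_time_compute`, and the difficulty-to-samples heuristic are all invented for illustration.

```python
import random
from collections import Counter

def toy_reasoner(question: str) -> str:
    """Stand-in for an LLM call that returns a final answer.

    Here it is just a noisy oracle that answers correctly 70% of the time.
    """
    return "4" if random.random() < 0.7 else random.choice(["3", "5"])

def solve_with_test_time_compute(question: str, difficulty: float) -> str:
    """Spend more inference-time compute on harder questions."""
    # The test-time compute knob: difficulty in [0, 1] buys more sampled
    # reasoning chains, i.e. more "thinking" while solving the task.
    n_samples = max(1, int(4 + 28 * difficulty))
    answers = [toy_reasoner(question) for _ in range(n_samples)]
    # Self-consistency: majority vote over the sampled final answers.
    return Counter(answers).most_common(1)[0][0]

print(solve_with_test_time_compute("What is 2 + 2?", difficulty=0.9))
```

The point is the knob: `n_samples` grows with task difficulty, so a harder question literally buys more “thinking” at inference time rather than at training time.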

To read more:

If you like Turing Post, consider becoming a paid subscriber. We just started an immensely interesting series about Agentic Workflows and their future →

Our Twitter library

Weekly recommendation from AI practitioner👍🏼:

  • v0 by Vercel – it’s like talking to your wireframes, and it’s going to launch a million similar-looking prototypes.

If any of this is helpful, please forward this email to a colleague. That allows us to keep Monday’s digest free for everyone.

News from The Usual Suspects ©

  • Microsoft Makes Waves: A UI Layer for Better AI

  • World Labs: The Next Dimension of AI

    • Fei-Fei Li's World Labs is setting its sights on spatial intelligence with Large World Models (LWMs) that can perceive and interact in 3D. By moving beyond 2D, they aim to revolutionize AI’s understanding of the world, from virtual realms to real-life applications. With $230M in funding and big-name investors, 3D creativity is about to get a major upgrade.

  • OpenAI's $150B Question: Corporate Revolution

    • OpenAI’s next $6.5 billion financing round comes with strings attached: restructuring its nonprofit to remove a profit cap for investors. The $150 billion valuation depends on this shift, promising huge returns to early backers while raising concerns over the company's mission to balance commercial ambition with AI safety. A gamble, but investors seem eager.

  • Salesforce’s AgentForce: AI Gets to Work

    • Salesforce debuts AgentForce, its AI-driven solution for businesses looking to add always-on autonomous agents to their teams.

  • Oracle's Zettascale Ambition: A New Era in Cloud AI

    • Oracle has unveiled the first zettascale cloud supercomputer, featuring up to 131,072 NVIDIA Blackwell GPUs for AI workloads. Boasting 2.4 zettaFLOPS of peak performance, it's a game-changer for industries needing AI at scale (a quick back-of-the-envelope check follows this list). Zoom and WideLabs are already leveraging Oracle’s AI sovereignty and performance to drive innovation. AI just hit hyperspeed.

  • Musk’s Sky Monopoly: Two-Thirds of All Satellites

    • Elon Musk’s SpaceX now controls over 62% of all active satellites, thanks to its ever-growing Starlink constellation, which adds about three satellites per day. With more than 6,300 satellites in low-Earth orbit, SpaceX aims to reach 42,000 for global internet coverage. That’s crazy.

  • Hugging Face: ZeroGPU
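
About Oracle’s headline number: assuming the peak is quoted at the lowest precision with sparsity, as such figures usually are, 2.4 zettaFLOPS ≈ 2.4 × 10²¹ FLOPS, and 2.4 × 10²¹ ÷ 131,072 GPUs ≈ 1.8 × 10¹⁶ FLOPS, i.e. roughly 18 petaFLOPS per GPU – which matches the FP4-with-sparsity peak NVIDIA quotes for Blackwell, not a dense FP16 figure.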

👯‍♀️ Hot Models 👯‍♂️

that didn’t receive nearly as much attention as o1

  • Google's DataGemma: Fighting AI Hallucinations with Facts

    • Google introduces DataGemma, the first open model linking language models to real-world data from the extensive Data Commons. With 240 billion data points, this innovation tackles AI hallucinations by grounding responses in factual information. Through new RIG and RAG techniques, it enhances accuracy and reliability, moving LLMs closer to trustworthy AI (a toy sketch of the RAG grounding idea follows this list).

  • Mistral’s Pixtral 12B: Seeing is Believing

    • French AI startup Mistral just dropped Pixtral 12B, a 12-billion-parameter multimodal model capable of processing both text and images. Whether captioning photos or counting objects, Pixtral is setting its sights on becoming a serious player in AI image understanding. Open for fine-tuning under Apache 2.0, it's the latest move in Mistral’s rise as Europe's answer to OpenAI.
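
Since DataGemma’s whole pitch is grounding, here is a minimal, hypothetical sketch of the RAG idea: look the fact up in a structured store before answering, and abstain when nothing is found. The real model queries Google’s Data Commons; `FACT_STORE`, `retrieve`, and `grounded_answer` are invented stand-ins with illustrative values.

```python
# Toy illustration of RAG-style grounding in the spirit of DataGemma.
# The real system queries Google's Data Commons; FACT_STORE and every
# function below are invented stand-ins with illustrative values.

FACT_STORE = {
    ("France", "population"): "about 68 million (2023)",
    ("Kenya", "population"): "about 55 million (2023)",
}

def retrieve(entity: str, attribute: str) -> str | None:
    """Fetch a grounding fact; a production system would call a live API."""
    return FACT_STORE.get((entity, attribute))

def grounded_answer(entity: str, attribute: str) -> str:
    fact = retrieve(entity, attribute)
    if fact is None:
        # With nothing retrieved, abstain instead of letting the model
        # guess; that abstention is what curbs hallucination.
        return f"No grounded data on the {attribute} of {entity}."
    # In a real pipeline the fact is injected into the LLM prompt;
    # templating it here just shows where grounding enters.
    return f"The {attribute} of {entity} is {fact}, per the data store."

print(grounded_answer("France", "population"))
print(grounded_answer("Mars", "population"))
```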

We are watching/reading:

The freshest research papers, categorized for your convenience

Our top

LLMs and Architectures

  • Theory, Analysis, and Best Practices for Sigmoid Self-Attention: Explores an alternative to the widely used softmax attention, offering stability and speed improvements across language, vision, and speech tasks. Read the paper.

  • What is the Role of Small Models in the LLM Era: Investigates the roles small models can play alongside LLMs, highlighting their cost and efficiency benefits for specific, resource-constrained tasks. Read the paper.

  • Configurable Foundation Models: Building LLMs from a Modular Perspective: Introduces a modular approach for constructing LLMs, allowing dynamic reconfiguration and task specialization for improved scalability and efficiency. Read the paper.

Multimodal Models and Vision-Language Integration

  • Open-MAGVIT2: An Open-Source Project Toward Democratizing Auto-regressive Visual Generation: Proposes an auto-regressive model for visual generation, improving image quality and managing large vocabularies for visual tasks. Read the paper.

  • MMEVOL: Empowering Multimodal Large Language Models with EVOL-Instruct: Enhances multimodal LLMs by using evolved image-text instructions, boosting performance across vision-language tasks. Read the paper.

  • Robot Utility Models: General Policies for Zero-Shot Deployment in New Environments: Enables robots to perform tasks in new environments using multimodal learning, without requiring retraining, demonstrating adaptability across setups. Read the paper.

  • LLAMA-OMNI: Seamless Speech Interaction with Large Language Models: Introduces a model for real-time speech interaction with LLMs, allowing text and speech generation without transcription, significantly reducing latency. Read the paper.

  • UI-JEPA: Towards Active Perception of User Intent through Onscreen User Activity: Uses self-supervised learning to predict user intent based on onscreen activity, reducing computational cost and latency in UI interaction tasks. Read the paper.

Optimization, Efficiency, and Model Performance Enhancements

  • OneGen: Efficient One-Pass Unified Generation and Retrieval for LLMs: Merges generation and retrieval tasks in a single model pass, improving performance in tasks like retrieval-augmented generation (RAG) and entity linking. Read the paper.

  • MEMORAG: Moving Towards Next-Gen RAG via Memory-Inspired Knowledge Discovery: Introduces a memory-enhanced retrieval system to improve accuracy and performance for tasks with complex or ambiguous queries, leveraging long-term memory mechanisms. Read the paper.

  • SARA: High-Efficient Diffusion Model Fine-Tuning with Progressive Sparse Low-Rank Adaptation: Presents a fine-tuning method for diffusion models using sparse low-rank adaptations to reduce memory costs while maintaining high performance across tasks. Read the paper.

  • Agent Workflow Memory (AWM): Introduces a memory system that stores reusable workflows for LLM agents, enabling them to complete long-horizon tasks more efficiently by reusing past task experiences. Read the paper.

  • Towards a Unified View of Preference Learning for Large Language Models: Surveys strategies for aligning LLMs with human preferences, focusing on improving personalization and interaction through feedback systems. Read the paper.

  • Source2Synth: Synthetic Data Generation and Curation Grounded in Real Data Sources: Enhances LLM performance in complex reasoning tasks by using real-world data to generate high-quality synthetic data. Read the paper.

3D Scene Reconstruction and Gaussian Splatting

  • gsplat: An Open-Source Library for Gaussian Splatting: Provides an open-source library for 3D scene reconstruction, reducing memory usage and training time in Gaussian Splatting models. Read the paper.

  • Instant Facial Gaussians Translator for Relightable and Interactable Facial Rendering: Proposes a Gaussian Splatting method for real-time, cross-platform facial animation and rendering. Read the paper.

  • FlashSplat: 2D to 3D Gaussian Splatting Segmentation Solved Optimally: Introduces an efficient method for 3D segmentation using linear programming, improving speed and robustness. Read the paper.

System Benchmarks and Agent Evaluation

  • PingPong: Evaluates LLMs' role-playing abilities through a multi-turn conversation benchmark, assessing factors like character consistency and interaction quality. Read the paper.

  • SUPER: Introduces a benchmark for evaluating LLMs on research task reproduction from repositories, highlighting challenges in task execution and error handling. Read the paper.

  • WINDOWSAGENTARENA: Develops a scalable benchmark to assess multi-modal agents' performance on Windows OS tasks, focusing on navigation, tool usage, and coding. Read the paper.


Please send this newsletter to your colleagues if it can help them enhance their understanding of AI and stay ahead of the curve. You will get a 1-month subscription!
