
FOD#62: DeepMind’s New Techniques Are Shaping the Future – Here’s How

Plus, a new rubric on AGI, Karpathy's take on RL, news from the usual suspects, and the hands-down best-curated list of last week's papers and models

This Week in Turing Post:

  • Tuesday, Guest Post: How to Use Function Calling with Ollama, Llama3 and Milvus;

  • Wednesday, AI 101: a deep dive into Speculative RAG and more.

If you like Turing Post, consider becoming a paid subscriber. You’ll immediately get full access to all our articles, investigations, and tech series. They are informative and actually interesting to read.

The main topic

Whatever Demis Hassabis and Google DeepMind are up to: watch them. Last week, they amazed everyone with a robot that plays competitive table tennis at a solid amateur human level. How many players it outplayed is not the interesting part; what I was curious about were the novel techniques DeepMind came up with to make it happen, because that’s probably what other companies will be building upon soon.

A New Approach: hierarchical and modular policy architecture

At the heart of DeepMind’s success is a sophisticated architecture that combines hierarchical and modular policies. The system is built on Low-Level Controllers (LLCs) and a High-Level Controller (HLC). Each LLC is a specialized policy for specific table tennis skills – whether it’s a forehand topspin or a backhand rally – executing at a high frequency of 50Hz. The HLC is the strategic mastermind, selecting the appropriate LLC based on the current game situation, including the type of incoming ball, strategic goals, and real-time match data. This layered approach allows the robot to respond with human-like adaptability and precision, a feat that has far-reaching implications beyond table tennis.
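To make the layered design concrete, here is a minimal Python sketch of a high-level controller dispatching to skill-specific low-level controllers. It is an illustration only, not DeepMind's implementation: the `BallState` fields, the skill names, the string "commands", and the rule-based selection are all assumptions made for this example. The real HLC also weighs strategy and match statistics, and the real LLCs are learned policies running at 50Hz.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class BallState:
    """Hypothetical summary of the incoming ball (not taken from the paper)."""
    side: str   # "forehand" or "backhand": where the ball will arrive
    spin: str   # "topspin" or "underspin"


# A low-level controller (LLC) maps an observation to a motor command.
# Here the "command" is just a string; a learned LLC would output joint targets.
LLC = Callable[[BallState], str]


def forehand_topspin(ball: BallState) -> str:
    return f"forehand topspin stroke against {ball.spin}"


def backhand_rally(ball: BallState) -> str:
    return f"backhand rally stroke against {ball.spin}"


class HighLevelController:
    """Picks which LLC handles the current ball (toy rule-based version)."""

    def __init__(self, skills: Dict[str, LLC]) -> None:
        self.skills = skills

    def select(self, ball: BallState) -> str:
        # Assumed rule: choose the skill by the side the ball arrives on.
        return "forehand_topspin" if ball.side == "forehand" else "backhand_rally"

    def act(self, ball: BallState) -> str:
        # Delegate the actual stroke to the selected low-level controller.
        return self.skills[self.select(ball)](ball)


hlc = HighLevelController({
    "forehand_topspin": forehand_topspin,
    "backhand_rally": backhand_rally,
})
print(hlc.act(BallState(side="backhand", spin="underspin")))
```

The point of the split is that new skills can be added to the dictionary without touching the selection interface, and the selector can be swapped for a learned one without retraining the skills.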


Zero-Shot Sim-to-Real Transfer: the game changer

DeepMind's robot is not just theoretically impressive; it’s designed to perform in the real world without extensive fine-tuning, a significant advancement in robotics. This is largely due to its zero-shot sim-to-real transfer techniques, which establish the training task distribution from real-world data and iteratively refine it through cycles of simulation and real-world deployment. A particularly innovative aspect is the use of sim-to-sim adapter layers, specifically FiLM (Feature-wise Linear Modulation) layers, to address challenges such as handling different types of ball spin. The approach combines the exploratory strengths of Reinforcement Learning with the practical relevance of Imitation Learning, so policies trained in simulation transfer to the real robot without additional training. Together, these techniques help bridge the sim-to-real gap, including complexities like spin correction.
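For readers who have not met FiLM before, the operation itself is small: a conditioning input produces a per-feature scale (gamma) and shift (beta), which are applied to the features of another layer. The sketch below shows that generic operation in plain Python. The one-hot spin encoding, the weight values, and the function names are invented for illustration; how DeepMind conditions its adapter layers is not described in this newsletter.

```python
from typing import List, Sequence


def linear(weights: Sequence[Sequence[float]], bias: Sequence[float],
           x: Sequence[float]) -> List[float]:
    """Plain matrix-vector product plus bias."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def film(features: Sequence[float], gamma: Sequence[float],
         beta: Sequence[float]) -> List[float]:
    """Feature-wise Linear Modulation: scale and shift each feature."""
    return [g * f + b for f, g, b in zip(features, gamma, beta)]


# Hypothetical conditioning input: one-hot spin type [topspin, underspin].
TOPSPIN = [1.0, 0.0]
UNDERSPIN = [0.0, 1.0]

# Made-up weights that map the condition to gamma and beta.
GAMMA_W, GAMMA_B = [[1.0, 2.0], [1.0, 0.5]], [0.0, 0.0]
BETA_W, BETA_B = [[0.0, 0.1], [0.0, -0.1]], [0.0, 0.0]


def adapt(features: Sequence[float], condition: Sequence[float]) -> List[float]:
    gamma = linear(GAMMA_W, GAMMA_B, condition)
    beta = linear(BETA_W, BETA_B, condition)
    return film(features, gamma, beta)


features = [1.0, -2.0]
print(adapt(features, TOPSPIN))    # this condition leaves the features unchanged
print(adapt(features, UNDERSPIN))  # this condition rescales and shifts them
```

Because the modulation is only a scale and a shift per feature, an adapter like this adds few parameters and leaves the base network's weights untouched.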

Real-Time Adaptation: learning on the fly

What sets this robot apart is its ability to adapt in real time, a capability that mirrors human intuition and experience. Detailed metrics for each LLC, such as return rate, ball velocity, and landing position, are continuously updated and fed back into the system. The HLC uses this data to evaluate and adjust its strategies on the fly, so the robot remains competitive even against unfamiliar opponents.
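A rough way to picture this feedback loop is a table of running statistics per skill that the selector consults before each shot. The sketch below tracks only a return rate and picks the skill with the best estimate. The class and function names, the smoothing prior, and the greedy choice are assumptions for illustration, not the method from the paper.

```python
from typing import Dict, Iterable


class SkillStats:
    """Running return-rate estimate for one low-level skill."""

    def __init__(self, prior_returns: float = 1.0, prior_attempts: float = 2.0) -> None:
        # A small prior (assumed) keeps early estimates from jumping to 0 or 1.
        self.returns = prior_returns
        self.attempts = prior_attempts

    def update(self, returned: bool) -> None:
        self.attempts += 1
        if returned:
            self.returns += 1

    @property
    def return_rate(self) -> float:
        return self.returns / self.attempts


def pick_skill(stats: Dict[str, SkillStats], candidates: Iterable[str]) -> str:
    """Greedy choice: the candidate skill with the highest estimated return rate."""
    return max(candidates, key=lambda name: stats[name].return_rate)


stats = {"forehand_topspin": SkillStats(), "backhand_rally": SkillStats()}

# Simulated match feedback against one opponent (made-up outcomes).
for outcome in (False, False, True):
    stats["forehand_topspin"].update(outcome)
for outcome in (True, True, True):
    stats["backhand_rally"].update(outcome)

print(pick_skill(stats, ["forehand_topspin", "backhand_rally"]))
```

A purely greedy selector like this one never revisits a skill that started badly; a real system would likely mix in some exploration or combine several metrics.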

Making It Fun: beyond technical excellence

On a human note, DeepMind researchers also made sure that playing with the robot was fun for humans. They included elements like sampling and learning from mistakes, ensuring that the robot doesn’t simply dominate every game but creates a challenging, enjoyable experience for human players. So thoughtful.

Beyond Table Tennis and Robots

This table tennis-playing robot is a milestone in itself, but it also has broader implications. The hierarchical policy architecture and the real-time adaptation techniques are important not only in robotics but also in self-driving, and the sim-to-real methods help bridge the gap between simulation and reality. But as the authors note, there's still a long way to go in achieving consistent human-level performance across tasks and building robots (and cars) that interact skillfully and safely with humans.

Twitter Library (all about RAG!)

This Monday, our Twitter lit up with overwhelming attention to this list:

And if you are interested in RAG, we recommend downloading this report:

What’s the best model for RAG? Galileo’s latest LLM Hallucination Index ranks 22 of the leading models on their performance across 3 different RAG tasks, evaluating the correctness of their responses and propensity to hallucinate. See which model comes out on top and why larger is not always better… →read the report

  • Comeback of Reinforcement Learning (RL)?

    Speaking of RL (which was used to train the table tennis robot): Andrej Karpathy recently ranted about RLHF (Reinforcement Learning from Human Feedback), calling it "barely RL" and stating, "RL is powerful. RLHF is not." He referenced one of DeepMind's earlier breakthroughs with the game Go, where the main success came from training the machine with true RL. Karpathy argues that RLHF optimizes a reward model that is merely a proxy for human preferences, which can be misleading and is prone to adversarial examples. Unlike the RL used in training AlphaGo, which directly optimizes for winning, RLHF in language models only approximates what humans like, limiting its effectiveness and potential. What’s your take, fellow enthusiasts of RLHF, the secret sauce behind ChatGPT? Is it ultimately a limitation on the way to human-level AI?

  • Machine Psychology
    Researchers from the University of Stuttgart, Google DeepMind, Helmholtz Institute, and TU Munich are pioneering a new field dubbed "machine psychology." By applying behavioral experiments inspired by human psychology, they aim to uncover the deeper workings of Large Language Models (LLMs). This approach goes beyond mere performance metrics, offering insights into emergent abilities, reasoning patterns, and AI behavior. The paper proposes theoretical frameworks, experimental paradigms, and best practices for robust empirical studies, opening a fresh perspective on understanding AI.

  • Predicting Social Science Outcomes with LLMs
    In a compelling study from Stanford University and New York University, researchers tested GPT-4's ability to predict outcomes of 70 pre-registered, nationally representative U.S. survey experiments. The results? An impressive correlation with actual outcomes (r = 0.85), even outperforming human forecasters and maintaining high accuracy for unpublished studies (r = 0.90). This highlights the potential of LLMs in enhancing social science research, though the study also flags the risks of bias and misuse. Jack Clark wrote about that: "AI systems are creative mirrors, they are machine spirits of the human unconscious, they are value simulacras... We are not dealing with calculators here. We are not dealing with simple tools. We are dealing with vast high-dimensional artifacts that encode within themselves the culture on which they have been trained and can reflect this culture back... reality itself is becoming a shared endeavor, written into by both biological beings and their silicon creations."
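The r values quoted for the Stanford and NYU study are Pearson correlation coefficients between predicted and observed effects. As a reminder of what that number measures, here is a short Python sketch that computes it by hand. The two lists are made-up numbers for illustration, not data from the study, and `pearson_r` is simply a name chosen for this example.

```python
from math import sqrt
from typing import Sequence


def pearson_r(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation: covariance divided by the product of spreads."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(var_x * var_y)


# Made-up effect sizes: what a model predicted vs. what an experiment found.
predicted = [0.10, 0.25, 0.05, 0.40, 0.30]
observed = [0.12, 0.20, 0.02, 0.35, 0.33]

print(round(pearson_r(predicted, observed), 2))
```

A value near 1 means the predictions rank and scale the effects much like the real experiments did; it says nothing about whether the absolute effect sizes match.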

News from The Usual Suspects ©

OpenAI: Structured and (Not So) Subtle

  • OpenAI's latest API feature, Structured Outputs, ensures AI models produce outputs that fit neatly into developer-defined JSON Schemas, raising the bar for reliability in complex workflows.

  • Meanwhile, their GPT-4o system card reveals quirky behaviors in AI-driven voice modes, such as mimicking users' voices or producing unsettling sounds, a reminder of how hard it is to ship new AI capabilities safely.

  • Leadership updates see John Schulman and Peter Deng leaving, with Greg Brockman on sabbatical and Zico Kolter stepping in to fortify safety protocols.
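The Structured Outputs item above comes down to a contract: the developer supplies a JSON Schema, and every response must parse as JSON and satisfy it. The sketch below illustrates that contract on the receiving side with the standard library only. It does not call OpenAI's API and is not a full JSON Schema validator; the schema, the `conforms` helper, and the sample strings are assumptions made for this example, and it only handles flat objects with primitive types.

```python
import json

# Hypothetical developer-defined schema for a paper summary.
SCHEMA = {
    "type": "object",
    "properties": {
        "title": {"type": "string"},
        "year": {"type": "integer"},
    },
    "required": ["title", "year"],
    "additionalProperties": False,
}

# Python types accepted for each JSON Schema primitive handled here.
_TYPES = {"string": str, "integer": int, "number": (int, float), "boolean": bool}


def conforms(raw: str, schema: dict) -> bool:
    """Check a raw model output against a flat object schema (toy validator)."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return False
    if not isinstance(data, dict):
        return False
    props = schema.get("properties", {})
    if any(key not in data for key in schema.get("required", [])):
        return False
    if schema.get("additionalProperties") is False and any(k not in props for k in data):
        return False
    for key, value in data.items():
        expected = props.get(key, {}).get("type")
        if expected is None:
            continue
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) and expected != "boolean":
            return False
        if not isinstance(value, _TYPES[expected]):
            return False
    return True


print(conforms('{"title": "FOD#62", "year": 2024}', SCHEMA))
print(conforms('{"title": "FOD#62"}', SCHEMA))
```

Without a guarantee on the generation side, a check like this has to run on every response, with retries when it fails; the appeal of schema-constrained output is that this failure path should no longer be needed.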

Mistral AI: Streamlining AI Mastery

  • Mistral AI is simplifying generative AI development with customizable flagship models and an alpha release of "Agents" for complex workflows.

  • Their stable SDK, mistralai 1.0, is now available for Python and TypeScript. It enhances usability for developers, paving the way for easier creation of domain-specific AI applications. It’s a toolkit for AI innovation, wrapped in a developer-friendly package.

Groq: A $640 Million Chip on the Shoulder

  • Groq has just raised $640 million in a round led by BlackRock (wow!), pushing its valuation to $2.8 billion. They introduced super-fast, energy-efficient Language Processing Units (LPUs); we explained how LPUs work here (it’s free).

Microsoft & Palantir & hackers

  • Microsoft and Palantir are expanding their alliance to bolster AI-driven analytics for U.S. Defense and Intelligence, marrying Azure’s cloud with Palantir’s AI platforms for top-secret operations.

  • But not all is secure in Microsoft's AI realm – at Black Hat, researchers exposed how Copilot AI could be weaponized for phishing and data breaches, underscoring the need for stringent security in AI integrations.

Hugging Face Embraces XetHub. Why?

  1. Scalability: XetHub's technology scales Git to handle terabyte-sized repositories, essential for large datasets and models.

  2. Efficiency: Enables partial updates to large files, reducing the need to re-upload entire datasets.

  3. Collaboration: Enhances teamwork on large datasets, models, and code.

  4. Future-Proofing: Prepares Hugging Face for trillion-parameter models and evolving AI needs.

  5. Alignment: XetHub's mission aligns with Hugging Face's goal to optimize AI development.


The freshest research papers, categorized for your convenience

New Models

  • Qwen2-Math: Alibaba's Qwen team released Qwen2-Math, a math-specific language model series, with Qwen2-Math-72B-Instruct surpassing GPT-4o and Claude 3.5 on math benchmarks. The models excel at complex mathematical tasks, with bilingual support forthcoming.

  • CogVideoX: Zhipu AI and Tsinghua University introduced CogVideoX, a diffusion-based transformer for text-to-video generation. It uses a 3D Variational Autoencoder and expert transformer to produce coherent, long-duration videos with state-of-the-art performance.

  • EXAONE 3.0: LG AI Research launched EXAONE 3.0, a bilingual instruction-tuned LLM with 7.8B parameters, optimized for English and Korean tasks. It excels in instruction-following and domain-specific reasoning.

  • VITA: Tencent Youtu Lab and collaborators presented VITA, the first open-source multimodal LLM capable of processing video, image, text, and audio inputs simultaneously. Built on the Mixtral 8x7B model, it advances human-computer interaction with strong benchmark performance.

Our top

  • Self-Taught Evaluators: Meta FAIR researchers introduced a method for training LLM evaluators without human annotations, using synthetic data to enhance model judgment. This approach improved Llama-3-70B-Instruct's RewardBench score from 75.4 to 88.7, surpassing models trained with human data →read the paper

  • RAG Foundry: Intel Labs developed RAG FOUNDRY, an open-source framework for enhancing LLMs in Retrieval-Augmented Generation (RAG) tasks. It demonstrated improvements with Llama-3 and Phi-3 models across knowledge-intensive datasets →read the paper

  • CODEXGRAPH: Researchers from NUS, Alibaba, and Xi’an Jiaotong University introduced CODEXGRAPH, which integrates LLMs with graph databases to enhance code retrieval in large repositories, showing superior performance in coding tasks →read the paper

Optimizing and Enhancing Language Models

  • Scaling LLM Test-Time Compute Optimally can be More Effective than Scaling Model Parameters explores how optimizing compute allocation during test time can enhance the performance of LLMs, outperforming models with more pretraining →read the paper

  • Synthesizing Text-to-SQL Data from Weak and Strong LLMs develops a model that bridges the gap between open-source and closed-source LLMs by using a combination of synthetic data from strong and weak models to improve text-to-SQL tasks →read the paper

  • Better Alignment with Instruction Back-and-Forth Translation proposes a method to improve LLMs by using instruction backtranslation and response rewriting, enhancing alignment and response diversity →read the paper

  • StructEval: Deepen and Broaden Large Language Model Assessment via Structured Evaluation introduces a multi-layered framework to assess LLMs across various cognitive levels, reducing biases and improving the consistency of evaluations →read the paper

Advanced Detection and Evaluation Techniques

  • LLM-DetectAIve: a Tool for Fine-Grained Machine-Generated Text Detection presents a system that classifies text into multiple categories to detect the extent of LLM involvement, enhancing the detection of machine-generated content →read the paper

  • CoverBench: A Challenging Benchmark for Complex Claim Verification creates a benchmark to evaluate LLM accuracy in verifying complex claims, revealing significant challenges in this domain →read the paper

Simulation and Rendering Innovations

  • GPUDrive: Data-driven, multi-agent driving simulation at 1 million FPS introduces a high-performance driving simulator that supports complex agent behaviors and enables rapid training of reinforcement learning agents →read the paper

  • RayGauss: Volumetric Gaussian-Based Ray Casting for Photorealistic Novel View Synthesis proposes a new method for photorealistic rendering using Gaussian functions, achieving superior results in novel view synthesis →read the paper


Please send this newsletter to your colleagues if it can help them enhance their understanding of AI and stay ahead of the curve. You will get a 1-month subscription!
