FOD#45: Inside NVIDIA's game plan

We explore NVIDIA's GTC announcements and offer a curated list of the freshest ML news and papers

Next Week in Turing Post:

  • Wednesday, FMOps series: FMOps Infrastructure Tools

  • Friday: a very interesting interview, plus a list of useful resources about time series.

Turing Post is a reader-supported publication. To have full access to our most interesting articles and investigations, become a paid subscriber →

I remember the feeling of getting a new video card with a more powerful GPU for my PC. Everyone at school would be jealous, and you knew the best games were now at your disposal. That's what NVIDIA means to me: making the best games possible. With their strategic advance into all things AI, that's what they're doing again – making the best games possible, only now in reality. Their game plan spans accelerated computing, generative AI, industry applications, automotive, enterprise platforms, Omniverse, and robotics.

Today, at their annual GTC conference, they announced new developments in all of those areas. Covering them all would be impossible in one newsletter, but here's what stands out for me:

AI-driven self-driving cars as intelligent companions

A Tesla is already more a device than a car – a giant iPhone on wheels. NVIDIA is following the same idea: helping to create cars that are defined by software.

Several leading Chinese transportation companies are adopting NVIDIA DRIVE Thor – an in-vehicle computing platform architected for generative AI applications that delivers four times the performance of its predecessor. With up to a thousand trillion operations per second, it's equipped to handle a diverse array of AI workloads, setting the stage for safer autonomous driving. And though the industry is currently navigating various levels of assisted driving (Levels 2 and 3), the hardware and software are designed to evolve, allowing vehicles equipped with NVIDIA technology today to reach higher levels of autonomy as software and regulations mature. Thanks to built-in generative AI, you will be able to talk to your car – and the car will be able to answer back.

Chinese automakers are indeed quick to adopt NVIDIA technologies – which raises some questions from the media. That speed is driven by incentives, regulations favoring innovation, and a strategic focus on new vehicle architectures that prioritize centralized computing and AI. But NVIDIA assured us at a press briefing that engagement with Western automakers continues robustly, mentioning ongoing projects with Mercedes-Benz and Jaguar Land Rover.

NVIDIA announces Project GR00T and new Omniverse Cloud APIs (for its digital twin ecosystem)

  • Omniverse, NVIDIA's platform for creating and deploying digital twins, marks a significant milestone in industrial digitalization. By facilitating the integration of physical and virtual worlds, it allows industries to simulate, optimize, and execute operations with unprecedented efficiency. The introduction of Omniverse Cloud APIs extends these capabilities, promising a transformative impact across various sectors, including automotive, robotics, and beyond.

  • NVIDIA's Isaac robotics platform represents a leap in robotics and AI, catering to both runtime and AI training for robots. With advancements like project GR00T and Isaac Manipulator, NVIDIA is enabling a new generation of robotics development, emphasizing large, multimodal models for a future where robots are more versatile and capable than ever before.

And a few more technical announcements:

  1. NVIDIA Blackwell GPU – a cutting-edge advancement designed to power the next generation of AI with 20 petaflops of performance. This GPU represents a quantum leap in AI capabilities, aiming to democratize access to trillion-parameter models.

    • Key Features:

      • Dual-Die Architecture: Combines two of the largest dies achievable, linked by a high-bandwidth NVLink, ensuring seamless operation as a unified architecture.

      • Second-Generation Transformer Engines: Enhances AI computations to achieve unprecedented efficiency, enabling operations in just four bits of precision.

      • Performance: Delivers four times the training performance and 30 times the inference performance of its predecessor, with 25 times better energy efficiency.

  2. NVLink Switch (7.2 TB/s) – a new-generation interconnect technology that addresses the bottleneck of data exchange. It is designed to facilitate communication between GPUs at a scale suitable for the most advanced AI models.

    • Key Features:

      • High Throughput: Offers 18 times faster throughput compared to previous solutions, enabling efficient scaling for trillion-parameter models.

      • Enhanced Communication: Facilitates a new level of data exchange efficiency among GPUs, crucial for complex AI model training and inference.

  3. The new GB200 NVL72 computing platform features 72 Blackwell GPUs connected as a single system. It showcases NVIDIA's commitment to pushing the boundaries of data center capabilities.

  4. NVIDIA NIM – a new software product aimed at simplifying the deployment of generative AI within enterprise environments. It packages models with optimized inference engines and supports a wide range of GPU architectures. They call it an "AI package for all."

    • Key Features:

      • Containerized Microservice: Bundles AI models into a deployable container, facilitating easy deployment and scalability across various environments.

      • Support for Custom and Open Models: Accommodates a diverse range of AI models, including proprietary and open-source, ensuring flexibility and control over intellectual property.

      • Standardized APIs: Offers industry-standard APIs, enabling seamless integration with existing enterprise systems and processes.
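To make the "standardized APIs" point concrete, here is a minimal sketch of what querying such a containerized microservice could look like. This is an assumption on my part: many inference microservices follow the OpenAI-style chat-completions schema, and the endpoint, port, and model name below are placeholders, not details from the announcement.

```python
import json

def build_chat_request(model, prompt, max_tokens=64):
    """Build an OpenAI-style chat-completion request body."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }

# Hypothetical model name; a real deployment would expose its own.
body = build_chat_request("example-llm", "Summarize GTC 2024 in one sentence.")
print(json.dumps(body, indent=2))

# A deployed container would then receive this over HTTP, e.g. (placeholder URL):
#   requests.post("http://localhost:8000/v1/chat/completions", json=body)
```

The appeal of an industry-standard schema is exactly this: swapping the backing model or container does not change the client code.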
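To give intuition for why Blackwell's four-bit precision matters, here is a purely illustrative sketch of uniform 4-bit integer quantization. This is not NVIDIA's actual FP4 format (which is a floating-point encoding); it just shows the trade-off behind any 4-bit scheme: 16 representable levels mean each weight fits in half a byte, cutting memory and bandwidth roughly 4x versus FP16 at the cost of rounding error.

```python
def quantize_4bit(weights):
    """Map floats to integer codes in [-8, 7] using a shared per-tensor scale."""
    scale = max(abs(w) for w in weights) / 7.0
    codes = [max(-8, min(7, round(w / scale))) for w in weights]
    return codes, scale

def dequantize(codes, scale):
    """Recover approximate float values from the 4-bit codes."""
    return [c * scale for c in codes]

weights = [0.9, -0.31, 0.05, -0.7]
codes, scale = quantize_4bit(weights)
approx = dequantize(codes, scale)
max_err = max(abs(w - a) for w, a in zip(weights, approx))
print(codes, round(max_err, 4))
```

Hardware support means these low-precision multiplies run natively instead of being emulated, which is where the efficiency gains come from.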

From SemiAnalysis: “Nvidia is on top of the world. They have supreme pricing power right now, despite hyperscaler silicon ramping. Everyone simply has to take what Nvidia is feeding them with a silver spoon.”

If there's ever a course about the greatest tech CEOs, Jensen Huang will definitely be on the list. His idea of continuously leveraging GPU technology to accelerate computing across a myriad of domains has fundamentally changed how we interact with technology. It's enabled breakthroughs that were once considered science fiction, and that's how NVIDIA hit a $2 trillion valuation. Under Huang's leadership, we're all following NVIDIA's game plan.

News from The Usual Suspects ©

Cerebras’s chips

  • Cerebras introduces the CS-3, a third-generation wafer-scale AI accelerator boasting over 4 trillion transistors and double the speed of its predecessor, enabling the training of advanced AI models. It's designed for scalability, allowing up to 2048 systems to connect, offering unprecedented computing power. The CS-3 supports external memory configurations up to 1,200 terabytes, facilitating the development of models significantly larger than existing ones. With its advanced technology, the CS-3 represents a leap forward in AI and machine learning capabilities.

Figure AI

  • Figure is advancing humanoid robotics by adding sophisticated language and reasoning skills to the traditional focus on physical tasks. Through a partnership with OpenAI, Figure's robot, showcased in a video, impressively identifies and makes reasoned choices, such as selecting an apple when asked for food. This development suggests promising steps toward more interactive and helpful robots in everyday scenarios.

Midjourney

  • Now you can generate consistent characters across multiple AI-generated images, addressing a common challenge in AI image generation. This new tool, invoked with the "--cref" tag, allows users to maintain character features, body type, and clothing across different scenes by referencing the URL of a previously generated image. It includes a "character weight" option ("--cw"), adjustable from 1 to 100, to control the degree of likeness to the original character in new images.
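A prompt using the two tags described above might look like this (the image URL is a placeholder for a previously generated character image):

```
/imagine prompt: a knight exploring a neon-lit city --cref https://example.com/my-character.png --cw 80
```

Lower `--cw` values keep mainly the face and allow outfit changes; higher values preserve more of the original look.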

Last week brought a few exciting research papers. We've categorized them for your convenience:

Our Top

SIMA: A Generalist AI Agent for 3D Virtual Environments

Researchers from DeepMind introduced the Scalable Instructable Multiworld Agent (SIMA), a breakthrough in AI that follows natural-language instructions within various video game environments. Unlike previous AI systems focused on mastering single games, SIMA is trained across multiple games, showcasing its ability to adapt and perform tasks in diverse 3D virtual settings without game-specific code access. This research could lead to AI agents capable of assisting in real-world applications, demonstrating the potential of video games as platforms for developing more general and helpful AI technologies →read the paper

MM1: Methods, Analysis & Insights from Multimodal LLM Pre-training

Researchers from Apple have developed MM1, a family of Multimodal Large Language Models (MLLMs) with up to 30B parameters. Through extensive experimentation, they uncovered vital design principles, notably the impact of image encoder choices, image resolution, token count, and pre-training data mix on model performance. Using a combination of image-caption, interleaved image-text, and text-only data proved crucial for achieving state-of-the-art few-shot results. Surprisingly, the design of the vision-language connector had minimal importance. The MM1 models demonstrate enhanced in-context learning and multi-image reasoning, showing promise in few-shot chain-of-thought prompting. These insights guide the development of performant MLLMs, emphasizing the significance of carefully chosen pre-training strategies and model scaling techniques →read the paper

Chronos: Learning the Language of Time Series

Researchers from Amazon Web Services, UC San Diego, University of Freiburg, and Amazon Supply Chain Optimization Technologies introduced Chronos, a novel framework for time series forecasting that leverages pre-existing transformer-based language model architectures. Chronos tokenizes time series data through scaling and quantization, enabling the application of language models to forecasting without architectural changes. Pretrained on diverse datasets including a synthetic one generated via Gaussian processes, Chronos demonstrates superior forecasting abilities on both familiar datasets and unseen ones in a zero-shot manner. This approach simplifies forecasting pipelines by utilizing the "language of time series," showcasing the adaptability of language models to time series forecasting →read the paper
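The scaling-and-quantization step described above is easy to sketch. The following is a simplified illustration, not the paper's actual procedure – the bin range, clipping, and vocabulary size are assumptions chosen for brevity: values are divided by the series' mean absolute value, then bucketed into uniform bins whose indices serve as tokens for a language model.

```python
def tokenize_series(series, vocab_size=16, lo=-3.0, hi=3.0):
    """Mean-scale a series, then quantize values into uniform bins;
    each bin index acts as a 'token' a language model can consume."""
    mean_abs = sum(abs(x) for x in series) / len(series)
    width = (hi - lo) / vocab_size
    tokens = []
    for x in series:
        s = min(max(x / mean_abs, lo), hi - 1e-9)  # clip to the bin range
        tokens.append(int((s - lo) // width))      # bin index = token id
    return tokens, mean_abs

series = [10.0, 12.0, 9.0, 14.0, 11.0]
tokens, scale = tokenize_series(series)
print(tokens)
```

Once a series is a sequence of discrete tokens, an off-the-shelf transformer can be trained to predict the next token, and forecasts are recovered by mapping predicted bins back through the stored scale.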

Advancements in Language Model Capabilities and Applications

  • Quiet-STaR: Enhances LLM reasoning capabilities through self-generated internal rationales, improving accuracy on complex tasks →read the paper

  • Stealing Part of a Production Language Model: Explores vulnerability in LLMs, demonstrating a method to extract model parameters through API interactions →read the paper

  • Simple and Scalable Strategies to Continually Pre-train Large Language Models: Proposes efficient updates to LLMs, minimizing computational resources while maintaining performance →read the paper

  • SOTOPIA-π: Develops socially intelligent language agents through interactive learning, enhancing agents' social skills and safety →read the paper

  • Language Models Scale Reliably with Over-Training and on Downstream Tasks: Establishes scaling laws predicting LLM performance in over-trained regimes and downstream tasks →read the paper

Enhancing Multimodal Understanding and Interaction

  • MoAI: Integrates visual information from specialized CV models into LLMs, enhancing real-world scene understanding →read the paper

  • Gemma: Releases open models based on Gemini technology, focusing on language understanding, reasoning, and safety →read the paper

  • GiT: Unifies diverse vision tasks through a universal language interface, demonstrating strong zero-shot capabilities across tasks →read the paper

  • VisionGPT-3D: Merges LLMs with computer vision models for advanced 3D vision understanding and multimodal interactions →read the paper

Innovations in Model Training and Efficiency

  • Adding NVMe SSDs to Enable and Accelerate 100B Model Fine-tuning on a Single GPU: Introduces Fuyou, a framework enabling fine-tuning large models on single GPUs by optimizing data swapping with NVMe SSDs →read the paper

  • Algorithmic Progress In Language Models: Analyzes algorithmic advancements in LLM pre-training, highlighting the role of compute scaling in performance improvements →read the paper

  • FAX: Develops scalable and differentiable federated primitives in JAX for large-scale distributed and federated learning →read the paper

Novel Frameworks and Methods for Generative Models

  • Multistep Consistency Models: Merges consistency models and TRACT for balancing sampling speed and quality in generative models →read the paper

  • Branch-Train-MiX: Combines Branch-Train-Merge and Mixture-of-Experts models for efficient LLM training across specialized domains →read the paper

Exploring Model Internals and Position Embeddings

  • Resonance RoPE: Improves train-short-test-long performance in LLMs by refining Rotary Position Embedding interpolation →read the paper

  • AtP: Enhances the localization of behaviors in LLMs to specific components, improving model understanding and diagnostic capabilities →read the paper

Become a Premium subscriber and expense this subscription through your company. Join hundreds of forward-thinking professionals. Please also send this newsletter to your colleagues if it can help them enhance their understanding of AI and stay ahead of the curve. 🤍 Thank you for reading

How was today's FOD?

Please give us some constructive feedback
