Token 1.8: Silicon Valley of AI Chips and Semiconductors
Understanding the Chips that Drive Today's AI Breakthroughs
Introduction
No conversation about Foundation Models (FMs) and Large Language Models (LLMs) is possible without touching on compute. Developing and deploying FMs are compute-intensive endeavors, often requiring thousands of petaflop/s-days of processing. Compute isn't just a resource for these models; it's a catalyst for their capabilities and evolution. It is, after all, what made the current wave of AI advances possible.
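To put "compute-intensive" in perspective: a common back-of-the-envelope rule estimates training compute as roughly 6 × N × D FLOPs for a dense transformer with N parameters trained on D tokens. Here is a minimal sketch in Python; the model size and token count are illustrative assumptions, not figures for any particular model:

```python
# Back-of-the-envelope training compute via the widely cited 6*N*D rule.

def training_flops(n_params: float, n_tokens: float) -> float:
    """Approximate total FLOPs to train a dense transformer."""
    return 6 * n_params * n_tokens

# Illustrative numbers: a 70B-parameter model trained on 1.4T tokens.
flops = training_flops(70e9, 1.4e12)
print(f"{flops:.2e} total FLOPs")                       # ~5.88e+23
print(f"~{flops / 1e15 / 86400:,.0f} petaflop/s-days")  # ~6,800
```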
In today's Token, we'll peel back the layers on the AI chips that power these models and demystify the complex semiconductor landscape. Merging the technical with the practical, we will trace the trajectory from general-purpose semiconductors to today's specialized chips. This is a narrative of evolution, driven by the needs of sophisticated AI, and a guide to the silicon innovations that meet those needs head-on.
In today’s Token:
Compute fundamentals and the evolution of AI chips
Market dynamics (the latest AI chips from the main vendors and their smaller competitors, as well as cloud services)
How to make compute choices
What’s on the horizon (including Forecasting Market Evolutions)
Conclusion
Compute Fundamentals & The Evolution of AI Chips
In the computing world, semiconductors are the linchpin, with each chip acting as a miniature hub of electrical circuits. Traditional semiconductors have powered everything from wristwatches to spacecraft, but as AI's complexity escalates, so does the need for a different kind of semiconductor: the AI chip.
The march of AI chips began with the ubiquitous Central Processing Unit (CPU), a jack-of-all-trades processor that powered early AI tasks with admirable resilience. However, the CPU's sequential processing soon became a bottleneck for the parallelism AI algorithms craved.
The AI chip you hear about most often is the GPU, though calling it an AI chip is not exactly accurate. The GPU (Graphics Processing Unit) was initially created not for AI but to render graphics in video games. Nvidia, the leading provider of GPUs, made a strategic advancement by introducing tensor cores: units finely tuned to accelerate AI workloads, allowing models to learn from vast datasets far more efficiently than standard CPUs could.
So the GPU became an AI chip through a recalibration of its capabilities: its architecture was already suited to the parallel processing that ML algorithms require. This marked the first significant leap toward specialized AI compute.
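To make the parallelism point concrete, here is a minimal sketch, assuming PyTorch and a CUDA-capable GPU (timings will vary by hardware), that runs the same large matrix multiplication on a CPU and then on a GPU:

```python
# Minimal sketch: the same large matmul on CPU vs. GPU.
# Assumes PyTorch and a CUDA-capable GPU are available.
import time
import torch

a = torch.randn(4096, 4096)
b = torch.randn(4096, 4096)

t0 = time.perf_counter()
_ = a @ b                       # runs on a handful of CPU cores
cpu_s = time.perf_counter() - t0

if torch.cuda.is_available():
    a_gpu, b_gpu = a.cuda(), b.cuda()
    torch.cuda.synchronize()    # make sure the transfers are done
    t0 = time.perf_counter()
    _ = a_gpu @ b_gpu           # spread across thousands of GPU cores
    torch.cuda.synchronize()    # wait for the kernel to finish
    print(f"CPU: {cpu_s:.3f}s | GPU: {time.perf_counter() - t0:.4f}s")
```

On recent Nvidia GPUs, running the same multiplication in half precision (e.g., `a_gpu.half() @ b_gpu.half()`) would additionally route it through the tensor cores mentioned above.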
Yet the insatiable compute appetite of emerging AI models necessitated more than a retrofit; it required a ground-up rethink. Thus were born the TPUs (Tensor Processing Units), Google's answer to the need for lightning-fast matrix computations, and the IPUs (Intelligence Processing Units) by Graphcore, which offer an architecture designed to mimic the massively parallel processing of the human brain. There are also chips specifically engineered for neural network inference, such as those found in Tesla's Full Self-Driving (FSD) suite, but their utility is more specialized and narrow in scope. Promising new work is happening on neuromorphic chips and Quantum Processing Units (QPUs) as well.
Market Dynamics
The GPU market is currently experiencing a significant shortage, which is profoundly impacting the AI industry. NVIDIA's GPUs, essential for ML at every stage and especially for pretraining models, are hard to come by, leading major tech firms to sign multiyear leases and pricing out smaller innovators. This scarcity is reshaping the market in several ways: intensified competition among the main vendors (Nvidia, AMD, and Intel), a wave of new AI chip startups and research, a thriving industry of cloud service providers, in-house chip development by firms like Apple, and the repurposing of other GPU sources.
Let's take a closer look at some of them:
- The latest AI chips from main vendors and their smaller competitors
Nvidia – the undisputed leader of the GPU industry, with a market capitalization close to one trillion dollars – has announced an accelerated release schedule for its AI chips, shifting to annual updates: the H200 in 2024 and the B100 later the same year, following the current H100. The H200 will continue using the Hopper architecture, while the B100 is rumored to employ a new Blackwell architecture. Nvidia will also update the Grace Hopper Superchip and the L40S universal accelerator and introduce a new NVL chip line for AI workloads on Arm-based systems. The company also plans faster networking products, with 400 Gb/s and 800 Gb/s InfiniBand and Ethernet releases set for 2024 and 2025.
Intel has announced its new Core Ultra processors, codenamed Meteor Lake, launching on December 14. These processors feature Intel's first integrated neural processing unit (NPU) for efficient AI acceleration, making AI more accessible on PCs. Core Ultra is Intel's inaugural chiplet design, enabled by Foveros packaging technology; it combines an NPU, power-efficient performance thanks to the Intel 4 process technology, and discrete-level graphics capabilities with onboard Intel Arc graphics. The disaggregated architecture balances AI-driven tasks across the chip: the GPU provides performance for AI in media and 3D applications, the NPU handles low-power AI and AI offload, and the CPU is optimized for responsive, low-latency AI tasks.
AMD has announced the upcoming MI300, an AI accelerator chip intended for training models on data-intensive workloads. Like gaming GPUs, it leverages parallel computing to handle multiple workstreams simultaneously, enhancing AI efficiency. Technical specifics are sparse, but its design targets the market dominance of Nvidia's H100. The MI300 is central to AMD's strategy in the burgeoning AI accelerator market, projected to reach $150 billion by 2027.
The market is also witnessing the rise of trailblazers like Cerebras, whose wafer-scale engine challenges conventional chip designs and targets supercomputer-class workloads. Innovators with neuron-inspired architectures offer a fascinating sidebar in this narrative. Projects like Numenta's NuPIC (the Numenta Platform for Intelligent Computing) and IBM Research's NorthPole are breaking the mold, drawing inspiration from the neural circuitry of the human brain to develop chips that could one day process information in fundamentally new ways.
NorthPole is a novel neural inference architecture that integrates computing and memory in a single on-chip system, mirroring the organic brain's efficiency but tailored for silicon. By merging compute with on-chip memory and functioning as active memory, NorthPole circumvents the traditional separation of memory and processor.
Numenta's NuPIC uses brain-based algorithms, data structures, and architectures to deploy LLMs efficiently on more accessible CPUs, offering a blend of performance, cost savings, and data privacy. NuPIC ensures on-premise data control for enhanced security and compliance, supports a range of LLMs for customization, and makes it easy to move from rapid prototyping to full-scale deployment.
- Cloud Services
Other important players in the field of AI computing are cloud service providers, which offer computing resources on demand and act as a force multiplier for AI development. The list below is not exhaustive but reflects the main players on the market (Turing Post has no affiliation with any of these companies):
- Provides large-scale GPU clusters for training LLMs and generative AI.
- Aims for the world's lowest-cost GPU instances.
- Specializes in parallelizable workloads at scale.
- Supports open-source AI/ML projects.
- Geared for AI and ML workloads with NVIDIA's H100 GPUs.
- High-performance dedicated servers across global locations.
- Famous for a user-friendly interface, offering instant provisioning, comprehensive API access, and management tools for efficient infrastructure oversight.
- Google Cloud: offers a range of NVIDIA GPUs for various compute needs, with pricing that depends on the GPU type, machine type, and region; supports scenarios ranging from video processing to deep learning (see the cost sketch after this list).
- Features NVIDIA and AMD GPUs, with various instances available.
- Microsoft Azure: VMs with GPU capabilities for compute- and graphics-intensive workloads, including the NC, ND, and NV series for different requirements.
- IBM Cloud: provides flexibility with a selection of bare-metal and virtual-server GPUs, and offers AI-focused servers that integrate NVIDIA's advanced GPUs with IBM's Power Systems.
- AWS: in collaboration with NVIDIA, offers scalable GPU-based solutions for a range of applications, including EC2 instances powered by NVIDIA GPUs.
- Hugging Face: isn't a cloud service provider and doesn't offer AI chips, but it partners with cloud providers and leverages their AI capabilities to run its extensive range of pre-trained models and APIs; it also offers the Hub for AI model sharing and deployment.
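Since prices differ by provider, GPU type, region, and commitment, a simple formula helps compare options. Here is a minimal sketch; the hourly rate is a hypothetical placeholder, not any provider's actual quote:

```python
# Hedged cost sketch: cloud GPU prices vary by provider, region, and
# commitment; the $2.50/GPU-hour rate below is a hypothetical placeholder.

def training_cost(num_gpus: int, hours: float, usd_per_gpu_hour: float) -> float:
    """Estimate on-demand cost as GPUs x hours x hourly rate."""
    return num_gpus * hours * usd_per_gpu_hour

# Example: 64 GPUs for two weeks at the assumed rate.
print(f"${training_cost(64, 14 * 24, 2.50):,.0f}")  # -> $53,760
```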
Demand is so high that some investors organize their own 'local' AI cloud services, such as the Andromeda Cluster by Nat Friedman, ex-CEO of GitHub, and investor Daniel Gross. Their setup, featuring 2,512 H100 GPUs, can train an AI model with 65 billion parameters in roughly 10 days, but only for the startups they invest in. Still, the initiative is highly praised, as it helps democratize access to much-needed compute.
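As a rough plausibility check on that figure, we can reuse the 6 × N × D approximation from the introduction. The token count, per-GPU throughput, and utilization below are our assumptions, not numbers reported for the Andromeda Cluster:

```python
# Plausibility check of "65B parameters in ~10 days" on 2,512 H100s.
# Assumptions (not reported figures): ~1.4T training tokens, ~1e15 peak
# BF16 FLOP/s per H100, and ~30% realized utilization (MFU).
n_params, n_tokens = 65e9, 1.4e12
total_flops = 6 * n_params * n_tokens        # 6*N*D approximation
cluster_flops_per_s = 2512 * 1e15 * 0.30     # GPUs x peak x utilization
days = total_flops / cluster_flops_per_s / 86400
print(f"~{days:.1f} days")                   # ~8.4 days, the same ballpark
```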
Evaluating Compute Choices
In the realm of computational architecture for AI applications, discernment is key. Each option carries a suite of strengths balanced by inherent limitations. To navigate this complex decision matrix, we must scrutinize the primary contenders in the field.
The following explanation is hidden from free subscribers and is available to Premium users only → please Upgrade for full access to this and other articles
On the Horizon – that’s an interesting one!
The following explanation is hidden from free subscribers and is available to Premium users only → please Upgrade for full access to this and other articles
5 Must-Read Books About Chips
“Chip War: The Fight for the World's Most Critical Technology” by Chris Miller
This is a compelling narrative detailing the intense global competition for supremacy in microchip technology. The book underscores the microchip as the cornerstone of modern military, economic, and geopolitical dominance. It chronicles America's pioneering role in chip technology and its recent challenges, as countries like China invest heavily to close the technological gap. With military might and economic power at stake, the struggle over semiconductor mastery is painted as the new Cold War, pivotal to shaping the future world order. → Read here
“The Hardware Lottery” by Sara Hooker
This essay proposes the concept of the "hardware lottery," a term to describe instances where a research idea gains prominence due to its compatibility with existing hardware and software, rather than its inherent superiority. It explores historical instances in computer science where potentially successful ideas were overshadowed due to this phenomenon. The essay argues that with the rise of specialized computing hardware, the disparity in research progress will intensify, propelling certain ideas forward rapidly while hindering others. → Read here
“The Sentient Machine: The Coming Age of Artificial Intelligence” by Amir Husain
This book discusses how artificial intelligence (AI), underpinned by advancing hardware from smartphones to cars, is reshaping human existence and our societal landscape. Husain presents a balanced view, acknowledging the fears of a dystopian future while making a case for AI as humanity's next great leap in creativity and problem-solving. His narrative simplifies AI and hardware concepts into understandable terms, encouraging readers to consider how intelligent machines might contribute to human progress. → Read here
“Fabless: The Transformation of the Semiconductor Industry” by Daniel Nenni and Paul McLellan
The book explores the history and impact of the fabless semiconductor model, where companies design and sell hardware but outsource manufacturing. It looks at how this approach has driven technological advancements and business growth since the 1980s, leading to the wide array of electronic devices we see today. → Read here
“Crystal Fire: The Invention of the Transistor and the Birth of the Information Age” by Michael Riordan and Lillian Hoddeson
The book covers the pivotal role of the transistor and the microchip in shaping modern life, while also delving into the human aspects of invention—such as competition and ambition. It portrays William Shockley, a key figure in the development of the transistor and a Nobel laureate, who also laid the groundwork for Silicon Valley. The narrative aims to make the technical details of the transistor's creation and its monumental influence on society and economy understandable to all readers. → Read here
Please give us feedback.
Thank you for reading! Please feel free to share this with your friends and colleagues. In the next couple of weeks, we will be announcing our referral program 🤍