Preface
AI will soon expand beyond just the datacenter infrastructure “picks and shovels” plays, in my opinion. The explosion in use cases has already begun and will likely produce many “wow, I get it now” moments in the next 6 months. People are building and the technology is improving, even if in ways that are not readily apparent to casual end users - they soon will be. We might even see the first genuine AI-driven improvement in a public company’s bottom line within the next couple of quarters!
In the meantime, generalists are left to keep up with a rapidly advancing semiconductor landscape that can easily leave them behind - a function of AI’s position on the “Disruption-Continuation” spectrum we mentioned in our One Year Anniversary piece.
The shame in that is that it often robs you of the ability to truly understand the whole picture while staying properly positioned - a picture that will only become more important as these narratives take hold in the real world. The data center buildout is underway, but from here the beneficiaries will skew towards those who can provide the most marginal benefit.
It only took second-order thinking to get long AI by going long data center infrastructure in 2023. But technology only marches forward, never back. While one has so far been able to get away with an understanding of AI that only goes GPU-deep, I believe that will begin changing rapidly.
In my opinion, performance in SMH will narrow and we’ll see a broadening out elsewhere (before the end of the year).
That doesn’t mean there won’t be significant outperformance in our trusty Phase 1 AI beneficiary names, though. We’ll likely see a lot more dispersion in the “AI Semiconductor” space, so it’s important to grasp the drivers that are most likely to see continued benefit in order to keep us asymmetrically positioned as the AI theme progresses.
For that reason, we’re introducing our first “Intra-Thematic Primer”. These will go beyond simply recapping what’s occurred in a specific theme since publication; they will also highlight areas that we believe will play an increasingly important role in the theme and should be understood.
This article is meant to serve as an introduction to the business of High Performance Computing infrastructure through the lens of Interconnects.
Our call is that interconnects are likely to be more insulated from the risk of AI datacenter capex moderating, since they retain upside from efforts in custom silicon (for inference now and, potentially, for training in the future). This is an area we believe is ripe for innovation and will only become more important as we progress through AI’s development.
Without further ado…
Interconnects 101: Our First Intra-Thematic Primer
Have you ever wondered what makes a GPU cluster so special?
Why are 8, 72, or 32,768 GPUs together better than 1?
How do peripheral devices (like GPUs) connect and communicate with each other to work together?
For that matter, have you ever wondered how GPUs even work or why they work better for AI?
It might surprise you that the methods employed in hyperscale data centers are essentially souped-up versions of the standards and technologies that drive our personal machines.
This cast of characters includes Ethernet, fiber optics, and PCIe (Peripheral Component Interconnect Express), which you might know from your home internet, TV, and high-end consumer electronics, respectively.
In order to elaborate on why interconnect technology plays such an important role in the progression of AI compute, we first have to address a question - because I know some of you have just been like this while riding your long NVDA position to the moon:
Before diving into the connections: why do we even use GPUs for artificial intelligence?
Answering this question is key to a holistic understanding of why interconnects matter and why the parallel computing paradigm that has enabled AI will continue to result in a bullish environment for them (regardless of whether NVDA GPUs or custom silicon are used in the future).
The short answer is:
GPUs were originally designed to process and render graphics (hence Graphics Processing Unit) by performing rapid mathematical calculations on thousands of pixels in parallel. They are now widely used for AI because they can efficiently perform the massive volume of parallel computations (measured in floating point operations per second, or FLOPS) required to train AI models.
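Before the long answer, a quick hands-on aside for anyone who wants to see the idea rather than read it. The sketch below is our own illustration in Python/NumPy (nothing vendor-specific): the same matrix multiplication computed one element at a time, then in bulk - the bulk path being the spirit of what a GPU’s thousands of cores are built for.

```python
# A rough, illustrative sketch (ours, not the article's): the same math done one
# element at a time versus handed over in bulk.
import time
import numpy as np

n = 400
a = np.random.rand(n, n)
b = np.random.rand(n, n)

# "One skilled hand": compute each output element in sequence.
start = time.time()
c_loop = np.empty((n, n))
for i in range(n):
    for j in range(n):
        c_loop[i, j] = a[i, :] @ b[:, j]
loop_seconds = time.time() - start

# "Many hands at once": let the vectorized routine schedule all the arithmetic
# together, the way a GPU dispatches thousands of identical operations in parallel.
start = time.time()
c_bulk = a @ b
bulk_seconds = time.time() - start

print(f"looped: {loop_seconds:.2f}s  |  bulk: {bulk_seconds:.4f}s")
print("same result:", np.allclose(c_loop, c_bulk))
```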
The long answer is:
To understand why GPUs are used for AI, let's go back in time to 2007 (be sure not to take a detour at 2008 if you value your portfolio).
This was when NVIDIA launched its first server-scale computing accelerator, the Tesla S870 GPU Computing Server. At the time, Intel's top data center chip was the Xeon X5460.
The key difference was in their architecture:
Intel's Xeon had 4 powerful cores (with a 3.16 GHz clock speed and 12 MB of L2 cache, if you want to be a nerd about it)
NVIDIA's Tesla had 512 smaller cores (called CUDA cores) running at a much slower clock speed of 1.35 GHz
While each NVIDIA core was slower and less sophisticated than an Intel core, the sheer number of cores made a big difference. Think of it this way: many hands (cores) working together can often accomplish more than a few very skilled hands, especially when the task involves repeating similar calculations over and over.
This design proved ideal for AI and machine learning tasks, which involve performing relatively simple operations but on an enormous scale. GPUs could process vast amounts of data in parallel, making them much more efficient for these specific tasks than traditional CPUs.
CPUs can generally be framed as doing one thing at a time very well, while GPUs do hundreds or thousands of things at the same time - perhaps not as well individually - which, taken together, supercharges the performance of AI/ML training workloads.
In more complete terms, GPUs are optimized for data parallelism, where the same operation is performed on many data elements in parallel. This is different from CPUs, which are optimized for task parallelism, where different operations are performed on the same or different data.
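If it helps to see the distinction in code, here is a deliberately small sketch - our own toy example, not any particular library’s API - of the two styles side by side:

```python
# Illustrative sketch of the two styles of parallelism described above
# (our own toy example; the array and tasks are made up).
from concurrent.futures import ThreadPoolExecutor
import numpy as np

values = np.random.rand(1_000_000)

# Data parallelism (GPU-style): the SAME operation applied to every element in lockstep.
scaled = values * 0.9  # one instruction stream, a million data points

# Task parallelism (CPU-style): DIFFERENT operations, each suited to its own core.
with ThreadPoolExecutor(max_workers=3) as pool:
    total   = pool.submit(np.sum, values)            # task 1: aggregate
    ordered = pool.submit(np.sort, values)            # task 2: sort
    flagged = pool.submit(np.greater, values, 0.99)   # task 3: compare/filter
    print(total.result(), ordered.result()[-1], flagged.result().sum())
```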
While each GPU core may run at roughly 40% of the speed of a CPU core, having 128 times as many of them really is better than the alternative, keeping in mind that AI and machine learning workloads generally consist of rather straightforward operations done at a mind-bending scale.
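To put rough numbers on that, the back-of-envelope arithmetic using the 2007 spec-sheet figures quoted above looks like this (clock speed times core count, ignoring everything real benchmarks care about):

```python
# Back-of-envelope math only; ignores IPC, memory bandwidth, and everything else
# that makes real benchmarking hard.
xeon_cores,  xeon_ghz  = 4,   3.16   # Intel Xeon X5460
tesla_cores, tesla_ghz = 512, 1.35   # Tesla S870 (4 GPUs x 128 CUDA cores)

per_core_speed = tesla_ghz / xeon_ghz                                  # ~0.43: each GPU core is "slower"
core_advantage = tesla_cores / xeon_cores                              # 128x as many cores
aggregate      = (tesla_cores * tesla_ghz) / (xeon_cores * xeon_ghz)   # ~55x raw cycles per second

print(f"per-core speed: ~{per_core_speed:.0%}, core count: {core_advantage:.0f}x, aggregate: ~{aggregate:.0f}x")
```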
This shift marked the beginning of GPUs' rise in the world of high-performance computing and AI. This article is going to make a point of demonstrating how this fundamental difference in architecture has shaped the evolution of computing hardware and made interconnects - the speeds and transfer rates they enable, and the downstream bottlenecks they help avoid - far more important than they were in previous computing paradigms.
Get it?
So CPUs can be contextualized like this:
While GPUs can be contextualized like this:
You’re going to need a lot of communication if you want Rain Man, great at a specific thing, to be able to keep up with multidimensional kung fu. But then again…
Two heads are better than one, they say.
Does that imply that three is better than two?
Is the aforementioned 32,768 better than any smaller number? Yes and no.
The saying has to be qualified. Many heads are better than one, but not if every head is doing a separate task and they all can only communicate via carrier pigeon (or even worse, can’t communicate at all).
That’s where interconnects come in.
From “too many cooks in the kitchen” to “the right hand not knowing what the left is doing” there are any number of metaphors that convey a feeling of significantly diminishing returns and/or diseconomies of scale.
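A toy model makes that “yes and no” concrete. Assume the work splits evenly across workers, but every added worker brings a fixed slice of coordination cost (the numbers below are invented purely for illustration):

```python
# Toy model (entirely our own numbers) of why "more heads" stops helping when
# they have to coordinate over slow links.
def effective_speedup(workers: int, comm_cost: float) -> float:
    compute_time = 1.0 / workers        # the job splits evenly across workers
    comm_time = comm_cost * workers     # coordination cost grows with headcount
    return 1.0 / (compute_time + comm_time)

for n in (1, 8, 72, 32_768):
    slow = effective_speedup(n, comm_cost=2e-3)   # carrier-pigeon tier links
    fast = effective_speedup(n, comm_cost=1e-8)   # fast interconnects
    print(f"{n:>6} workers: ~{slow:.1f}x over slow links, ~{fast:,.0f}x over fast ones")
```

Over carrier-pigeon links, piling on workers eventually makes the job slower than doing it alone; over fast links, adding heads keeps paying off into the tens of thousands. That, in one toy function, is the economic case for interconnects.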
For now, let’s think about that hypothetical kitchen in which there are too many cooks.
Step into a bustling restaurant kitchen.
All of the burners, frying pans, and razor-sharp knives in the world won't guarantee a flawless dinner service if the staff can't move freely between stations or access ingredients when needed. In high performance computing, much like in a high-performing kitchen, raw materials and finished goods must get to where they are needed, not a minute (or microsecond) too late or too soon, lest we risk the digital equivalent of burning a dish or serving it cold.
There’s a method to this madness, however.
Wet and dry ingredients are strategically organized, hot and cold elements carefully managed (or “buffered”), intermediate and dependent items scheduled to be plated right as the item hits the server’s window.
Most often, different corners of the kitchen need to communicate with each other, but rarely do they need to repeat instructions. A fry cook instinctively knows to have the fries hot and ready when a burger is ordered; there's no reason to command them to do so every time. In the same way, a GPU knows how to process certain data without being explicitly told each time - which is part of how data gets moved around as fast as AI/ML requires.
When multiple orders come in, they're handled in parallel – four burgers sizzle on the grill simultaneously, not one after another. And just as chefs prep ingredients hours before service, AI/ML engineers engage in data preprocessing – cleaning, annotating, and formatting data for machine consumption.
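For those curious what that prep work looks like, here is a deliberately tiny sketch of the idea - our own toy, not any particular pipeline - cleaning, encoding, and batching raw inputs before they ever touch an accelerator:

```python
# A toy "mise en place" for data (our own illustration): clean, encode, and batch
# raw inputs before they reach an accelerator.
import numpy as np

raw_orders = ["  Burger ", "fries", None, "BURGER", "shake", ""]

# Clean: drop empties and normalize formatting (wash and trim the ingredients).
cleaned = [o.strip().lower() for o in raw_orders if o and o.strip()]

# Annotate/encode: map text to numeric IDs the "kitchen" (the model) can consume.
vocab = {name: i for i, name in enumerate(sorted(set(cleaned)))}
encoded = np.array([vocab[o] for o in cleaned])

# Batch: group items so several are processed in parallel, like four burgers on one grill.
batch_size = 2
batches = [encoded[i:i + batch_size] for i in range(0, len(encoded), batch_size)]
print(vocab, batches)
```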
The devices in our “kitchen” have only gotten more and more powerful.
But how do these powerful devices communicate with each other? While there are numerous ways to move bits between physical components, only a select few can meet the demanding requirements of artificial intelligence and machine learning. These highly specialized, mission-critical conduits are called interconnects.
Just as a well-organized kitchen needs efficient pathways for chefs to move and communicate, a high-performance computing system requires robust interconnects to function at its peak. Interconnects are the nervous system of our digital kitchen, ensuring that data flows swiftly and accurately between components.
Let's break down some key concepts in interconnect technology: