I wanted to expand a bit on the brief discussion we presented regarding Inference at the Edge in our piece on Interconnects as it relates to mobile devices, as well as check in on our original thesis from June 2nd about going long AAPL for AI and selling Google.
Apple is up more than 19% in the 5 weeks since we wrote about that swap, while Google is only up ~7.5%. But the market’s reaction to Apple’s WWDC presentation, specifically to the Edge AI aspect, has had me thinking ever since about what Phase 2 looks like at the edge and on mobile devices.
We touched a bit on this in our Interconnects piece as well.
Here’s what we spoke about last week:
Interestingly, large language model inference - the process of using a trained model to make predictions or decisions based on new data - can sometimes be considered general compute, as it is based on ad hoc, single-instance logic operations. Currently, inference often occurs in the same data centers as training, which can be inefficient.
This brings us to an important distinction in how data is processed and transferred in different computing scenarios, which directly impacts the types of interconnects required.
Hyperscale GPU clusters utilize a parallel computing model built on SIMD (Single Instruction, Multiple Data) operations, akin to a WW2 B-17 assembly line.
In this model, the same instruction is executed simultaneously on multiple data points.
Inference, on the other hand, typically involves processing smaller amounts of data, often in real time. In this sense, inference can be contextualized like the work of a Swiss watchmaker: the emphasis is on speed and meticulous precision rather than scale. Each inference request is unique and requires careful handling to ensure accurate and timely results.
Given these differences, it's becoming increasingly clear that the one-size-fits-all approach of using the same data centers and hardware for both training and inference may not be optimal. Instead, there's a growing trend towards specialized hardware and edge computing for inference.
Indeed, we may eventually see new specialized hardware (ASICs) for training. But ASICs for inference are already a reality (see: Groq, EdgeCortix), just not a very commercially viable one right now. They need to be better than NVDA's solutions specifically for inference while also being more economical (Groq, for example, passes the first test while failing the second miserably).
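Before moving on, here's a toy sketch of that training-versus-inference contrast in Python (NumPy, my own illustration, not anyone's production code): the same weights, used two very different ways.

```python
import time
import numpy as np

# Toy illustration (my own sketch): the same "model weights" used two ways.
rng = np.random.default_rng(0)
weights = rng.standard_normal((2048, 2048)).astype(np.float32)

# Training-style workload: one instruction applied across a huge batch of data
# at once (the SIMD / B-17 assembly line). Throughput is what matters.
batch = rng.standard_normal((4096, 2048)).astype(np.float32)
t0 = time.perf_counter()
_ = batch @ weights
print(f"batched pass: {time.perf_counter() - t0:.3f}s for 4096 samples")

# Inference-style workload: requests arrive one at a time and each needs an
# answer now (the Swiss watchmaker). Latency per request is what matters.
request = rng.standard_normal((1, 2048)).astype(np.float32)
t0 = time.perf_counter()
_ = request @ weights
print(f"single request: {(time.perf_counter() - t0) * 1000:.2f} ms")
```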
I’ve found myself talking more and more about the consumer side of AI, rather than the business side or the data center side (which I still find myself speaking about a lot).
I had a long conversation this weekend with two friends in which I evangelized the need for inference at the edge.
They’re both professionals - he’s an engineer and she’s a data scientist. They were saying that nobody will need an honest-to-god AI device. They thought the whole idea was silly.
I asked for my buddy’s iPhone to demonstrate something. I texted a friend “we’re coming over” and we headed out.
When we got into the car and their phone connected to CarPlay, it immediately gave a notification for directions to that friend’s house.
I pointed it out and said, “Your iPhone knows where you’re going. It knows what you’re doing. It knows who your mom is. Do you want that kind of convenience to be aided by AI? Do you understand how powerful it’ll be to have an AI assistant that understands your whole life - what you did last week and what you’re planning on doing over the next month?”
“Yes,” he said, hesitant, sensing that I was leading him into agreeing with something he’d been disagreeing with all night.
“And do you want that deeply, deeply personal information to ever end up on a server in Cupertino instead of right here on the local storage of your device?”
“No…of course not…”
“Okay, so Apple is going to train models in their big data centers that then specialize themselves to your life. But the data remains on your phone. Now do you agree with why we have to do inference for the next generation of AI assistants at the edge, on your device?”
I converted him.
So I’ve been brainstorming a bit about where the risk/reward is now that Apple is pricing in significantly bullish outcomes.
This afternoon, I’m more bullish on this use case than anything else. I don’t want to bring a timeframe up, so let’s say it’ll either be the iPhone 16 or 17 (although this will be a distinct reason for changing the version convention. Maybe we switch to iPhone alpha…the numbers are getting old). That phone will be doing inference at the edge and integrating with Apple’s specialist and large language models. It will be absolutely necessary. And Apple’s head start with their walled garden ecosystem will be HUGE.
This will drive the first truly massive replacement cycle for the iPhone in the last 7-8 years (at least).
Yes, I’ve been saying this since the end of May. And yes, we bought a chunk of Apple in the AI basket that’s up nearly 30%. The reason I’m thinking about this is twofold:
I think this defines a huge portion of what I have termed Phase 2 AI Beneficiaries. We are running back the picks and shovels playbook but now the mine (or the gold? I don’t know it’s way too early and too Monday for analogies) is the edge.
People keep messaging me telling me they missed the run in Apple because they were too lazy to read the article when it first came out and now they want to do that thing where they buy the other names. To be clear - the biggest beneficiary of this will be Apple. But we can still explore.
Apple has always been serious about privacy, and Inference at the Edge for AI/ML will mean that strategy sees the fruits of its labor multiply significantly.
Here’s an example of how something like Federated Learning works:
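Since I can't draw, here's a minimal sketch of the FedAvg idea in Python instead - every name and number below is mine and purely illustrative: each device fine-tunes a copy of the global model on its own private data, only the resulting weights go back to the server, and the server averages them. The raw data never leaves the phone.

```python
import numpy as np

def local_update(global_weights, local_data, local_labels, lr=0.01, epochs=5):
    """Runs on the device: fine-tune a copy of the global model on private data.
    Only the resulting weights leave the phone, never the raw data."""
    w = global_weights.copy()
    for _ in range(epochs):
        preds = local_data @ w
        grad = local_data.T @ (preds - local_labels) / len(local_labels)
        w -= lr * grad
    return w

def federated_average(device_weights, device_sizes):
    """Runs on the server: weighted average of the device updates (FedAvg)."""
    total = sum(device_sizes)
    return sum(w * (n / total) for w, n in zip(device_weights, device_sizes))

# One toy round: three "phones", each with its own private data.
rng = np.random.default_rng(1)
global_w = np.zeros(8)
updates, sizes = [], []
for _ in range(3):
    X = rng.standard_normal((50, 8))
    y = X @ rng.standard_normal(8)            # each device sees different data
    updates.append(local_update(global_w, X, y))
    sizes.append(len(y))

global_w = federated_average(updates, sizes)  # the server only ever sees weights
```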
Edge AI still needs those models, which are trained on NVDA GPUs, of course…
There are many ways this technology can advance to ensure that we can have AI agents that know everything about us without it also meaning we have FBI agents that know everything about us (I personally feel we have probably already lost that war, and it might be worth considering that we should just maximize our own gain for having completely lost our privacy, BUT that’s not the consumer’s default position…I digress…). And Apple is at the forefront, which has been pretty rapidly priced in (or at least has begun to be).
I love it when the market recognizes that a view I’m expressing is more correct than what’s priced in, but I also recognize that when the market evolves to take my previously-different view as the base case, the risk/reward becomes less asymmetric.
So I’m looking at the other areas that benefit from an AI driven replacement cycle as well as Edge AI proliferation.
We have gone into what edge AI looks like in many of our previous articles, beginning in December 2023. We also just explained why inference is very different from training (with different computing requirements). Now let’s explain what the picks and shovels of the “Intelliphone” are.
…
Never mind, I’m never using that word again. I’m going to stick with “AI edge devices” so I don’t feel like a total loser.
…
I’m looking at other areas that will benefit from the iPhone as an AI edge device. It doesn’t necessarily have to be providing the AI (inference), it just has to benefit from that replacement cycle.
Why do I think this will be such a massive replacement cycle?
When I look at the key announcements related to AI and edge computing from WWDC 2024, my instinct is that the next device will be purpose built for Edge AI.
And I think that will end up reducing some of the backwards compatibility that has made the replacement cycles less impressive recently. To review, here are the AI features Apple discussed at WWDC (summary generated by Perplexity) - which I believe will rapidly accelerate in the next models:
Apple Intelligence: Apple unveiled its new personal intelligence system called Apple Intelligence, which integrates powerful generative AI models into iPhone, iPad, and Mac devices. This system combines on-device processing with cloud-based capabilities to deliver personalized and context-aware intelligence.
On-device AI processing: A cornerstone of Apple Intelligence is on-device processing, which allows many AI models to run entirely on the device, ensuring privacy and security.
Private Cloud Compute: For more complex AI tasks requiring additional processing power, Apple introduced Private Cloud Compute. This technology extends the privacy and security of Apple devices into the cloud, allowing for more advanced AI capabilities while maintaining user privacy.
AI-powered features: Apple announced several AI-enhanced features across its operating systems, including:
- Image generation and text summarization in native applications
- Enhanced Siri with better app control and understanding of user input
- AI-powered photo searches, object removal, and transcriptions in the Photos app
- Email summarization and response generation
- Custom emoji creation (Genmoji) and AI image generation (Image Playground)
ChatGPT integration: Apple is integrating ChatGPT access into iOS 18, iPadOS 18, and macOS Sequoia, allowing users to leverage its capabilities within the Apple ecosystem.
Privacy focus: Apple emphasized its commitment to privacy in AI, with features like IP address obscuring and no request storage when accessing ChatGPT.
Apple Intelligence will be available on iPhone 15 Pro, iPhone 15 Pro Max, and iPad and Mac devices with M1 chips or newer, highlighting the importance of powerful edge computing capabilities for these AI features. I think by the next generation, the capabilities will be night and day.
The most striking difference between someone of average intelligence and someone of remarkable intelligence isn't their ability to complete tasks… it's their ability to complete tasks quickly and adjust behaviors dynamically. AI models of the future won't simply be about inference, or forcing variable inputs through the meat grinder that is a transformer or CNN model, but they will also be training on the fly, fine-tuning their local versions of more complex, yet smaller and more specialized sub-models, creating a model of models.
This cascading waterfall of ever more efficient but narrow-minded layers alleviates two of the biggest concerns in model deployment, memory and storage requirements, but brings out the bogeyman that is latency. Latency is generally defined as the time needed to complete a call and response, a rather linear and clearly measurable quantity. Perhaps, though, as our models become more myopic and our needs more specific, we need to think about latency exponentially and combat this order-of-magnitude increase in dimensionality by taking colocation to the absolute extreme.
The reality, right now, is that models of this type exist and, in fact, are preferred by researchers for complex tasks. Their complexity is limited not by our understanding of machine learning or data science, but by our finite and scarce compute resources. Remember, this isn't about simply making models bigger, so more RAM alone isn't going to do the trick; miniaturized supercomputers will become the norm, where specialization (serialization) and parallelization work in concert to turn that LLM meat grinder into a surgeon's scalpel, extracting only the information needed using the bare minimum tools required. To manage this, though, high-performance logic silicon will need to be placed physically alongside low-latency memory, lightning-fast storage, and a hint of acceleration.
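To make that less abstract, here's a hedged sketch (Python, every name hypothetical, emphatically not Apple's design) of what a "model of models" could look like on-device: a cheap router hands each request to the narrowest specialized sub-model that can handle it, and only escalates to a larger, slower model when it has to.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical illustration of the "model of models" idea: a cheap on-device
# router sends each request to the narrowest sub-model that can handle it, and
# escalates to a bigger, slower model only when nothing narrow fits.

@dataclass
class SubModel:
    name: str
    domain: str                    # e.g. "calendar", "photos", "email"
    run: Callable[[str], str]      # the small, specialized local model

def route(request: str, submodels: list[SubModel],
          fallback: Callable[[str], str]) -> str:
    # Trivial keyword match standing in for a small learned router model.
    for m in submodels:
        if m.domain in request.lower():
            return m.run(request)  # fast, local, narrow path
    return fallback(request)       # slower, general, possibly off-device path

submodels = [
    SubModel("cal-3b", "calendar", lambda r: "You're free Thursday at 6pm."),
    SubModel("photo-1b", "photos", lambda r: "Found 12 photos from that trip."),
]
general = lambda r: "[escalated to the large general model]"

print(route("when am I free on my calendar this week?", submodels, general))
print(route("summarize this contract for me", submodels, general))
```

In a real system the router would itself be a small learned model and the fallback might live in something like Private Cloud Compute rather than on the device, but the colocation logic is the same: keep the common path short, local, and fast.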
This approach aligns perfectly with the goals of edge AI and on-device inference, particularly in the context of Apple's potential next-generation iPhones. Apple's vertical integration and custom silicon design put them in a unique position to implement this "model of models" approach. Their A-series chips, with specialized neural engines and tightly integrated high-bandwidth memory, could support these complex, layered AI systems.
If Apple can successfully implement this type of AI system in its walled garden before, or better than, competitors, it could represent a significant leap in smartphone capabilities. This could potentially trigger a major replacement cycle, benefiting not just Apple but also its entire supply chain. If it doesn’t - if it gets beaten to the punch, or users turn out to care less about Apple’s privacy track record and their trust in the brand (not my base case) - we can reduce our risk by owning the beneficiaries of the edge AI shift regardless.
According to IDC forecasts, AI smartphone shipments are expected to surge over 200% year-over-year to 170 million units in 2024, accounting for approximately 15% of total smartphone shipments. This rapid growth is anticipated to accelerate beyond 2024 as industry players aggressively push towards more advanced chips and evolving use cases. In the Chinese market alone, AI phone shipments are projected to reach 150 million units by 2027, representing a 4-year CAGR of 97% from 2023 levels.
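As a quick sanity check on that China figure (my arithmetic, not IDC's), a 97% four-year CAGR ending at 150 million units in 2027 implies a 2023 base of roughly 10 million units:

```python
# Back-of-the-envelope check (my arithmetic, not IDC's) of the China projection.
target_2027 = 150e6   # projected AI phone shipments in China, 2027
cagr = 0.97           # stated 4-year CAGR from 2023 levels

implied_2023_base = target_2027 / (1 + cagr) ** 4
print(f"implied 2023 base: ~{implied_2023_base / 1e6:.0f}M units")  # ~10M
```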
The transition to AI smartphones necessitates significant hardware upgrades, including more powerful SoCs, increased DRAM, enhanced microphones, advanced cooling systems, and improved cameras. This shift is likely to result in larger bill of materials (BOM) costs for manufacturers.
For instance, while current high-end smartphones typically feature 8GB or 16GB of DRAM, next-generation AI smartphones are expected to require a minimum of 16GB, with potential for further increases. These hardware advancements, coupled with the integration of on-device AI capabilities, are expected to provide compelling reasons for consumers to upgrade their devices, potentially triggering the "first truly massive iPhone replacement cycle in a decade" that industry observers have been anticipating.
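To put some rough numbers on why 16GB becomes the floor (these are my own illustrative assumptions, not Apple's specs or any leaked BOM): even a small on-device model claims a meaningful chunk of DRAM once its weights and working memory sit resident alongside the OS and apps.

```python
# Rough illustration (my assumptions, not Apple's specs) of on-device model DRAM needs.

def model_dram_gb(params_billions: float, bits_per_weight: int,
                  overhead: float = 1.3) -> float:
    """Resident weight memory, with ~30% headroom assumed for KV cache / activations."""
    weight_bytes = params_billions * 1e9 * bits_per_weight / 8
    return weight_bytes * overhead / 1e9

for params, bits in [(3, 4), (3, 8), (7, 4)]:
    print(f"{params}B params @ {bits}-bit: ~{model_dram_gb(params, bits):.1f} GB of DRAM")

# A ~3B-parameter model at 4-bit quantization already wants ~2 GB resident,
# before the OS, open apps, and the camera pipeline take their share.
```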
That is my thinking this afternoon.
So I’ve begun going through my Monday brain - cui bono (besides Apple)?
We’ve been throwing this back and forth while Apple rallied and I’ve had many side conversations, so it’s time to consolidate them and publish my musings. Now let’s get into the tickers.
Paid subscribers get access to our Edge AI basket and a review of our current exposures that benefit from this next phase of Artificial Intelligence in equity markets.