Updated: 1 May 2024
Authors: Will Tasney, Thomas Beaton, Jamie Stewart
DeepSeek Disruption - Dispersion of AI Computing Demand
Introduction
DeepSeek's innovative approach to AI training has disrupted the market; however, its stated efficiency and cost claims have likely been overstated, for two reasons: first, to avoid scrutiny over breaches of US semiconductor sanctions, and second, to cause maximum damage when launched as a geopolitical grenade challenging US AI leadership. We examine the technology, its implications, and the likely winners and losers, and conclude that it aligns well with the long-term outlook for the AI sector set out in our pitch deck.
Summary
- The efficiency and cost gains claimed by DeepSeek have likely been overstated
- DeepSeek's efficiency gain came from programming in PTX rather than NVIDIA's higher-level CUDA language
- This approach is not practical for most AI companies to replicate
- Some reports suggest DeepSeek has access to far more GPU resources than disclosed: up to 60,000 GPUs, versus the 2,048 stated by DeepSeek
- The model may have taken up to two years to develop, sacrificing speed of development for lower cost
- DeepSeek adds to a growing body of open-source AI IP
- Scale42 believes that more competition and open-source AI software will accelerate demand for the mid-size AI training facilities in which we specialise
What is DeepSeek?
DeepSeek is an open-source Mixture-of-Experts (MoE) language model with 671 billion parameters, of which only a fraction (around 37 billion) is activated for any given token. Training is stated to have used a specialised cluster of 2,048 NVIDIA H800 GPUs (a cut-down H100 variant developed by NVIDIA to comply with US export sanctions).
It is claimed that in just two months, with this small GPU stack, DeepSeek trained an AI that is comparable to or better than similar models developed by industry leaders such as OpenAI and Meta.
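The MoE architecture is central to these efficiency claims: rather than running all 671 billion parameters for every token, a small gating network routes each token to a handful of "expert" sub-networks, and only those experts' parameters are exercised. The sketch below illustrates the routing idea only; it is not DeepSeek's code, and the expert count, top-k value and gate scores are placeholder assumptions.

```cuda
// Illustrative top-k expert routing for a Mixture-of-Experts layer.
// NOT DeepSeek's code: NUM_EXPERTS, TOP_K and the scores are placeholders.
#include <cstdio>

constexpr int NUM_EXPERTS = 8;  // real MoE models use many more experts
constexpr int TOP_K       = 2;  // experts actually run per token

// One thread per token: pick the TOP_K highest-scoring experts for it.
// Only those experts' parameters are then used, which is how an MoE model
// can hold 671B parameters yet activate only a fraction per token.
__global__ void route_tokens(const float* scores,  // [tokens x NUM_EXPERTS]
                             int* chosen,          // [tokens x TOP_K]
                             int num_tokens) {
    int t = blockIdx.x * blockDim.x + threadIdx.x;
    if (t >= num_tokens) return;
    const float* s = scores + t * NUM_EXPERTS;
    bool taken[NUM_EXPERTS] = {false};
    for (int k = 0; k < TOP_K; ++k) {  // simple repeated arg-max
        int best = -1;
        for (int e = 0; e < NUM_EXPERTS; ++e)
            if (!taken[e] && (best < 0 || s[e] > s[best])) best = e;
        taken[best] = true;
        chosen[t * TOP_K + k] = best;
    }
}

int main() {
    const int tokens = 2;
    float h_scores[tokens * NUM_EXPERTS] = {
        0.1f, 0.9f, 0.2f, 0.3f, 0.8f, 0.1f, 0.0f, 0.4f,   // token 0
        0.5f, 0.1f, 0.7f, 0.2f, 0.1f, 0.6f, 0.3f, 0.2f};  // token 1
    float* d_scores; int* d_chosen; int h_chosen[tokens * TOP_K];
    cudaMalloc(&d_scores, sizeof(h_scores));
    cudaMalloc(&d_chosen, sizeof(h_chosen));
    cudaMemcpy(d_scores, h_scores, sizeof(h_scores), cudaMemcpyHostToDevice);
    route_tokens<<<1, tokens>>>(d_scores, d_chosen, tokens);
    cudaMemcpy(h_chosen, d_chosen, sizeof(h_chosen), cudaMemcpyDeviceToHost);
    for (int t = 0; t < tokens; ++t)
        printf("token %d -> experts %d, %d\n",
               t, h_chosen[t * TOP_K], h_chosen[t * TOP_K + 1]);
    cudaFree(d_scores); cudaFree(d_chosen);
    return 0;
}
```

Production routers also add load-balancing terms so that tokens spread evenly across experts and across GPUs; that machinery is omitted here for brevity.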
It is estimated that the computing intensity normally needed to reach a product of this level is roughly 10x what DeepSeek states it used, which would make DeepSeek around 10x more efficient than its competitors.
DeepSeek has innovated by programming directly in PTX (Parallel Thread Execution), NVIDIA's low-level instruction set, instead of the standard CUDA language used by the vast majority of the AI industry.
This is significant because NVIDIA and competing hardware providers (such as AMD) have produced GPUs with similar hardware specifications. Despite this, NVIDIA has accumulated a dominant share of AI GPU sales, principally because its proprietary CUDA programming language has become the industry standard for AI developers.
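To illustrate the difference (this is not DeepSeek's code): in standard CUDA C++ the compiler decides which machine instructions a line of code becomes, whereas PTX lets a developer write NVIDIA's low-level virtual instructions directly. The minimal sketch below contrasts the two for a trivial addition; DeepSeek's optimisations work at this PTX level but reportedly target far more complex behaviour, such as inter-GPU communication on the H800's restricted interconnect.

```cuda
// Contrast: standard CUDA C++ vs. hand-written PTX for the same operation.
// Illustrative only -- DeepSeek's actual PTX-level work is far more complex.
#include <cstdio>

// Standard CUDA: the nvcc compiler chooses the machine instructions.
__global__ void add_cuda(const int* a, const int* b, int* c, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) c[i] = a[i] + b[i];
}

// Same addition, but written directly in PTX via inline assembly. The
// developer, not the compiler, now controls exactly what is emitted.
__global__ void add_ptx(const int* a, const int* b, int* c, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        int r;
        asm("add.s32 %0, %1, %2;" : "=r"(r) : "r"(a[i]), "r"(b[i]));
        c[i] = r;
    }
}

int main() {
    const int n = 4;
    int ha[n] = {1, 2, 3, 4}, hb[n] = {10, 20, 30, 40}, hc[n];
    int *da, *db, *dc;
    cudaMalloc(&da, n * sizeof(int));
    cudaMalloc(&db, n * sizeof(int));
    cudaMalloc(&dc, n * sizeof(int));
    cudaMemcpy(da, ha, n * sizeof(int), cudaMemcpyHostToDevice);
    cudaMemcpy(db, hb, n * sizeof(int), cudaMemcpyHostToDevice);

    add_cuda<<<1, n>>>(da, db, dc, n);
    cudaMemcpy(hc, dc, n * sizeof(int), cudaMemcpyDeviceToHost);
    printf("cuda: %d %d %d %d\n", hc[0], hc[1], hc[2], hc[3]);

    add_ptx<<<1, n>>>(da, db, dc, n);
    cudaMemcpy(hc, dc, n * sizeof(int), cudaMemcpyDeviceToHost);
    printf("ptx:  %d %d %d %d\n", hc[0], hc[1], hc[2], hc[3]);

    cudaFree(da); cudaFree(db); cudaFree(dc);
    return 0;
}
```

Even this toy example makes the trade-off clear: PTX is laborious to write, hard to maintain and tied to NVIDIA's instruction set, which is why most of the industry stays at the CUDA level and why the approach is impractical for most AI companies to replicate.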
DeepSeek is a spinout from the Chinese quantitative hedge fund High-Flyer, which is reported to have purchased an estimated 10,000-60,000 NVIDIA GPUs. This breakthrough in programming optimisation has helped DeepSeek outperform conventional training methods.
DeepSeek's Atomic Impact
The low cost, short timescale and minimal resources stated as required to train DeepSeek have underpinned the market's reaction. By claiming to have spent just $6m to build something competitors have sunk billions into, and then releasing the IP for free, DeepSeek undermined the sunk investments and balance sheets of IP holders like OpenAI.
However, the true cost of DeepSeek is not known, and the $6m figure could be an exercise in creative accounting. Even taking the reported hardware figures at face value, 10,000 NVIDIA A100/H800s cost an estimated $300m (implying roughly $30,000 per unit); 50,000 would cost $1.5bn.
The timing of DeepSeek's release, just days after the US announced its $500bn Stargate programme, has led some to believe it was co-opted by the CCP to undermine US confidence in the AI race.
Market Impact Analysis
Ultimately, the biggest losers are the companies whose business model depends on renting out access to their AI models. At the front of the queue are OpenAI's ChatGPT, Meta's Llama, Google's Gemini and Anthropic/Amazon's Claude, among others.
If AI IP holders were at the centre of the DeepSeek blast zone, chipmakers stood close enough to suffer significant radiation, with shares down over 10%.
"If we acknowledge that DeepSeek may have reduced costs of achieving equivalent model performance by, say, 10x, we also note that current model cost trajectories are increasing by about that much every year anyway... we NEED innovations like this... as semi analysts we are firm believers in the Jevons paradox (i.e. that efficiency gains generate a net increase in demand)"
Implications for Data Centre Infrastructure
Consider the sell-off in electricity companies near data centre hotspots. The market may be saying that electricity demand won't be where we thought, that is, near a few huge proprietary data centres run by giant US tech companies. Instead, AI may be run in smaller data centres distributed far more widely.
Scale42's own experience has been that capital market participants have been searching for ever-larger MW sites to host data centres consuming ever greater amounts of energy in single, massive computing clusters. DeepSeek has put a question mark over the necessity of sites requiring hundreds of MW, which carry diseconomies of scale.
Our sites are located first and foremost for access to cheap, clean energy and will be brought to market to meet real-world demand.
DeepSeek Supports Scale42's Long Term View
In Scale42's pitch deck, we set out five areas where we predicted industry disruption. DeepSeek fits into several of these themes:
- Nvidia hardware → Competing hardware providers: DeepSeek abandoned CUDA for PTX programming, potentially opening doors for better-performing chips
- Generic LLM & AI Tools → Optimised Enterprise Specific AI Tools: DeepSeek catalyses open-source tools built without rent-seeking from established leaders
- Few Cutting Edge Providers → Proliferation of Bespoke Providers: DeepSeek exemplifies market widening with new competitors building on its foundations
Scale42's Strategic Position
Our conclusions remain unchanged:
- AI will entwine every level of the global economy as enterprises use their data to build custom AI tools
- Enterprise AI will require a deeper market for mid-sized AI infrastructure
- Technology will become less centralised on NVIDIA
- More open-source tools lower the cost of AI tech and increase end demand for infrastructure
Scale42's approach to the next phase of AI innovation:
- Mid-scale training assets
- Cost leader
- Chip-agnostic
The DeepSeek disruption validates our thesis that the future of AI infrastructure lies not in massive centralised facilities, but in distributed, efficient and adaptable mid-scale data centres positioned at sources of renewable energy.