The internet is freaking out about DeepSeek. NVIDIA stock is down 15%. Let's cut through the hype and noise - what is DeepSeek and why does it matter?
DeepSeek R1 is a new open-source model released by a small team working at a Chinese quant hedge fund, purportedly as a side project.
The model has achieved spectacular price/performance ratios, coming close to OpenAI's o1 model on reasoning benchmarks at roughly 1/30 of the API price.
Its smaller distilled versions are so inexpensive to run that they fit on a single computer with a gaming GPU. Doesn't require a data center.
DeepSeek claims it cost only about $5.6M in compute to train (vs. an estimated $500M for o1), although that figure covers just the final training run and excludes research, failed experiments, and infrastructure, so the comparison seems highly suspect.
Why this is so important:
1. R1 is roughly equivalent to o1, which is one generation behind the state of the art (o3). Because it's fully open-source, anyone can use it in their projects. This means the value of SOTA minus 1 models is effectively brought down to zero.
2. This is fantastic news for PointOne and others building at the application layer. It will force all model providers to release the latest models faster and cheaper.
3. The speed of model distillation is staggering. Within a month of a new SOTA model coming out, it can be distilled down to a small model that can run at 1% of the cost.
4. The reinforcement learning training methods used in R1/o1/o3 (which I've written about previously) are also now open-sourced, which allows the entire AI community to build better reasoning models faster.
5. The commoditization of SOTA intelligence shifts value at the model layer to other kinds of capabilities like agentic behavior.
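The distillation mentioned in point 3 works by training a small "student" model to match the output distribution of a large "teacher". A minimal sketch of the standard distillation objective, a KL divergence between temperature-softened softmax distributions, in plain Python (the logits here are made-up toy values, not real model outputs):

```python
import math

def softmax(logits, temperature=1.0):
    """Temperature-softened softmax over a list of logits."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """KL(teacher || student) on softened distributions.

    Minimizing this pushes the student's output distribution
    toward the teacher's, which is how a large model's behavior
    gets compressed into a much smaller one.
    """
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))

# Toy check: a student that mimics the teacher scores a lower loss
# than one with a very different output distribution.
teacher = [2.0, 1.0, 0.1]
aligned_student = [2.1, 0.9, 0.2]
misaligned_student = [0.0, 0.0, 3.0]
assert distillation_loss(teacher, aligned_student) < distillation_loss(teacher, misaligned_student)
```

In practice the student is also trained on the original task labels, and the temperature controls how much of the teacher's "soft" preference ordering over wrong answers the student absorbs.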
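On point 4: the RL method DeepSeek published with R1 (GRPO) drops the learned value function used in PPO-style training and instead samples a group of answers per prompt, scores them, and uses each answer's reward relative to the group as its advantage. A minimal sketch of that group-normalized advantage computation, with made-up reward values:

```python
import statistics

def group_advantages(rewards):
    """GRPO-style advantages: normalize each sampled answer's reward
    by the group's mean and standard deviation, so training pushes
    the model toward answers that beat the group average."""
    mean = statistics.mean(rewards)
    std = statistics.pstdev(rewards) or 1.0  # guard against all-equal rewards
    return [(r - mean) / std for r in rewards]

# Four sampled answers to one prompt, scored by a verifier (1.0 = correct).
rewards = [1.0, 0.0, 0.0, 1.0]
advantages = group_advantages(rewards)
# Correct answers receive positive advantage, incorrect ones negative.
```

The appeal is that the "critic" is replaced by cheap extra samples, which is part of why these reasoning-RL recipes are so inexpensive to reproduce now that they are public.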
Cutting through the hype:
- Will the Chinese government steal my data? Only if you use the official DeepSeek app or API, where your prompts are processed on servers in China. But the model weights themselves are fully open-source, so you can run them anywhere you want and keep your data private.
- Does this destroy the value of NVIDIA? In my opinion, not at all. The long-term value of NVIDIA will mostly be in inference, not training. Imagine how much compute is needed to run the 10B (or 100B or 1T) AI agents that will be doing all the world's knowledge work in 20 years. We're still going to need GPUs to run that. Even if it's a lot more efficient than today's models, we should not underestimate the latent demand for intelligence.
- Does this threaten all the closed-source model companies? It certainly lights a fire under them to move faster. I still think that advances at the cutting edge will be made by closed-source labs, and there will be value to be captured there. But those advances will percolate into open source at a much faster rate than any of us expected.
Overall: this is a great development for humanity.
Originally published on LinkedIn