How does DeepSeek’s ‘mixture-of-experts’ architecture improve performance?
Jan 29, 2025
DeepSeek, a Chinese AI startup, has rapidly gained global prominence due to its combination of technical innovation, cost efficiency, and strategic market positioning. Here’s an analysis of the key factors driving its popularity:
🔹Technological Advancements
- Mixture-of-Experts Architecture: DeepSeek’s models use a segmented design in which specialized sub-models (“experts”) activate only when needed. Its V3 model, for example, has 671 billion parameters but activates only about 37 billion per token, cutting computational costs (see the routing sketch after this list).
- Reinforcement Learning Focus: Rather than relying primarily on Supervised Fine-Tuning (SFT) as OpenAI does, DeepSeek prioritized Reinforcement Learning (RL) to train reasoning capability at lower resource cost (a toy illustration also follows below).
- Benchmark Dominance: The R1 model achieved a 73.78% pass rate on the HumanEval coding benchmark and 84.1% on GSM8K math tasks, outperforming competitors such as ChatGPT.
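To make the sparse-activation idea concrete, here is a minimal top-k routing layer in PyTorch. It is only an illustrative sketch, not DeepSeek’s implementation (DeepSeek-V3 adds refinements such as shared experts and its own load-balancing scheme); the layer sizes, names, and expert count are placeholders.

```python
# Minimal sketch of top-k mixture-of-experts routing (illustrative, not
# DeepSeek's code). Each token is routed to only k of E experts, so most
# expert parameters stay idle for any given token.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SimpleMoELayer(nn.Module):
    def __init__(self, d_model: int = 64, num_experts: int = 8, top_k: int = 2):
        super().__init__()
        self.top_k = top_k
        # One small feed-forward network per expert.
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(d_model, 4 * d_model), nn.GELU(),
                          nn.Linear(4 * d_model, d_model))
            for _ in range(num_experts)
        ])
        # The router scores every expert for every token.
        self.router = nn.Linear(d_model, num_experts)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        tokens = x.reshape(-1, x.shape[-1])                # (tokens, d_model)
        scores = self.router(tokens)                       # (tokens, experts)
        weights, chosen = scores.topk(self.top_k, dim=-1)  # keep top-k per token
        weights = F.softmax(weights, dim=-1)
        out = torch.zeros_like(tokens)
        # Only the chosen experts run for each token.
        for slot in range(self.top_k):
            for idx, expert in enumerate(self.experts):
                mask = chosen[:, slot] == idx
                if mask.any():
                    out[mask] += weights[mask, slot].unsqueeze(-1) * expert(tokens[mask])
        return out.reshape(x.shape)

moe = SimpleMoELayer()
y = moe(torch.randn(2, 10, 64))   # 2 of 8 experts run per token
print(y.shape)                    # torch.Size([2, 10, 64])
```

With 8 experts and top-2 routing, only a quarter of the expert parameters are touched per token; the same principle is how V3 activates roughly 37 billion of its 671 billion parameters.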
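On the RL side, DeepSeek has described a group-relative policy optimization approach (GRPO) for its reasoning models: several answers are sampled per prompt, each is scored with a reward, and rewards are normalized within the group instead of being judged by a separately trained value model. The toy sketch below shows only that advantage computation; the rewards are made-up placeholders.

```python
# Toy sketch of GRPO-style group-relative advantages (illustrative only).
# For one prompt, several completions are sampled and scored; rewards are
# normalized within the group, and the resulting advantages weight the
# policy-gradient update. No value (critic) model is needed.
import statistics

def group_relative_advantages(rewards: list[float]) -> list[float]:
    """Center and scale rewards within one sampled group of completions."""
    mean = statistics.mean(rewards)
    std = statistics.pstdev(rewards) or 1.0  # guard against a zero spread
    return [(r - mean) / std for r in rewards]

# Placeholder rewards for four sampled answers to the same math problem,
# e.g. 1.0 if the final answer checks out and 0.0 otherwise.
rewards = [1.0, 0.0, 1.0, 0.0]
print(group_relative_advantages(rewards))  # [1.0, -1.0, 1.0, -1.0]
```

Completions that beat their group’s average receive positive advantages and are reinforced; skipping a separate value network keeps the training loop lighter, which is part of why this route is cheaper.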
🔹Cost Efficiency
- DeepSeek reportedly developed its V3 model on a budget of roughly $5.5 million, a fraction of the billions spent by U.S. firms, challenging the notion that AI advancement requires massive investment.
- Its models reportedly use 3–5% of the computational resources required by an…