
šŸŒ#85: Curiosity, Open Source, and Timing: The Formula Behind DeepSeekā€™s Phenomenal Success

How an open-source mindset, relentless curiosity, and strategic calculation are rewriting the rules in AI and challenging Western companies, plus an excellent reading list and curated research collection


When we first covered DeepSeek models in August 2024 (we are opening that article for everyone, do read it), it didnā€™t gain much traction. That surprised me! Back then, DeepSeek was already one of the most exciting examples of curiosity-driven research in AI, committed to open-sourcing its discoveries. They also employed an intriguing approach: unlike many others racing to beat benchmarks, DeepSeek pivoted to addressing specific challenges, fostering innovation that extended beyond conventional metrics. Even then, they demonstrated significant cost reductions.

ā€œWhatā€™s behind DeepSeek-Coder-V2 that makes it so special it outperforms GPT-4 Turbo, Claude-3 Opus, Gemini 1.5 Pro, Llama 3-70B, and Codestral in coding and math?

DeepSeek-Coder-V2, costing 20–50x less than other models, represents a major upgrade over the original DeepSeek-Coder. It features more extensive training data, larger and more efficient models, improved context handling, and advanced techniques like Fill-In-The-Middle and Reinforcement Learning.ā€ (Inside DeepSeek Models)
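A quick aside, since Fill-In-The-Middle (FIM) comes up in that quote: FIM trains a code model to complete a gap using context from both sides, not just the left. Here’s a minimal sketch of how FIM training examples are typically constructed – note that the sentinel token names below are illustrative placeholders, not DeepSeek’s exact vocabulary.

```python
# Minimal sketch of Fill-In-The-Middle (FIM) data construction.
# Sentinel token names are illustrative placeholders, not DeepSeek's exact tokens.
import random

FIM_PREFIX, FIM_SUFFIX, FIM_MIDDLE = "<fim_prefix>", "<fim_suffix>", "<fim_middle>"

def make_fim_example(code: str, rng: random.Random) -> str:
    """Split a snippet into prefix/middle/suffix, then reorder it so the
    model sees prefix + suffix first and learns to predict the middle."""
    i, j = sorted(rng.sample(range(len(code)), 2))
    prefix, middle, suffix = code[:i], code[i:j], code[j:]
    return f"{FIM_PREFIX}{prefix}{FIM_SUFFIX}{suffix}{FIM_MIDDLE}{middle}"

rng = random.Random(0)
print(make_fim_example("def add(a, b):\n    return a + b\n", rng))
```

Trained this way, the model can infill code in the middle of a file – the everyday editor scenario – rather than only appending at the end.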

Although DeepSeek was making waves in the research community, it remained largely unnoticed by the broader public. But then they released R1-Zero and R1.

With that release they crushed industry benchmarks and disrupted the market by training their models at a fraction of the typical cost. But do you know what else they did? Not only did they prove that reinforcement learning (RL) is all you need for reasoning (R1 stands as solid proof of how well RL works), but they also embraced a trial-and-error approach – fundamental to RL – in their own business strategy. Previously overlooked, they timed the release of R1 meticulously. Did you catch the timing? It was a strategic earthquake that shook the market and left everyone reeling:

  1. As ChinaTalk noted: ā€œR1's release during President Trump’s inauguration last week was clearly intended to rattle public confidence in the United States’ AI leadership at a pivotal moment in US policy, mirroring Huawei's product launch during former Secretary Raimondo's China visit. After all, the benchmark results of an R1 preview had already been public since November.ā€

  2. The release happened just one week before the Chinese Lunar New Year (this year on January 29), which typically lasts 15 days. However, the week leading up to the holiday is often quiet, giving them a perfect window to outshine other Chinese companies and maximize their PR impact.

So, while the DeepSeek family of models serves as a case study in the power of open-source development paired with relentless curiosity (from an interview with Liang Wenfeng, DeepSeek’s CEO: ā€œMany might think there's an undisclosed business logic behind this, but in reality, it's primarily driven by curiosity.ā€), it’s also an example of cold-blooded calculation and a triumph of reinforcement learning applied to both models and humans :). DeepSeek has shown a deep understanding of how to play Western games and excel at them. Today’s market downturn, concerning as it is to many, will likely pass soon. But if DeepSeek can achieve such outstanding results at such low cost, Western companies need to reassess their strategies quickly and clarify what their actual competitive moats are.
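To ground the RL point a bit: R1’s reasoning ability was trained with GRPO (Group Relative Policy Optimization) using largely rule-based rewards on verifiable tasks. Below is a deliberately simplified sketch of the group-relative advantage computation at the core of that idea – an illustration, not DeepSeek’s implementation; the reward values are stand-ins.

```python
# Simplified sketch of the group-relative advantage idea behind GRPO,
# the RL algorithm DeepSeek used for R1 (illustrative, not their code).
import statistics

def group_relative_advantages(rewards: list[float]) -> list[float]:
    """For a group of sampled answers to the same prompt, score each answer
    relative to the group: advantage = (reward - mean) / std.
    Answers above the group average get reinforced; those below get penalized."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards) or 1.0  # avoid division by zero
    return [(r - mean) / std for r in rewards]

# Rule-based reward: e.g., 1.0 if the final answer checks out, else 0.0.
rewards = [1.0, 0.0, 1.0, 1.0]  # four sampled answers to one math problem
print(group_relative_advantages(rewards))
```

The elegance here is that verifiable domains like math and code need no learned reward model or critic: correctness itself is the reward, and the group average serves as the baseline.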

Worries about NVIDIA

Of course, weā€™ll still need a lot of compute ā€“ everyone is hungry for it. Thatā€™s a quote from Liang Wenfeng, DeepSeekā€™s CEO: ā€œFor researchers, the thirst for computational power is insatiable. After conducting small-scale experiments, there's always a desire to conduct larger ones. Since then, we've consciously deployed as much computational power as possible.ā€

So, letā€™s not count NVIDIA out. What we can count on is Jensen Huangā€™s knack for staying ahead to find the way to stay relevant (NVIDIA wasnā€™t started as an AI company, if you remember). But what the rise of innovators like DeepSeek could push NVIDIA to is to double down on openness. Beyond the technical benefits, an aggressive push toward open-sourcing could serve as a powerful PR boost, reinforcing Nvidiaā€™s centrality in the ever-expanding AI ecosystem.

As I was writing these words about NVIDIA, they sent a statement regarding DeepSeek: ā€œDeepSeek is an excellent AI advancement and a perfect example of Test Time Scaling. DeepSeekā€™s work illustrates how new models can be created using that technique, leveraging widely-available models and compute that is fully export control compliant. Inference requires significant numbers of NVIDIA GPUs and high-performance networking. We now have three scaling laws: pre-training and post-training, which continue, and new test-time scaling.ā€
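For readers new to the term: test-time scaling means spending more compute at inference – longer chains of thought, more samples – to get better answers. Here is a minimal sketch of one common form, self-consistency (sample several answers and majority-vote); `noisy_model` is a hypothetical stand-in for any real LLM sampling call.

```python
# Minimal sketch of test-time scaling via self-consistency:
# sample n answers and return the most common one.
import random
from collections import Counter
from typing import Callable

def self_consistency(generate: Callable[[str], str], prompt: str, n: int = 16) -> str:
    """More samples (higher n) = more test-time compute = usually higher accuracy."""
    answers = [generate(prompt) for _ in range(n)]
    return Counter(answers).most_common(1)[0][0]

# Hypothetical stand-in for an LLM sampling call.
def noisy_model(prompt: str) -> str:
    return random.choice(["42", "42", "42", "41"])  # right most of the time

print(self_consistency(noisy_model, "What is 6 x 7?"))  # usually "42"
```

This is the knob NVIDIA’s statement is pointing at: every extra sample or reasoning token at inference time is more GPU work.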

So ā€“ to wrap up ā€“ the main takeaway from DeepSeek breakthrough is to do that:

  • open-source and decentralize

  • stay curiosity-driven

  • apply reinforcement learning to everything

For DeepSeek, this is just the beginning. As curiosity continues to drive its efforts, it has proven that breakthroughs come not from hoarding innovation but from sharing it. As we move forward, itā€™s these principles that will shape the future of AI.

We are reading (itā€™s all about šŸ³)

šŸ”³ Turing Post is now on šŸ¤— Hugging Face! You can read the rest of this article there (itā€™s free!) ā†’

Curated Collections
