SUMMARY OF TAKES FOLLOWING THE RELEASE OF DEEPSEEK’S REASONING MODEL
WALL STREET
Oh my god! The DeepSeek team managed to train a model for less than $6M USD! This must mean we do not need that many chips or that much energy to use GenAI! Sam Altman and other AI leaders were grossly exaggerating compute needs! AI stocks are super overvalued!
STARTUPS AND ENTERPRISES USING LLMS TO ENHANCE THEIR PRODUCTS
Did… did we just get an open-source model that reasons? A model we can download onto our own servers, tailor to our needs, train on our proprietary data, and all we have to do is use our own hardware infrastructure (or rent from AWS/Azure) for inference instead of paying OpenAI/Anthropic millions for restricted API access?
AI SCIENTISTS AND ENGINEERS
Whoa! These engineers at DeepSeek are truly impressive! They managed to work around the limitations of the export-restricted H800 chips by writing low-level code to optimize cross-chip communication, greatly improving the effective interconnect bandwidth of their setup and achieving efficiencies close to what can be done with cutting-edge H100 chips. Imagine what they could do if they had access to H100 chips!