DeepSeek-V3 is out today, with weights and a paper published. Tweet thread, GitHub, report (GitHub, HuggingFace). It’s big and mixture-of-experts-y; discussion here and here.
It was super cheap to train — they say 2.8M H800-hours or $5.6M (!!).
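Those two headline numbers imply a rental rate of about $2 per H800 GPU-hour, which as I understand it is roughly the rate the report itself assumes. A trivial sanity check:

```python
# Sanity check on the headline training-cost numbers.
gpu_hours = 2.8e6        # reported H800 GPU-hours
total_cost_usd = 5.6e6   # reported training cost in USD
print(f"implied rate: ${total_cost_usd / gpu_hours:.2f}/GPU-hour")  # $2.00
```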
It’s powerful, and it’s cheap to run.
This is depressing, but not surprising. We know the approximate processing power of brains (on the order of 1e16-1e17 FLOP/s) and how long it takes to train them, and we should expect that over the next few years the tricks and structures needed to replicate or exceed that efficiency in ML will be uncovered, in an accelerating rush toward the cliff, as the computational resources needed to attain commercially useful performance continue to fall. The AI industry can afford to run thousands of experiments at this cost scale.
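A rough back-of-envelope on why this is unsurprising; the brain figure is the estimate above, but the training years and the effective per-GPU throughput are assumptions of mine:

```python
# Lifetime 'training compute' of a human brain vs. the DeepSeek-V3 run.
brain_flops = 1e16          # low end of the 1e16-1e17 FLOP/s estimate above
train_years = 20            # assumed 'training time' to adulthood
brain_flop = brain_flops * train_years * 365 * 24 * 3600
print(f"brain lifetime compute:       ~{brain_flop:.1e} FLOP")  # ~6.3e24

gpu_seconds = 2.8e6 * 3600  # reported H800 GPU-hours, in seconds
useful_flops_per_gpu = 4e14 # assumed effective throughput per H800 (peak ~1e15)
train_flop = useful_flops_per_gpu * gpu_seconds
print(f"DeepSeek-V3 training compute: ~{train_flop:.1e} FLOP")  # ~4.0e24
```

On these assumptions the two budgets land within a factor of two of each other, which is the point: the compute is already comparable.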
Within a few years this will likely mean AGI implementations running on Nvidia B200-level GPUs (~1e16 FLOP/s). We have not yet seen hardware application of the various power-reducing computational ‘cheats’ for mimicking multiplication with reduced gate counts, which are likely to yield a 2-5x performance gain at the same chip size and power draw.
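The comment doesn’t name a specific technique, but one classic member of this family is Mitchell’s logarithm-based approximate multiplier, which replaces a full multiplier array with a leading-one detector and adders; a minimal sketch of the idea:

```python
# Mitchell's (1962) logarithmic approximate multiplier: one classic example of
# trading multiplier gates for adders, at the cost of bounded error.
def approx_log2(x: int) -> float:
    k = x.bit_length() - 1           # position of the leading one bit
    return k + (x / (1 << k) - 1)    # linear approximation of the mantissa

def approx_antilog2(y: float) -> float:
    k = int(y)
    return (1 << k) * (1 + (y - k))  # inverse of the linear approximation

def mitchell_mul(a: int, b: int) -> float:
    # log(a*b) = log(a) + log(b); both logs are approximate here
    return approx_antilog2(approx_log2(a) + approx_log2(b))

for a, b in [(13, 7), (100, 200), (255, 255)]:
    exact, approx = a * b, mitchell_mul(a, b)
    print(f"{a}*{b}: exact={exact}, approx={approx:.0f}, "
          f"err={(exact - approx) / exact:+.1%}")
```

Mitchell’s scheme always underestimates, with relative error bounded around 11%, which is why it and its refinements keep showing up in low-power multiply-accumulate designs.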
Humans are so screwed.
We know the approximate processing power of brains (on the order of 1e16-1e17 FLOP/s)

This is still debatable; see Table 9 in the Whole Brain Emulation roadmap: https://www.fhi.ox.ac.uk/brain-emulation-roadmap-report.pdf. You are referring to level 4 (SNN), but level 5 is plausible IMO (at 10^22 FLOP/s) and level 6 seems possible (10^25), and of course it could be a mix of levels.
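To put the disagreement in concrete terms, here is the GPU count each estimate cited in this thread implies, taking the ~1e16 FLOP/s B200-class figure from the parent comment:

```python
# GPUs needed for real-time brain emulation at each estimate cited in the thread.
gpu_flops = 1e16  # B200-class figure from the parent comment
estimates = {
    "parent's brain estimate (low)":  1e16,
    "parent's brain estimate (high)": 1e17,
    "WBE roadmap level 5":            1e22,
    "WBE roadmap level 6":            1e25,
}
for name, flops in estimates.items():
    print(f"{name}: {flops / gpu_flops:,.0f} GPU(s)")
```

That is a six-to-nine order-of-magnitude spread: the difference between a single GPU and a nation-scale cluster, so which level is the right abstraction matters enormously for the timeline.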