There has been a lot of interest in this going back to at least early this year and the 1.58-bit (ternary) LLM paper https://arxiv.org/abs/2402.17764, so I expect there has been a research gold rush, with a lot of design effort going into custom hardware almost as soon as that was published.
With Nvidia’s dual-chip GB200 Grace Blackwell offering (sparse) 40 Pflops of FP4 at ~1kW, something close to optimal hardware has already been available. That FP4 performance may be the reason the latest generation of Nvidia GPUs is in such high demand; previous generations haven’t offered it as far as I am aware. For comparison, a human brain is likely equivalent to 10-100 Pflops, though estimates vary.
Being able to up the performance significantly from a single AI chip has huge system cost benefits.
All of this suggests that the costs for AI are going to drop yet again, and that human-level AGI operating costs are going to be measured in cents per hour when it arrives in a few years’ time.
The implications for autonomous robotics are likely tremendous, with potential OOM power savings likely to bring far more capable systems to smaller platforms: home robotics, FSD cars, and (scarily) military murderbots. Tesla has (according to Elon’s comments) a new HW5 autonomy chip coming out next year that is ~50x faster than their current FSD development baseline, the HW3 2 x 72 Tflop chipset, but it needs closer to 1kW of power, so they will be extremely keen on implementing something that could save so much power.
They cannot just add an OOM of parameters, much less three.
How about 2 OOMs?
HW2.5 was 21 Tflops, HW3 is 2 x 72 Tflops (72 Tflops effective, since the second chip is redundant), and HW4 is 3 x 72 = 216 Tflops (not sure about redundancy). Elon said in June that the next-gen AI5 chip for FSD would be about 10x faster, say ~2 Pflops.
By rough approximation to brain processing power you get about 0.1 Pflop per gram of brain, so HW2.5 might have been a 0.2g baby mouse brain, HW3 a 1g baby rat brain, HW4 perhaps an adult rat, and the upcoming HW5 (AI5) a 20g small cat brain.
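The conversion above is simple enough to check in a few lines. This is only a sketch of my own rule of thumb: the 0.1 Pflop per gram figure and the Tflops values are the rough numbers quoted in this comment, not verified specs, and the names `PFLOPS_PER_GRAM`, `chips_tflops` and `brain_grams` are made up for illustration.

```python
# Rule of thumb from this comment: ~0.1 Pflop per gram of brain (a rough assumption).
PFLOPS_PER_GRAM = 0.1

# Effective throughput in Tflops as quoted in this comment (not verified specs).
chips_tflops = {
    "HW2.5": 21,
    "HW3": 72,     # 2 x 72, counted once because the second chip is redundant
    "HW4": 216,    # 3 x 72, redundancy unknown
    "AI5": 2000,   # "~2 Pflops", i.e. about 10x HW4
}


def brain_grams(tflops, pflops_per_gram=PFLOPS_PER_GRAM):
    """Convert Tflops to equivalent grams of brain under the rule of thumb."""
    pflops = tflops / 1000  # 1 Pflop = 1000 Tflops
    return pflops / pflops_per_gram


brain_equivalents = {name: brain_grams(t) for name, t in chips_tflops.items()}

for name, grams in brain_equivalents.items():
    print(f"{name}: ~{grams:.2f} g of brain")
```

Changing `PFLOPS_PER_GRAM` shows how sensitive the animal comparisons are to that one guess: at 0.01 Pflop per gram every chip looks ten times "bigger".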
As a real-world analogue, cat to dog (25-100g brain) seems to me the minimum necessary range of complexity, based on behavioral capabilities, to do a decent job of driving. You need some ability to anticipate and predict the motivations and behavior of other road users, and something beyond dumb reactive handling (i.e. somewhat predictive) to understand anomalous objects that exist on and around roads.
Nvidia Blackwell B200 can do up to about 10 Pflops of FP8, which is getting into the large dog/wolf brain processing range, and it wouldn’t be unreasonable to package it in a self-driving car, at around 1kW peak power consumption, once the price is down closer to manufacturing cost in a few years.
I don’t think the rat brain HW4 is going to cut it, and I suspect that internally Tesla knows it too, but it’s going to be crazy expensive to own up to it, so better to keep kicking the can down the road with promises until they can deliver the real thing. AI5 might just do it, but it wouldn’t be surprising to need a further OOM, to a Nvidia Blackwell equivalent, and maybe $10k extra cost to get there.