Yeah, you’ve convinced me I was a little too weak just by saying “the scaling laws are untested”—I had the same feeling of like “maybe I’m getting Eulered here, and maybe they’re Eulering themselves” with the 10^23 thing.
Mostly I just kept seeing suggested articles in the mainstream-ish tech press about this “wow, no MatMul” thing, assumed it was overhyped or misleading, and was pleasantly surprised it was for real (as far as it goes). But I’d give it probably… 15% odds of having industrial use cases in the next few years. Which I guess is actually pretty high! Could be nice for really, really huge context windows, where scaling with input token length sucks.