It might underwhelm for the simple reason that the really high-end AIs take too much hardware to find and run.
I think that’s basically equivalent to my claim, accounting for the differences between our models. I expect this part to be non-trivially difficult (as in, not just “scale LLMs”). People would need to basically roll a lot of dice on architectures, in the hopes of hitting upon something that works[1]. It would take dramatically more rolls if they don’t have a solid gears-level vision of AGI (if they’re just following myopic “make AIs more powerful” gradients), and the lack of that vision/faith would make the random-roll process discouraging.
So non-fanatics would get there eventually, yes, by the simple nature of growing amounts of compute and numbers of experiments. But without a fanatical organized push, it’d take considerably longer.
That’s how math research already appears to work:

This would be consistent with a preliminary observation about how long it takes to solve mathematical conjectures. While inference is rendered difficult by the exponential growth of the global population and of the number of mathematicians, the distribution of time-to-solution roughly matches a memoryless exponential distribution (one with a constant chance of solving the conjecture in any time period), rather than a more intuitive distribution like a type 1 survivorship curve (where a conjecture gets easier to solve over time, perhaps as related mathematical knowledge accumulates). This suggests a model of mathematical activity in which many independent random attempts are made, each with a small chance of success, and eventually one succeeds.
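A toy simulation of that “many independent low-probability attempts” model, with made-up parameters, shows the memoryless signature: the chance of a solution in the next decade is the same whether the conjecture is fresh or has already resisted fifty years.

```python
import random

random.seed(0)

def years_to_solve(p_per_year=0.02):
    """Each year is an independent Bernoulli trial with a small success chance."""
    year = 0
    while True:
        year += 1
        if random.random() < p_per_year:
            return year

samples = [years_to_solve() for _ in range(100_000)]

def hazard(samples, t, window=10):
    """Among conjectures still unsolved at year t, the fraction solved within the next decade."""
    alive = [s for s in samples if s > t]
    solved = [s for s in alive if s <= t + window]
    return len(solved) / len(alive)

print(hazard(samples, 0))   # ~0.18 (analytically, 1 - 0.98**10)
print(hazard(samples, 50))  # ~0.18 again: no "gets easier over time" effect
```

A survivorship-curve world would instead show the second number noticeably higher than the first.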
People would need to basically roll a lot of dice on architectures, in the hopes of hitting upon something that works
How much is RSI going to help here? This is already what everyone does for hyperparameter searches (train another network to run them), and finding an AGI architecture, i.e. “find me a combination of models that will pass this benchmark”, seems like it would be solvable with the same kind of search.
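The kind of search being gestured at can be sketched in a few lines. Everything here is hypothetical: the search space, the scoring, and the idea that a benchmark score stands in for an expensive training-and-evaluation run.

```python
import random

random.seed(1)

# Hypothetical search space: each "architecture" is just a combination of choices.
SEARCH_SPACE = {
    "backbone": ["transformer", "ssm", "hybrid"],
    "memory": ["none", "external", "recurrent"],
    "depth": [12, 24, 48],
}

def roll_architecture():
    """One dice roll: sample a random combination from the space."""
    return {k: random.choice(v) for k, v in SEARCH_SPACE.items()}

def benchmark(arch):
    """Stand-in for an expensive training-and-evaluation run."""
    return random.random()  # in reality: train the candidate, score it

best_arch, best_score = None, float("-inf")
for _ in range(100):  # more compute = more rolls
    arch = roll_architecture()
    score = benchmark(arch)
    if score > best_score:
        best_arch, best_score = arch, score

print(best_arch, best_score)
```

The “GPU-rich” advantage shows up as the loop count: with a fixed per-roll success chance, whoever affords more rolls finds a good combination sooner.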
The way I model it, RSI would let GPU-rich but more mediocre devs find AGI. They won’t be first, though, unless they hypothetically can’t get the support of S-tier talent, say because they’re in a different country.
Are you sure there are timelines where “decades” of delay is really possible, if open-source models exist and GPUs keep coming in ever-increasing quantities and power?
I expect that sort of brute-force-y approach to take even longer than the “normal” visionless meandering around.
Well, I guess it can be a hybrid. The first group to reach AGI would be the one that maximizes the product of “has any idea what they’re doing” and “how much compute they have” (rather than either variable in isolation). Meaning:
Compute is a “great equalizer” that can somewhat compensate for lack of focused S-tier talent.
But focused S-tier talent can likewise somewhat compensate for having less compute.
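The product model above can be made concrete with a toy calculation. All numbers are invented purely for illustration, assuming expected time-to-AGI scales inversely with vision times compute.

```python
# Toy model: expected time to AGI ~ scale / (vision * compute).
# Group names and all numbers are made up for illustration only.
groups = {
    "focused S-tier lab":     {"vision": 8, "compute": 3},
    "GPU-rich mediocre devs": {"vision": 2, "compute": 10},
    "unfocused talent":       {"vision": 3, "compute": 3},
}

def expected_years(g, scale=120):
    return scale / (g["vision"] * g["compute"])

for name, g in sorted(groups.items(), key=lambda kv: expected_years(kv[1])):
    print(f"{name}: ~{expected_years(g):.0f} years")
```

Under these made-up numbers the compute-heavy mediocre group lands close behind the focused lab, illustrating how either variable can partially compensate for the other.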
That seems to agree with your model?
And my initial point is that un-focusing the S-tier talent would lengthen the timelines.
Are you sure there are timelines where “decades” of delay is really possible, if open-source models exist and GPUs keep coming in ever-increasing quantities and power?

Sure? No, not at all sure.