I don’t really understand the implicit model here: AI companies recognize that having a good thought assessor is the critical barrier to AGI, they put their best minds on solving it, and yet you seem to think they just fail because it’s the single incomparably hard human capability.
It seems plausible that the diagnosis of what is missing is correct, but strongly implausible that it’s fundamentally harder than the other parts of the puzzle, much less so hard that AI companies would need a decade to tackle it. In my modal case, once they start, I expect progress to follow curves similar to every other capability they develop.
If thought assessment is as hard as thought generation and you need a thought assessor to get AGI (two non-obvious conditionals), then how do you estimate the time to develop a thought assessor? From which point do you start measuring how long it took to come up with the transformer architecture?
The snappy answer would be “1956, because that’s when AI started; it took 61 years to invent the transformer architecture that led to thought generation, so the equivalent insight for thought assessment will take about 61 years.” I don’t think that’s the correct answer, but neither is “2019, because that’s when AI first kinda resembled AGI.”
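To make the anchoring problem concrete, here is a toy calculation. The dates are the ones above; `SEARCH_START` is a purely hypothetical placeholder for when labs start working on thought assessment in earnest, and the transformer-to-thought-generation analogy is itself the contested assumption:

```python
# Toy illustration of how much the estimate depends on the chosen anchor.
# Dates are from the discussion above; SEARCH_START is hypothetical.

TRANSFORMER_YEAR = 2017   # transformer architecture published
SEARCH_START = 2025       # hypothetical: labs start working on thought assessment

for label, anchor in [("1956 (founding of AI)", 1956),
                      ("2019 (AI first kinda resembled AGI)", 2019)]:
    reference_gap = TRANSFORMER_YEAR - anchor
    print(f"Anchor {label}: the transformer insight took {reference_gap} years, "
          f"so the naive forecast is ~{SEARCH_START + reference_gap}.")
```

The first anchor puts the forecast decades out; the second produces a negative gap, which is one way of seeing that neither endpoint is a satisfying reference class.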
Keep in mind that we’re now at the stage of “Leading AI labs can raise tens to hundreds of billions of dollars to fund continued development of their technology and infrastructure.” AKA in the next couple of years we’ll see AI investment comparable to or exceeding the total that has ever been invested in the field. Calendar time is not the primary metric when effort is scaling this fast.
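As a rough sketch of that last point, assuming (purely for illustration) that annual research effort grows exponentially, most of the cumulative effort ever spent on a problem arrives in the last few calendar years:

```python
# Toy model of "calendar time vs. effort": with exponentially growing annual
# effort, the most recent years dominate the cumulative total. The growth
# factor and horizon below are made-up assumptions, not estimates.

GROWTH = 1.5   # assumed yearly growth factor of research effort (hypothetical)
YEARS = 30     # assumed age of the field for this toy model

efforts = [GROWTH ** t for t in range(YEARS)]   # effort spent in year t (arbitrary units)
total = sum(efforts)

# Count how many of the most recent years it takes to match all earlier effort.
recent, k = 0.0, 0
while recent < total - recent:
    k += 1
    recent += efforts[-k]

print(f"The last {k} year(s) contain as much effort as the previous {YEARS - k} combined.")
```

With those made-up numbers, the last two years match the previous 28 combined, which is the sense in which calendar-time reference classes understate how fast a newly identified bottleneck can be attacked.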
A lot of that next wave of funding will go to physical infrastructure, but if there is an identified research bottleneck, with a plausible claim to being the major bottleneck to AGI, then what happens next? Especially if it happens just as the not-quite-AGI models make existing SWEs, AI researchers, etc. much more productive by gradually automating their more boilerplate tasks. Seems to me like the companies and investors just do the obvious thing and raise the money to hire an army of researchers in every plausibly relevant field (including math, neurobiology, philosophy, and many others) to collaborate. Who cares if most of the effort and money are wasted? The payoff for the fraction that succeeds isn’t the usual VC target of 10-100x, it’s “many multiples of the current total world economy.”
Transformers work for many other tasks, and it seems incredibly likely to me that their expressiveness covers not only game playing, vision, and language, but also the other things the brain does. And to bolster this point: the human brain doesn’t use two completely different architectures!
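As a concrete (and deliberately hand-wavy) illustration of what “the same architecture doing both jobs” could look like: current systems already bolt scalar judgment heads onto transformer trunks, reward-model style. Everything in the sketch below is made up for illustration; it is not a claim about what any lab actually builds:

```python
# Minimal sketch: one shared transformer trunk feeding both a "thought
# generator" head (next-token logits) and a "thought assessor" head (a scalar
# score per sequence), in the spirit of policy/value or reward-model setups.
# All sizes and names are illustrative.

import torch
import torch.nn as nn

class GeneratorAssessor(nn.Module):
    def __init__(self, vocab_size=1000, d_model=128, n_heads=4, n_layers=2):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.trunk = nn.TransformerEncoder(layer, n_layers)  # shared representation
        self.gen_head = nn.Linear(d_model, vocab_size)        # "thought generation"
        self.assess_head = nn.Linear(d_model, 1)              # "thought assessment"

    def forward(self, tokens):                    # tokens: (batch, seq)
        h = self.trunk(self.embed(tokens))        # (batch, seq, d_model)
        next_token_logits = self.gen_head(h)      # per-position generation logits
        score = self.assess_head(h.mean(dim=1))   # one scalar judgment per sequence
        return next_token_logits, score

model = GeneratorAssessor()
logits, score = model(torch.randint(0, 1000, (2, 16)))
print(logits.shape, score.shape)   # torch.Size([2, 16, 1000]) torch.Size([2, 1])
```

Whether a head like that ever reaches a human-grade thought assessor is exactly the disputed question; the sketch only suggests that the architecture itself isn’t the obvious obstacle.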
So I’ll reverse the question: why do you think the thought assessor is fundamentally different from other neural functions that we know transformers can do?
I do think the human brain uses two very different algorithms/architectures for thought generation and assessment. But this falls within the “things I’m not trying to justify in this post” category. I think if you reject the conclusion based on this, that’s completely fair. (I acknowledged in the post that the central claim has a shaky foundation. I think the model should get some points because it does a good job retroactively predicting LLM performance—like, why LLMs aren’t already superhuman—but probably not enough points to convince anyone.)
The transformer architecture was developed basically as soon as we had the computational power to make it useful. If a thought assessor is required, then given that we’re aware of the problem and have literally billions in funding to make it happen, I don’t expect it to be that hard.