Chance of discovering or verifying long-term solution(s): I’m not sure whether a “one shot” solution to alignment (that is, a single relatively “clean” algorithm which will work at all scales, including for highly superintelligent models) is possible. But if it is, it seems like starting to do a lot of work on aligning narrowly superhuman models would probably let us discover the right solution sooner than we otherwise would.
Eliezer Yudkowsky: It’s not possible. Not for us, anyways. A textbook that fell out of a wormhole from the future might have the simplest straightforward working solution with no extra gears, all of whose pieces work reliably. We won’t get it in time because it takes multiple decades to go from sigmoid activation functions to ReLUs, and so we will definitely be working with the AGI equivalent of sigmoid activation functions instead of ReLUs while the world is ending. Hope that answers your question!
I disagree with Eliezer here:
Here’s a model of how this works: Simple, powerful insights like “Use ReLU instead of sigmoid” take years or decades on average to arrive because there is only a small probability of anyone hitting on them at any given moment. Some take minutes, some take years, some take decades; it basically comes down to luck. The longer we wait, the more likely it is that someone stumbles across the idea, but nothing prevents someone from stumbling across it on day one.
On this model, which seems plausible to me, Yudkowsky is wrong. There is a non-negligible chance that we have the relevant insights before AGI (if the insight takes decades in expectation and we only have years, that works out to roughly a 10-20% chance), and that chance can be pushed noticeably higher.
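A minimal sanity check of that 10-20% figure, under the illustrative assumption (not stated in the comment) that such insights arrive as a memoryless process with mean waiting time \(\tau\) and that \(T\) years remain before AGI:

\[
P(\text{insight arrives within } T) \;=\; 1 - e^{-T/\tau}.
\]

Plugging in \(\tau \approx 30\) years (“decades on average”) and \(T \approx 5\) years gives \(1 - e^{-5/30} \approx 0.15\), about 15%; with \(\tau = 20\) and \(T = 4\) it is \(1 - e^{-0.2} \approx 0.18\). Both land in the 10-20% range quoted above.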
My guess is that a “clean” algorithm is still going to require multiple conceptual insights to create. And typically, those insights will be found before we’ve had time to strip away the extraneous ideas and make the algorithm clean, which itself requires additional insights. Combine this with the fact that at least some of these insights are likely to be public knowledge and relevant to AGI, and I think Eliezer has the right idea here.
OK, fair enough.
Fwiw, I also agree with Adele and Eliezer here and just didn’t see Eliezer’s comment when I was giving my comments.