As phrased in the paper I’m pretty pessimistic, mostly because the paper presents a story with a discontinuity, where you throw a huge amount of computation at the problem and at some point AGI abruptly emerges.
I think it’s more likely that there won’t be discontinuities—the giant blob of computation keeps spitting out better and better learning algorithms, and we develop better ways of adapting them to tasks in the real world.
At some point one of these algorithms tries and fails to deceive us; we notice the problem and either fix it, stop using the AI-GA approach, or limit ourselves to not-too-capable AI systems.
It seems plausible that you could get something like the Interim Quality-of-Life Improver out of such an approach. You’d have to deal with the problem that by default these AI systems are going to have weird alien drives that would likely make them misaligned with us, but you probably do get examples of systems that try to deceive us, which you can then study and fix.