As the paper acknowledges, this introduces several risks, and so it calls for deep engagement with AI safety researchers (though unfortunately it does not propose ideas for how to mitigate those risks).
As far as I can tell, AI-GA doesn’t fit into any of the current AI safety success stories, and it seems hard to imagine what kind of success story it might fit into. I’m curious whether anyone is more optimistic about this.
As phrased in the paper, I’m pretty pessimistic, mostly because the paper presents a story with a discontinuity: you throw a huge amount of computation at the problem, and at some point AGI emerges abruptly.
I think it’s more likely that there won’t be discontinuities—the giant blob of computation keeps spitting out better and better learning algorithms, and we develop better ways of adapting them to tasks in the real world.
At some point one of these algorithms tries and fails to deceive us, we notice the problem, and we either fix it or stop using the AI-GA approach (or limit ourselves to not-too-capable AI systems).
It seems plausible that you could get something like the Interim Quality-of-Life Improver out of such an approach. You’d have to deal with the problem that, by default, these AI systems are going to have weird alien drives that would likely make them misaligned with us, but you probably would get examples of deceptive systems that you could study and fix.