If there is no solution to the alignment problem within reach of human-level intelligence, then the AGI can’t foom into an ASI without risking value drift…
A human augmented by strong narrow AIs could in theory detect deception by an AGI. Stronger interpretability tools…
What we want is a controlled intelligence explosion, where each increase in the AGI’s strength leads to an increase in our ability to align it, alignment as an iterative problem…
A kind of intelligence arms race; perhaps humans can find a way to compete indefinitely?
Yeah, it seems possible that some AGI systems would be willing to risk value drift, or just not care that much. In theory you could have an agent that didn’t care if its goals changed, right? Shoshannah pointed out to me recently that humans vary a lot in how much they care about having their goals changed. Some people are super opposed to wireheading; some think it would be great. So it’s not obvious to me how much ML-based AGI systems of around human-level intelligence would care about this. Maybe this kind of system converges pretty quickly to coherent goals, or maybe it’s the kind of system that can get quite a bit more powerful than humans before converging. I don’t know how to guess at that.