I think there’s a problem with the entire idea of terminal goals, and that it makes AI alignment difficult.
“What terminal state do you want?” is off-putting because I specifically don’t want a terminal state. Any goal I come up with has to be unachievable, or at least cover my entire life; otherwise I would just be answering “What needs to happen before you’d be okay with dying?”
An AI does not have a goal but a utility function. Goals have terminal states: once you achieve them you’re done, and the program can shut down. A utility function goes on forever. But wanting just one thing so badly that you’d sacrifice everything else for it seems like a bad idea. Such a bad idea that no one has ever been able to define a utility function which wouldn’t destroy the universe when fed to a sufficiently strong AI.
I don’t wish to achieve a state; I want to remain in a state. There’s actually a large space of states I would be happy with, so it’s a region that I try to stay within. The space of good states forms a finite region, which means you’d have to stay within it indefinitely, sustaining it. But something which optimizes seeks to head towards a “better state”; it does not want to stagnate. Yet this is precisely what makes it unsustainable: something unsustainable is finite, something finite must eventually end, and something which optimizes towards an end is just racing to die. A human with enough power would likely realize this, but because life offers enough resistance, none of us ever win all our battles. The problem with AGIs is that they don’t have this resistance.
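The contrast between an agent that climbs toward ever-better states and one that merely keeps itself inside an acceptable region can be sketched as a toy simulation. All names and dynamics here are illustrative assumptions, not drawn from any real agent framework:

```python
# A toy contrast on a single 1-D state variable: an "optimizer" always
# climbs its utility gradient, while a "homeostat" only acts when it
# leaves an acceptable region, and rests once it is back inside.

def optimizer_step(state: float, utility_gradient: float, lr: float = 0.1) -> float:
    """An optimizer always climbs: no state exists at which it stops wanting more."""
    return state + lr * utility_gradient

def homeostat_step(state: float, low: float, high: float, lr: float = 0.1) -> float:
    """A homeostat nudges the state back toward [low, high] when outside it,
    and does nothing at all when inside it."""
    if state < low:
        return state + lr * (low - state)
    if state > high:
        return state - lr * (state - high)
    return state  # inside the region: no pressure to change anything

# The optimizer never settles; the homeostat does.
s = 5.0
for _ in range(1000):
    s = homeostat_step(s, low=0.0, high=1.0)
print(s)  # has converged to the edge of the region and stays put
```

The point of the sketch is that the homeostat has a fixed point (any state inside the region) while the optimizer, by construction, has none: its step function never returns its input unless the gradient is exactly zero everywhere.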
The afterlives we have imagined so far are either sustainable or a wish to die. Escaping samsara means disappearing; heaven is eternal life (stagnation); Valhalla is an infinite battlefield (a process which never ends). We wish for continuance. It’s the journey which has value, not the goal. But I don’t wish to journey faster.