By ‘graceful’, do you mean morally graceful, technically graceful, or both / other?
Well, both, but mostly the issue of it being somewhat evil. That said, it could be good even from a strategic, human-focused view, by giving assurances to agents that would otherwise be adversarial to us. It's not clear to me it's actually a good strategy, though, because it somewhat incentivizes threats: an agent could seek out more destructive capabilities to earn a bigger reward for surrendering. It could also just attempt a takeover first wherever failure is safe, and only opt in to cooperation afterward. Sounds difficult to get right.
And to be clear, it's a hard problem anyway: even without this explicit framing, this stuff is churning in the background, or will be. It's a really general issue.
Check out this writeup; I mostly agree with everything there:
https://www.lesswrong.com/posts/cxuzALcmucCndYv4a/daniel-kokotajlo-s-shortform?commentId=y4mLnpAvbcBbW4psB