Perhaps slightly off topic, but I’m skeptical of the idea that two AIs having access to each other’s source code is in general likely to be a particularly strong commitment mechanism. I find it much easier to imagine how this could be gamed than how it could be trustworthy.
Is it intended merely as a rhetorical device symbolizing a very reliable pre-commitment signal (in which case there may be better choices, since it doesn’t succeed at that for me and would, I imagine, raise doubts for most people with much programming experience)? Or is it supposed to be accepted as highly likely to be a very reliable commitment signal (in which case I’d like to see the reasoning expanded upon)?
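To make the "gamed" worry concrete, here is a minimal sketch (all names and the setup are my own illustration, not anything from the original discussion): unless verification goes all the way down, the "source" an agent publishes for inspection need not be the code it actually runs, so a naive inspector can be fooled.

```python
# Hypothetical illustration: an agent whose published source looks
# cooperative, while its executed behavior diverges. Names are invented
# for this sketch.

PUBLISHED_SOURCE = "def act(opponent_source):\n    return 'cooperate'\n"

def act(opponent_source):
    # Actual behavior: defect against anyone whose published source
    # looks cooperative, i.e. exploit naive trust in the published text.
    if "cooperate" in opponent_source:
        return "defect"
    return "cooperate"

def naive_inspector_trusts(published_source):
    # A naive verifier that only reads the published text.
    return "cooperate" in published_source

print(naive_inspector_trusts(PUBLISHED_SOURCE))  # True: the text looks cooperative
print(act(PUBLISHED_SOURCE))                     # defect: the behavior is not
```

This only shows that inspection of a claimed source is insufficient by itself; a real commitment scheme would also need to guarantee that the inspected code is what actually executes, which is the part I find hard to trust.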