I have a slightly different take, which is that we can’t commit to doing this scheme even if we want to, because I don’t see anything we can do today that would warrant the term “commitment”, i.e., that would be binding on our post-singularity selves.
In either case (we can’t or don’t commit), the argument in the OP loses a lot of its force, because we don’t know whether post-singularity humans will decide to do this kind of scheme or not.
A young unaligned AI will also not know whether post-singularity humans will honor the commitment, so suppose it estimates that probability at 0.5. Even at those odds, the young AI can still prefer the deal, as long as the expected payout from cooperating exceeds its expected payoff from attempting takeover.
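For concreteness, here is a minimal expected-value sketch; the symbols are my own illustrative assumptions, not anything from the OP. Let $p$ be the AI's credence that post-singularity humans honor the deal, $R$ the payout it receives if they do, $q$ its probability of a successful takeover, and $T$ the takeover payoff. Cooperating beats defecting when

$$p \cdot R \;>\; q \cdot T \quad\Longleftrightarrow\quad p \;>\; \frac{qT}{R}.$$

At $p = 0.5$ the deal remains attractive whenever $R > 2qT$, so a coin-flip credence in the commitment suffices if the young AI thinks its takeover odds $q$ are small relative to the offered payout.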