There is no particular reason for the first AGI to believe that the more intelligent AGI (B) will judge it more favorably because of how it treated less intelligent life forms. (And even if B would, by that logic, humans haven't been very nice to the other, less intelligent life forms of Earth....)
Certainly, B might read this as a signal that the first AGI is less of a threat. Alternatively, B might read it as a signal that the first AGI would be easy to destroy. B may have a moral code that regards such behavior as positive, or one that regards it as negative. It just doesn't seem to say anything conclusive or persuasive either way.
I like to think of it not as agent B trying to show C that it isn't a threat. The way it's set up, we can probably assume B has no chance against C anyway. C may in turn need to worry about agent D, who is concerned about a hypothetical agent E, and so on. I think that, at some level, the decision any agent X makes here is the decision all the remaining agents in the hierarchy will make.
That said, I sort of agree that this is the real fear about this method. It's a lot like using superrationality, or something along those lines, to solve the prisoner's dilemma: are you willing to bet your life that the other player still won't choose Defect, despite what the new theory says? Still, I feel like there's something there; whether it would actually work probably needs some clarification from decision theory.
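To make the superrationality analogy concrete, here is a minimal toy sketch (not anyone's actual proposal; the payoff numbers are just illustrative assumptions) of how the two styles of reasoning come apart in a one-shot prisoner's dilemma. The idea is that if agent X assumes every other agent in the hierarchy is running the same decision procedure, only the symmetric outcomes are reachable, which flips the answer from Defect to Cooperate.

```python
# Toy prisoner's dilemma: payoff[(my_move, their_move)] = my payoff (higher is better).
# The specific numbers are illustrative assumptions, not from the original discussion.
PAYOFF = {
    ("C", "C"): 3,  # mutual cooperation
    ("C", "D"): 0,  # I cooperate, they defect
    ("D", "C"): 5,  # I defect, they cooperate
    ("D", "D"): 1,  # mutual defection
}

MOVES = ("C", "D")

def best_move_classical():
    # Classical reasoning: treat the other player's move as independent and unknown,
    # and pick the move with the better worst case (here that is also the dominant move).
    return max(MOVES, key=lambda mine: min(PAYOFF[(mine, theirs)] for theirs in MOVES))

def best_move_superrational():
    # Superrational reasoning: assume the other agent runs the same decision procedure,
    # so both players end up making the same choice; only the diagonal outcomes count.
    return max(MOVES, key=lambda mine: PAYOFF[(mine, mine)])

print(best_move_classical())      # -> "D"
print(best_move_superrational())  # -> "C"
```

Of course, the superrational answer only helps if the symmetry assumption actually holds, which is exactly the bet-your-life worry above.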