So, does the whole problem go away if, instead of trying to deduce what FairBot is going to do with Masquerade, we assume that FairBot is going to assess it as if Masquerade = the current mask? By ignoring the existence of Masquerade in our deduction, we solve the Gödel inconsistency while simultaneously ensuring that another AI can easily determine that we will be executing exactly the mask we choose.
Masquerade deduces the outcomes of each of its masks, ignoring its own existence, and chooses the best outcome. FairBot follows the exact same process, determines which mask Masquerade is going to use, and then uses that outcome to make its own decision, as if Masquerade were whatever mask it ends up as. I assume Masquerade would check whether it is running against itself and automatically cooperate if it is, without running the deduction, which would be the other case for avoiding the loop.
The problem with that is something I outlined in the previous post: without a sanity check, this agent is exploitable. Let’s call this agent TrustingBot (we’ll see why in a minute), and have it play against a true TDT agent.
Now, TDT will cooperate against FairBot, but not against any of the other masks. So TrustingBot settles on the FairBot mask as its best outcome and goes ahead and cooperates with TDT. But TDT notices that it can defect against TrustingBot without penalty, since TrustingBot only cares what TDT does against the masks, not what TDT does against TrustingBot itself; thus TDT defects and steals TrustingBot’s lunch money.
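To make the exploit concrete, here is a toy sketch in Python. Everything in it is an assumption for illustration: the payoff numbers are the usual Prisoner's Dilemma values, the mask set (CooperateBot, DefectBot, FairBot) is hypothetical, the table of mask-versus-TDT outcomes is simply written down rather than deduced, and TDT is stood in for by a plain best-response function, which is much cruder than a real TDT agent.

```python
# Row player's payoff for (my_move, their_move).
# Standard Prisoner's Dilemma values, assumed for this sketch.
PAYOFF = {("C", "C"): 3, ("C", "D"): 0, ("D", "C"): 5, ("D", "D"): 1}

# Hypothetical mask set. Each entry is
# (the move the mask plays against TDT, TDT's move against that mask).
# Per the text, TDT cooperates against FairBot and nothing else.
MASK_GAMES_VS_TDT = {
    "CooperateBot": ("C", "D"),
    "DefectBot": ("D", "D"),
    "FairBot": ("C", "C"),
}


def trustingbot_choice(mask_games):
    """Pick the mask with the best predicted payoff; return (mask, its move).

    TrustingBot only looks at how the opponent treats each mask.
    It never checks how the opponent treats TrustingBot itself.
    """
    best_mask = max(mask_games, key=lambda m: PAYOFF[mask_games[m]])
    return best_mask, mask_games[best_mask][0]


def tdt_best_response(opponent_move):
    """Crude stand-in for TDT: the opponent's move is fixed, so maximize payoff."""
    return max("CD", key=lambda mine: PAYOFF[(mine, opponent_move)])


def play():
    mask, trusting_move = trustingbot_choice(MASK_GAMES_VS_TDT)
    tdt_move = tdt_best_response(trusting_move)
    return (
        mask,
        trusting_move,
        tdt_move,
        PAYOFF[(trusting_move, tdt_move)],  # TrustingBot's payoff
        PAYOFF[(tdt_move, trusting_move)],  # TDT's payoff
    )


if __name__ == "__main__":
    print(play())
```

Under these assumptions TrustingBot predicts mutual cooperation from the FairBot mask and plays C, while the TDT stand-in, seeing that TrustingBot's move does not depend on what it actually does, plays D. The missing sanity check is exactly a step that would compare the predicted outcome with what the opponent does against TrustingBot itself.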
See how tricky this gets?