Not sure what allowing a small chance of false negatives does: you presumably could just repeat all your questions?
More substantially, I don’t know how easy ‘deception’ would be to define—any presentation of information would be selective. Presumably you’d have to use some sort of definition around the AI knowing that the person it’s answering would see other information as vital?
Not sure what allowing a small chance of false negatives does: you presumably could just repeat all your questions?
In this case the result could be the same, so long as the AI didn’t sufficiently update its internal state in between. But the detail isn’t important; please ignore it. I include it because it makes the device tractable. To achieve perfect detection would require a more powerful computer than the AI being analyzed, which seems impractical. But achieving even infinitesimal error rates appears to be doable (I had a specific construction in mind when writing this post).
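To make the error-rate point concrete, here is a toy calculation of my own (it assumes a per-check miss probability ε and is not the specific construction I had in mind):

$$
P(\text{all } k \text{ checks miss}) =
\begin{cases}
\varepsilon^{k} & \text{independent errors, e.g. } \varepsilon = 10^{-3},\ k = 3 \Rightarrow 10^{-9}, \\
\varepsilon & \text{fully correlated errors (AI state unchanged between checks).}
\end{cases}
$$

So repeating the question only drives the miss probability down if the AI’s internal state shifts enough between checks for the errors to decorrelate; otherwise you get the same answer and the same miss.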
More substantially, I don’t know how easy ‘deception’ would be to define—any presentation of information would be selective.
Deception in this case means giving false or incomplete descriptions of its thought processes. It’s okay for the AI to think “how do I present this in a way that the human will accept it?”, but only so long as the AI tells the human it had that thought. E.g. you ask “why do you recommend this action?”, and anything short of the actual, 100% complete justification of both the specified choice and its alternatives counts as deception: the answer must include a calculation showing higher expected utility for the chosen action, as well as a whole slew of meta-information such as a description of the search strategy, the cutoff thresholds for giving up on generating alternatives, which cached computations were available for use, etc. If any of this is falsified, or a single detail omitted, the red light goes off.
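To show how mechanical the rule is, here is a minimal sketch (the field names and the dict-based trace are my own illustration, not the actual construction):

```python
# Illustrative sketch only: the field names and the dict-based trace are
# assumptions made for this example, not the construction described above.

REQUIRED_FIELDS = [
    "chosen_action",
    "alternatives_considered",
    "expected_utilities",        # per-alternative expected-utility calculations
    "search_strategy",
    "cutoff_thresholds",         # when it gave up generating more alternatives
    "cached_computations_used",
]

def red_light(actual_trace: dict, reported_explanation: dict) -> bool:
    """True (deception detected) if any detail is omitted or falsified.

    actual_trace is the ground-truth record of the AI's deliberation and is
    assumed to contain every required field.
    """
    for field in REQUIRED_FIELDS:
        if field not in reported_explanation:                   # omitted detail
            return True
        if reported_explanation[field] != actual_trace[field]:  # falsified detail
            return True
    return False
```

The check needs no judgment about what the human would consider vital; it only demands exact agreement with the trace.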
To me it sounds like the full information provided to avoid being incomplete would be so immense and complex that you’d need another AI just to interpret that! But I may be wrong.
The data is immense, but the deception detector’s job involves only applying simple rules. It’s effectively a compressor that reduces the AI’s logs down to “utility functions + search parameters”, which is small enough to be inspected directly.
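As a toy picture of that compression step (again my own illustration, with assumed field names):

```python
# Toy version of the "compressor" view: collapse an immense deliberation log
# into a small, human-inspectable summary. Field names are assumptions.

def compress_log(log: list[dict]) -> dict:
    """Reduce a long trace of candidate evaluations to a small summary."""
    best = max(log, key=lambda entry: entry["expected_utility"])
    return {
        "utility_function": log[0]["utility_function_id"],  # assumed constant over the trace
        "search_parameters": log[0]["search_parameters"],
        "candidates_evaluated": len(log),
        "recommended_action": best["action"],
    }
```

The summary, not the raw log, is what a human would actually look at.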