A man-in-the-middle attack has 3 parties: Bob wants to talk to Alice, but Eve wants to eavesdrop.
Here we have just 2 parties: Harry the human wants to talk to Alexa the AI, but is worried that Alexa is a liar.
Clarification request. In the writeup, you discuss the AI Bayes net and the human Bayes net as if there’s some kind of symmetry between them, but it seems to me that there’s at least one big difference.
In the case of the AI, the Bayes net is explicit, in the sense that we could print it out on a sheet of paper and try to study it once training is done; the main reason we don’t do that is that it’s likely to be too big to make much sense of.
In the case of the human, we have no idea what the Bayes net looks like, because humans don’t have that kind of introspection ability. In fact, there’s not much difference between saying “the human uses a Bayes net” and “the human uses some arbitrary function F, and we worry the AI will figure out F and then use it to lie to us”.
Or am I actually wrong and it’s okay for a “builder” solution to assume we have access to the human Bayes net?
I want to steal the diamond. I don’t care about the chip. I will detach the chip and leave it inside the vault and then I will run away with the diamond.
Or perhaps you say that you attached the chip to the diamond very well, so I can’t just detach it without damaging it. That’s annoying but I came prepared! I have a diamond cutter! I’ll just slice off the part of the diamond that the chip is attached to and then I will steal the rest of the diamond. Good enough for me :)