If we make 1000 decisions in a sequence, then we don’t have a good guarantee that each individual decision isn’t contributing, in a small part, to creating UFAI (say, by depositing a few bits on a computer somewhere).
Why is this? Seeing an UFAI built this way is very strong evidence that it was an AI in the box! And this is what the AI is wanting to avoid. And it’s only got “odd” probabilities with respect to its own creation; it can perfectly know about all the other AI reaching their decisions.
What I mean is that the probability that each individual action was done by a human is not too low, but the joint probability for all the actions is very low.
For concreteness: say that every minute, the AI or human either decides to make a stock trade or not. There’s another computer somewhere that detects whether the AI/human made a trade and appends a bit (0 or 1) to a file accordingly, and runs it (say, as a Python program) after collecting a million bits.
Now suppose the AI made a stock trade only on minutes corresponding to 1 bits in a UFAI source code. Of course, the probability that a human would make all of these decisions is very small. However, the probability that a human would make one of these decisions (to trade a stock or not in particular minute) is probably close to 50%. So if we’re extremely paranoid, we might only be able to use this scheme to get rich once (instead of getting rich multiple times).
Why is this? Seeing an UFAI built this way is very strong evidence that it was an AI in the box! And this is what the AI is wanting to avoid. And it’s only got “odd” probabilities with respect to its own creation; it can perfectly know about all the other AI reaching their decisions.
What I mean is that the probability that each individual action was done by a human is not too low, but the joint probability for all the actions is very low.
For concreteness: say that every minute, the AI or human either decides to make a stock trade or not. There’s another computer somewhere that detects whether the AI/human made a trade and appends a bit (0 or 1) to a file accordingly, and runs it (say, as a Python program) after collecting a million bits.
Now suppose the AI made a stock trade only on minutes corresponding to 1 bits in a UFAI source code. Of course, the probability that a human would make all of these decisions is very small. However, the probability that a human would make one of these decisions (to trade a stock or not in particular minute) is probably close to 50%. So if we’re extremely paranoid, we might only be able to use this scheme to get rich once (instead of getting rich multiple times).
I see. There are many ways to solve this issue, but they’re kludges; I’ll try and think if there’s a principled way of doing it.