This seems pretty similar to this, using the WBE as the natural distribution. I think it has the same problems, namely butterfly effects and the fact that it’s difficult to safely make multiple decisions.
Butterfly effects are probably solvable using false thermodynamic miracles, but there’s still a big problem when we want to make multiple decisions. If we make 1000 decisions in a sequence, then we don’t have a good guarantee that each individual decision isn’t contributing, in a small part, to creating UFAI (say, by depositing a few bits on a computer somewhere). Seeing the results of 1000 decisions by the AI would give us very strong evidence that it is the AI, not the human, in the box. It doesn’t seem like creating multiple get-rich-quick schemes would lead to UFAI, but I don’t have a good mathematical argument for this yet.
As a side note, I don’t think we really need WBE for this; we just need a probabilistic model for humans that doesn’t assign extremely low probability to outputting useful things like get-rich-quick plans. Perhaps even n-gram language models would be good enough.
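To make the n-gram suggestion a bit more concrete, here is a toy sketch of my own (not something from the thread): a character-level bigram model fit on a small made-up corpus of "human-written" text, used to check that a candidate plan string isn't assigned an astronomically low probability. The corpus, smoothing constant, and scored string are all invented for illustration.

```python
# Toy illustration: a character-level bigram model of "human output",
# used to score how plausible a candidate output string is.
import math
from collections import defaultdict

corpus = "buy low sell high and diversify across uncorrelated assets " * 50

# Count bigram and preceding-character frequencies.
bigram_counts = defaultdict(lambda: defaultdict(int))
context_counts = defaultdict(int)
for a, b in zip(corpus, corpus[1:]):
    bigram_counts[a][b] += 1
    context_counts[a] += 1

vocab = set(corpus)

def log_prob(text, alpha=1.0):
    """Add-alpha smoothed log-probability of `text` under the bigram model."""
    total = 0.0
    for a, b in zip(text, text[1:]):
        num = bigram_counts[a][b] + alpha
        den = context_counts[a] + alpha * len(vocab)
        total += math.log(num / den)
    return total

plan = "sell high buy low"
print(log_prob(plan))  # finite and moderate, not astronomically negative
```

The point is only that the model needn't be a faithful human emulation; it just has to avoid assigning vanishingly small probability to useful outputs.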
If we make 1000 decisions in a sequence, then we don’t have a good guarantee that each individual decision isn’t contributing, in a small part, to creating UFAI (say, by depositing a few bits on a computer somewhere).
Why is this? Seeing a UFAI built this way is very strong evidence that it was an AI in the box! And this is exactly what the AI wants to avoid. And it only has “odd” probabilities with respect to its own creation; it can perfectly well know about all the other AIs reaching their decisions.
What I mean is that the probability that each individual action was done by a human is not too low, but the joint probability for all the actions is very low.
For concreteness: say that every minute, the AI or human either decides to make a stock trade or not. There’s another computer somewhere that detects whether the AI/human made a trade, appends a bit (0 or 1) to a file accordingly, and runs the resulting file (say, as a Python program) after collecting a million bits.
Now suppose the AI made a stock trade only on the minutes corresponding to 1 bits in a UFAI’s source code. Of course, the probability that a human would make all of these decisions is very small. However, the probability that a human would make any one of these decisions (to trade a stock or not in a particular minute) is probably close to 50%. So if we’re extremely paranoid, we might only be able to use this scheme to get rich once (instead of getting rich multiple times).
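To put rough numbers on this (my own back-of-the-envelope sketch, with assumed figures), the gap between the per-decision probability and the joint probability is enormous:

```python
# Illustrative numbers for the trade/no-trade scheme (assumed figures,
# not from the discussion above). Model the human as flipping a roughly
# fair coin each minute; the AI's output is one specific bit string.
import math

p_trade = 0.5          # assumed per-minute probability that a human trades
n_decisions = 1_000_000

# Probability a human matches any single one of the AI's decisions: ~0.5.
print(p_trade)                            # individually unremarkable

# Log10-probability that a human produces the AI's exact million-bit string.
joint_log10 = n_decisions * math.log10(p_trade)
print(joint_log10)                        # about -301030, i.e. 10**-301030

# Equivalently, each decision that matches the UFAI bit string multiplies the
# "it's the AI, not the human" likelihood ratio by ~2, so the evidence
# compounds to roughly 2**1_000_000 over the whole sequence.
```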
I see. There are many ways to solve this issue, but they’re kludges; I’ll try and think if there’s a principled way of doing it.