Suppose there are 2 oracles, each oracle is just simulating an approximation of the world without itself, and outputing data based on that. Each oracle simulates one future, there is no explicit optimization or acausal reasoning. The oracles are simulating each other, so the situation is self referential. Suppose one oracle is predicting stock prices, the other is predicting crop yields. Both produce numbers that encode the same UFAI. That UFAI will manipulate the stock market, and crop yields in order to encode a copy of its own source code. From the point of view of the crop yield oracle, it simulate a world without itself. In that virtual world, the stock price oracle produces a series of values that encode a UFAI, that UFAI then goes on to control world crop production. So this oracle is predicting exactly what would happen if it didn’t turn on. The other oracle reasons similarly. The same basic failure happens with many low bandwidth oracles. This isn’t something that can be solved by myopia or a CDT type causal reasoning.
However it might be solvable with Logical counterfactuals. Suppose an oracle takes the logical counterfactual on its algorithm outputting “Null”. Then within this counterfactual simulation, the other oracle is on its own, and can act as a “perfectly safe” single counterfactual oracle. By induction, a situation with any number of oracles should be safe. This technique also removes self referential loops.
I think that one oracle of each type is dangerous, but am not really sure.
You assume that one oracle outputting null implies that the other knows this. Specifying this in the query requires that the querier models the other oracle at all.
Each oracle is running a simulation of the world. Within that simulation, they search for any computational process with the same logical structure as themselves. This will find both their virtual model of their own hardware, as well as any other agenty processes trying to predict them. The oracle then deletes the output of all these processes within its simulation.
Imagine running a super realistic simulation of everything, except that any time anything in the simulation tries to compute the millionth digit of pi, you notice, pause the simulation and edit it to make the result come out as 7. While it might be hard to formally specify what counts as a computation, I think that this intuitively seems like meaningful behavior. I would expect the simulation to contain maths books that said that the millionth digit of pi was 7, and that were correspondingly off by one about how many 7s were in the first n digits for any n>1000000.
Suppose there are 2 oracles, each oracle is just simulating an approximation of the world without itself, and outputing data based on that. Each oracle simulates one future, there is no explicit optimization or acausal reasoning. The oracles are simulating each other, so the situation is self referential. Suppose one oracle is predicting stock prices, the other is predicting crop yields. Both produce numbers that encode the same UFAI. That UFAI will manipulate the stock market, and crop yields in order to encode a copy of its own source code. From the point of view of the crop yield oracle, it simulate a world without itself. In that virtual world, the stock price oracle produces a series of values that encode a UFAI, that UFAI then goes on to control world crop production. So this oracle is predicting exactly what would happen if it didn’t turn on. The other oracle reasons similarly. The same basic failure happens with many low bandwidth oracles. This isn’t something that can be solved by myopia or a CDT type causal reasoning.
However it might be solvable with Logical counterfactuals. Suppose an oracle takes the logical counterfactual on its algorithm outputting “Null”. Then within this counterfactual simulation, the other oracle is on its own, and can act as a “perfectly safe” single counterfactual oracle. By induction, a situation with any number of oracles should be safe. This technique also removes self referential loops.
I think that one oracle of each type is dangerous, but am not really sure.
Hum—my approach here seems to have a similarity to your idea.
You assume that one oracle outputting null implies that the other knows this. Specifying this in the query requires that the querier models the other oracle at all.
Each oracle is running a simulation of the world. Within that simulation, they search for any computational process with the same logical structure as themselves. This will find both their virtual model of their own hardware, as well as any other agenty processes trying to predict them. The oracle then deletes the output of all these processes within its simulation.
Imagine running a super realistic simulation of everything, except that any time anything in the simulation tries to compute the millionth digit of pi, you notice, pause the simulation and edit it to make the result come out as 7. While it might be hard to formally specify what counts as a computation, I think that this intuitively seems like meaningful behavior. I would expect the simulation to contain maths books that said that the millionth digit of pi was 7, and that were correspondingly off by one about how many 7s were in the first n digits for any n>1000000.
The principle here is the same.