It is not clear that there is any general way to design a Predictor that will not exhibit goal-seeking behavior, short of dramatically limiting the power of the Predictor.
Not sure if this is a new idea or how safe it is, but we could design a Predictor that incorporates a quantum random number generator, such that with some small probability it will output “no predictions today, run me again tomorrow”. Then have the Predictor make predictions that are conditional on it giving the output “no predictions today, run me again tomorrow”.
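A minimal sketch of the wiring being proposed, assuming a hypothetical quantum_bit() source and a predict_conditional routine that can answer a question conditioned on a stated event; both names are illustrative, not an existing API:

```python
import random  # stand-in; the proposal calls for a quantum source, not a PRNG

ABSTAIN = "no predictions today, run me again tomorrow"
EPSILON = 1e-6  # small probability of abstaining

def quantum_bit(p):
    """Placeholder for a quantum source that returns True with probability p."""
    return random.random() < p  # ordinary PRNG here, purely for illustration

def run_predictor(question, predict_conditional):
    """predict_conditional(question, event) is assumed to return a forecast of
    `question` conditioned on `event`; here the event is this very run
    abstaining, so the forecast describes a world in which today's output
    never reaches anyone."""
    if quantum_bit(EPSILON):
        return ABSTAIN
    return predict_conditional(question, event="this run outputs ABSTAIN")
```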
It wouldn’t be a new idea, if only I was a little smarter.
Last year, when describing this Predictor problem, I phrased all my danger examples in the form “a self-incorporating prediction of X might be vastly worse than a prediction of X in a counterfactual Predictor-free world”, yet it never occurred to me that “just ask for a prediction of X in a Predictor-free world” was a possible solution. I think it would have to be a “Predictor-free world” not just a “Predictor-free tomorrow”, though; there’s nothing that says those dangerous feedback loops would all resolve themselves in 24 hours.
I think this solution still starts failing as society starts to rely on Predictors, though. Suppose Predictor-guided decisions are much better than others. Then major governments and businesses start to rely on Predictor-guided decisions… and then the predictions start all coming up “in a Predictor-free world, the economy crashes”.
It wouldn’t be a new idea, if only I was a little smarter.
It might be interesting to write down what it took to come up with such an obvious-in-retrospect idea. What happened was I started thinking “Wait, decision procedures need to use predictors as part of making decisions. Why isn’t this a problem there? Or is it?” I then realized that a predictor used by a decision procedure does not leak its output into the world, except through the decision itself, and it only makes predictions that are conditional on a specific decision, which breaks the feedback loop. That gave me the idea described in the grandparent comment.
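A toy version of that structure, with predict_outcome and utility left abstract; the point is just that every forecast is conditioned on a specific candidate decision, and nothing is released into the world except the chosen decision itself:

```python
def decide(candidate_actions, predict_outcome, utility):
    """Internal-predictor pattern: each forecast is conditional on a
    candidate decision, and only the final decision (not the forecasts)
    leaks into the world."""
    best_action, best_value = None, float("-inf")
    for action in candidate_actions:
        outcome = predict_outcome(given_action=action)  # conditional prediction
        value = utility(outcome)
        if value > best_value:
            best_action, best_value = action, value
    return best_action  # the only output that leaks out
```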
I think it would have to be a “Predictor-free world” not just a “Predictor-free tomorrow”, though; there’s nothing that says those dangerous feedback loops would all resolve themselves in 24 hours.
Yeah, I was thinking that the user would limit their questions to things that happen before the next Predictor run, like tomorrow’s stock prices. But I’m also not sure what kind of dangerous feedback loops might occur if they don’t. Can you think of an example?

The WW3 example from my comment last year holds up over long time frames.

Any kind of technological arms race could be greatly accelerated if “What will we be manufacturing in five years” became a predictable question.
Thinking this over a bit more, it seems that the situation of Predictors being in feedback loops with each other is already the case today. Each of us has a Predictor in our own brain that we make use of to make decisions, right? As I mentioned above, we can break a Predictor’s self-feedback loop by conditionalizing its predictions on our decisions, but each Predictor still needs to predict other Predictors which are in turn trying to predict it.
Is there reason to think that with more powerful Artificial Predictors, the situation would be worse than today?
We do indeed have billions of seriously flawed predictors walking around today, and feedback loops between them are not a negligible problem. Going back to that example, we nearly managed to start WW3 all by ourselves without waiting for artificially intelligent assistance. And it’s easy to come up with half a dozen contemporary examples of entire populations thinking “what we’re doing to them may be bad, but not as bad as what they’d do to us if we let up”.
It’s entirely possible that the answer to the Fermi Paradox is that there’s a devastatingly bad massively multiplayer Mutually Assured Destruction situation waiting along the path of technological development, one in which even a dumb natural predictor can reason “I predict that a few of them are thinking about defecting, in which case I should think about defecting first, but once they realize that they’ll really want to defect, and oh damn I’d better hit that red button right now!” And the next thing you know all the slow biowarfare researchers are killed off by a tailored virus that left the fastest researchers alone (to pick an exaggerated trope out of a hat). Artificial Predictors would make such things worse by speeding up the inevitable.
Even if a situation like that isn’t inevitable with only natural intelligences, Oracle AIs might make one inevitable by reducing the barrier to entry for predictions. When it takes more than a decade of dedicated work to become a natural expert on something, people don’t want to put in that investment becoming an expert on evil. If becoming an expert on evil merely requires building an automated Question-Answerer for the purpose of asking it good questions, but then succumbing to temptation and asking it an evil question too, proliferation of any technology with evil applications might become harder to stop. Research and development that is presently guided by market forces, government decisions, and moral considerations would instead proceed in the order of “which new technologies can the computer figure out first”.
And a Predictor asked to predict “What will we do based on your prediction” is effectively a lobotomized Question-Answerer, for which we can’t phrase questions directly, leaving us stuck with whatever implicit questions (almost certainly including “which new technologies can computers figure out first”) are inherent in that feedback loop.
(On the theme of quantum random number generators, if for some reason ontotechnology is possible then running an AI that explores the space of possible self-modifications based on quantum random seeds is significantly more dangerous than running it based on pseudorandom seeds, as you only need to get ontotechnology in a vanishingly small fraction of worlds in order to change the entire ensemble. I think this is a reductio ad absurdum of the idea of ontotechnology, as the universe should of course already be at equilibrium with respect to such total rewrites, but there are really weird, moderately interesting, and almost assuredly misguided ideas in roughly this area of inquiry.)
Then have the Predictor make predictions that are conditional on it giving the output “no predictions today, run me again tomorrow”.
The Predictor may (per Solomonoff induction) simulate the real world, including itself, but that does not necessarily mean it will recognize its own simulation as itself. It will not even necessarily recognize that it is simulating anything; its perspective may be something like “I am calculating this equation, I have no idea what it means, but its results make my masters happy, so I will continue calculating it”. So it will not realise that your command applies to this specific situation.
This is an anthropomorphisation, but technically speaking, to implement a command like “when you simulate yourself, assume the output is X” you need to specify a “simulation” predicate and an “itself” predicate; otherwise the Predictor will not use the rule (the sketch below tries to make this concrete). What happens if the Predictor’s simulation is imprecise, but still good enough to provide good answers about the real world? Should it recognize the imprecise simulation of itself as “itself” too? What if this imprecise simulation does not contain the quantum random number generator; how would the rule apply then?
Also, in some situations the answer to “what happens if I don’t make a prediction” is useless. The more useful the Predictor proves, the more often this will happen, because people will use the predictions for their important actions, so the answer to “what happens if I don’t make a prediction” will be something like “humans will wait another day” (which does not say what would happen if humans actually did something instead of waiting). And if the Predictor refuses to provide an answer too often (say, 1000 times in a row; the simulations of “what happens if I don’t make a prediction” may have this situation as an attractor), humans will assume it is somehow broken and perhaps build another AI, so now the Predictor may actually be predicting what that other AI would do.
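To make the “itself” predicate worry concrete, here is a deliberately naive sketch; world_model, is_self, and force_output are invented names for illustration, and is_self is exactly the piece nobody knows how to specify:

```python
def forecast_with_self_substitution(world_model, is_self,
        own_output="no predictions today, run me again tomorrow"):
    """Hypothetical self-substitution rule: while rolling the world model
    forward, force any simulated process that `is_self` recognizes as this
    Predictor to emit `own_output`.  Everything hinges on `is_self`, which
    would somehow have to match imprecise, rearranged, or RNG-free variants
    of the Predictor -- the unspecified hard part."""
    for process in world_model.processes():
        if is_self(process):                  # the problematic predicate
            process.force_output(own_output)
    return world_model.run()
```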
If Predictors are to be modeled as accuracy-maximizing agents, they could acausally cooperate with each other, so that one Predictor optimizes its accuracy about a world from which it is absent by controlling the predictions of another Predictor that is present in that world.
I don’t think that would make much of a difference, because then it can still answer conditional on what its answer would be when it is asked the same question again the next day, which would presumably be the same answer unless it got new information during that wait period.
A perhaps safer but less efficient alternative is to design a Predictor (without keeping any backup copies of its source code) such that with some small probability, it will delete its source code and shut down instead of answering your question. Then if you still want to know the answer, you have to design a new Predictor with the same specifications but a different algorithm. The Predictor’s answer (if it gives one) refers to what would happen conditional on it shutting down.
Decision-theoretic variant: build an agent that incorporates a quantum random number generator, such that with some small probability it will output a random action. Then have the agent calculate how much expected utility each action would imply if it were chosen because of the random number generator, and output the best one.
Unless I’m missing something, this agent doesn’t play very well in Newcomblike problems, but seems to be a good enough formalization of CDT. I cannot define it as formally as I’d like, though, because how do you write a computer program that refers to a specific quantum event in the outside world?
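A rough sketch of that agent, again using an ordinary PRNG as a stand-in for the quantum device and leaving expected_utility abstract; the conditioning event “this action was output by the random device” is what gives it the CDT flavor:

```python
import random  # placeholder; the proposal calls for a quantum source

EPSILON = 1e-6  # probability of emitting a random action

def act(actions, expected_utility):
    """With small probability, emit a uniformly random action.  Otherwise
    score each action by the expected utility it would have if it had been
    chosen because of the random device, and emit the best-scoring one."""
    if random.random() < EPSILON:          # quantum_bit stand-in
        return random.choice(actions)
    return max(actions,
               key=lambda a: expected_utility(a, given="the device output a"))
```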
I’m not sure there is a way to make sense of such utility-definitions. What fixed question relating to utility value is being answered by observing the result of a random number generator? The original state of the world is not clarified (both states of the random result were expected, not correlated with anything interesting), so a state of knowledge about utility defined in terms of the original state of the world won’t be influenced by these observations, except accidentally.
How does it help?