Thanks!
Here’s another attempt at explaining.
1. Suppose Predict-O-Matic A has access to historical data which suggests Predict-O-Matic B tends to be extremely accurate, or otherwise has reason to believe Predict-O-Matic B is extremely accurate.
2. Suppose the way Predict-O-Matic A makes predictions is by some process analogous to writing a story about how things will go, evaluating the plausibility of the story, and doing simulated annealing or some other sort of stochastic hill-climbing on its story until the plausibility of its story is maximized.
3. Suppose that it’s overwhelmingly plausible that at some time in the near future, Important Person is going to walk up to Predict-O-Matic B, ask Predict-O-Matic B for a forecast, and make an important decision based on what Predict-O-Matic B says.
4. Because of point 3, stories which don’t involve a forecast from Predict-O-Matic B will tend to get rejected during the hill-climbing process. And...
5. Because of point 1, stories which involve an inaccurate forecast from Predict-O-Matic B will tend to get rejected during the hill-climbing process. We will tend to hill-climb our way into having Predict-O-Matic B’s prediction change so it matches what actually happens in the rest of the story.
6. Because the person in point 3 is important and Predict-O-Matic B’s forecast influences their decision, a change to the part of the story regarding Predict-O-Matic B’s prediction could easily mean the rest is no longer plausible and will benefit from revision.
7. So now we’ve got a loop in the hill-climbing process where changes in Predict-O-Matic B’s forecast lead to changes in what happens after Predict-O-Matic B’s forecast, and changes in what happens after Predict-O-Matic B’s forecast lead to changes in Predict-O-Matic B’s forecast. It stops when we hit a fixed point.
Now that I’ve written this out, I’m realizing that I don’t think this would happen for sure. I’ve argued both that changing the forecast to match what happens will improve plausibility, and that changing what happens so it’s a plausible result of the forecast will improve plausibility. But if the only way to achieve one is by discarding the other, then neither tweak improves plausibility on its own in general. However, the point remains that a fixed point will be among the most plausible stories available, so any good optimization method will tend to converge on it. (Maybe plain simulated annealing, but with a temperature parameter high enough that it can easily leap between these kinds of semi-plausible stories until it lands on a fixed point by chance. Or hill climbing based on local improvements in plausibility rather than the plausibility of the story taken as a whole.)
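To make the dynamic concrete, here’s a minimal toy sketch (my own construction, nothing from the post): the whole “story” is collapsed to a (forecast, outcome) pair, plausibility penalizes both an inaccurate forecast from B (point 1) and an outcome that doesn’t follow from Important Person acting on the forecast (point 3), and a made-up world_reaction function stands in for how the decision shapes events. Everything here (the function, the penalties, the temperature schedule) is an assumption chosen for illustration.

```python
import math
import random

# Toy sketch: a "story" is just (forecast, outcome), each an integer in 0..9.
# world_reaction is a hypothetical stand-in for how Important Person's
# decision, driven by B's forecast, plays out. Its only fixed point is 9.

def world_reaction(forecast):
    """Outcome that plausibly follows from Important Person acting on the forecast."""
    return min(forecast + 1, 9)

def plausibility(story):
    forecast, outcome = story
    accuracy_penalty = abs(forecast - outcome)                   # point 1: B is supposed to be accurate
    coherence_penalty = abs(outcome - world_reaction(forecast))  # point 3: outcome should follow from the decision
    return -(accuracy_penalty + coherence_penalty)

def anneal(steps=20_000, seed=0):
    rng = random.Random(seed)
    story = (rng.randrange(10), rng.randrange(10))  # random initial story
    for step in range(steps):
        temperature = max(0.01, 3.0 * (1 - step / steps))  # cool from 3.0 toward ~0
        forecast, outcome = story
        # Propose tweaking either B's forecast or the rest of the story.
        if rng.random() < 0.5:
            proposal = (rng.randrange(10), outcome)
        else:
            proposal = (forecast, rng.randrange(10))
        delta = plausibility(proposal) - plausibility(story)
        if delta >= 0 or rng.random() < math.exp(delta / temperature):
            story = proposal
    return story

if __name__ == "__main__":
    print(anneal())  # typically (9, 9): the forecast matches the outcome it brings about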
I think the scenario is similar to your P(X) and P(Y) discussion in this post.
It just now occurred to me that you could get a similar effect given certain implementations of beam search. Suppose we’re doing beam search with a beam width of 1 million. For the sake of simplicity, suppose that when Important Person walks up to Predict-O-Matic B and asks their question in A’s sim, each of the 1M beam states gets allocated to a different response that Predict-O-Matic B could give. Some of those states lead to “incoherent”, low-probability stories where Predict-O-Matic B’s forecast turns out to be false, and they get pruned. The only states left over are states where Predict-O-Matic B’s prophecy ends up being correct: cases where Predict-O-Matic B made a self-fulfilling prophecy.
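Here’s a rough sketch of that pruning dynamic, again with everything toy-sized and made up: the 1M-wide beam becomes 10 branches, each branch commits B to one possible forecast, a hypothetical WORLD_REACTION table stands in for how Important Person’s decision plays out, and any completed story where B turns out to be wrong is scored as so improbable that it falls out of the beam.

```python
import math

# Toy beam-search sketch (my own illustration, not the post's algorithm).
FORECASTS = range(10)  # the 1M-wide beam shrunk to 10 branches for readability

# Hypothetical map from B's forecast to the resulting outcome (what Important
# Person's decision brings about). Made up for illustration; fixed points at 2 and 7.
WORLD_REACTION = [1, 2, 2, 2, 5, 6, 7, 7, 7, 8]

def log_prob(forecast, outcome):
    """Score of the completed story. B being wrong is treated as wildly
    implausible (point 1), so those branches effectively vanish."""
    return 0.0 if forecast == outcome else -math.inf

def beam_search(beam_width=10):
    # Step 1: branch on every response B could give, one beam state each.
    beam = [(0.0, [f]) for f in FORECASTS]
    # Step 2: extend each state with the outcome, rescore, and prune.
    extended = []
    for score, story in beam:
        forecast = story[0]
        outcome = WORLD_REACTION[forecast]
        extended.append((score + log_prob(forecast, outcome), story + [outcome]))
    survivors = [(s, story) for s, story in extended if s > -math.inf]
    survivors.sort(key=lambda c: c[0], reverse=True)
    return survivors[:beam_width]

if __name__ == "__main__":
    print(beam_search())  # only [2, 2] and [7, 7] survive: self-fulfilling prophecies
```

In this toy version, the only branches that survive pruning are the forecasts that are fixed points of WORLD_REACTION, which is the “self-fulfilling prophecy” residue described above.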
Ah, OK, I buy that, thanks. What about the idea of building a system that doesn’t model itself or its past predictions, and asking it questions that don’t entail modeling any other superpredictors? (Like “what’s the likeliest way for a person to find the cure for Alzheimer’s, if we hypothetically lived in a world with no superpredictors or AGIs?”)
Could work.