There’s a similar issue with less extreme requirements:
(Note that what Stuart Armstrong calls “erasure” just means that the current episode has been selected as a training episode.)
Imagine a circumstance in which the variable you want to predict can be affected by predictions of it. Fortunately, you were smart enough to use a counterfactual oracle. Unfortunately, you weren’t the only person who had this idea. Absent coordination to use the same RNG (in the same way), the oracles don’t learn only from episodes they can’t influence, and so don’t avoid learning to make “manipulative predictions”. Instead, they learn from each other (because even when one oracle’s output is erased, the other oracles’ outputs aren’t) and eventually make manipulative predictions anyway.
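To make the coordination point concrete, here’s a minimal sketch, assuming a toy setup where each oracle’s erasure is an independent coin flip (every name and parameter below is invented for illustration, not anyone’s actual proposal). It counts how often one oracle’s training episodes are contaminated by the other oracle’s published prediction:

```python
import random

EPISODES = 100_000
ERASE_PROB = 0.5  # chance a given oracle's output is withheld in an episode

def contaminated_fraction(shared_rng: bool) -> float:
    """Fraction of oracle A's training episodes (those where A's output is
    erased) in which oracle B's prediction was still published."""
    rng_a = random.Random(1)
    rng_b = random.Random(2)  # a second, uncoordinated source of randomness
    training = contaminated = 0
    for _ in range(EPISODES):
        a_erased = rng_a.random() < ERASE_PROB
        # With a shared RNG (used the same way), erasure events coincide.
        b_erased = a_erased if shared_rng else rng_b.random() < ERASE_PROB
        if a_erased:
            training += 1
            if not b_erased:  # B's prediction reached the world this episode
                contaminated += 1
    return contaminated / training

print("independent RNGs:", contaminated_fraction(shared_rng=False))  # ~0.5
print("shared RNG:      ", contaminated_fraction(shared_rng=True))   # 0.0
```

With independent RNGs, about half of each oracle’s “safe” training episodes still contain the other oracle’s published prediction, so the training distribution never matches the prediction-free counterfactual; a shared RNG makes the erasure events coincide and the contamination disappears.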
I vaguely agree with this concern but would like a clearer understanding of it. Can you think of a specific example of how this problem can happen?
I don’t have much in the way of a model of “manipulative predictions”; they’ve been mentioned before as a motivation for counterfactual oracles.
I think the original example was: there’s one oracle that everyone has access to and believes, and it says “company X’s stock is gonna go way down by the end of today”, and because everyone believes it, it happens.
In a similar fashion, I can imagine multiple people/groups independently creating their own “oracles” for predicting the (stock) market.
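Here’s a toy sketch of that self-fulfilling dynamic (everything below is invented for illustration: the panic threshold, the price dynamics, the training rule). A naive, non-counterfactual oracle trained on realized prices can get stuck at a manipulative forecast, because the dire forecast causes the very crash that confirms it:

```python
import random

def closing_price(forecast: float, rng: random.Random) -> float:
    """Toy market: a dire enough public forecast triggers the crash it
    predicts; otherwise the price just tracks fundamentals."""
    fundamental = rng.gauss(100.0, 2.0)
    if forecast < 70.0:                        # hypothetical panic threshold
        return forecast + rng.gauss(0.0, 2.0)  # believers sell; crash realized
    return fundamental

def train_naive_oracle(start: float, steps: int = 5000) -> float:
    rng = random.Random(0)
    forecast = start
    for _ in range(steps):
        price = closing_price(forecast, rng)
        forecast += 0.05 * (price - forecast)  # regress toward realized prices
    return forecast

print(train_naive_oracle(start=95.0))  # ~100: tracks fundamentals
print(train_naive_oracle(start=40.0))  # stays far below fundamentals: the
                                       # crash forecast confirms itself, so
                                       # training never corrects it
```

Both forecasts look “accurate” against realized prices, which is exactly why ordinary training provides no pressure away from the manipulative fixed point; the counterfactual-oracle setup is meant to remove that feedback loop.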