sandfort comments on Contest: $1,000 for good questions to ask to an Oracle AI

sandfort 31 Aug 2019 12:55 UTC
1 point
Submission (LB). The post’s team-choosing example suggests a method for turning any low-bandwidth oracle $O$ into a counterfactual oracle $O^{'}$: have $O^{'}$ output $o$ from the same set of possible outputs $L$; in case of erasure, calculate $R(l)$ for a randomly chosen $l \in L$ and set $R^{'}(o) = R(l)$ if $o = l$ and $R^{'}(o) = -\infty$ otherwise. Although the counterfactual low-bandwidth oracle is not any safer$^{1}$, it has the advantage of almost never requiring us to evaluate its score. Thus, by running multiple oracles in sequence (stopping the process after the first erasure event), we can, with high probability, receive the full series of answers as if from a high-bandwidth oracle.
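A minimal Python sketch of the erasure-case scoring rule described above may make it concrete. The function name `counterfactual_score` and its parameters are hypothetical, `base_score` stands in for $R$, and `outputs` stands in for $L$; drawing $l$ uniformly is an assumption, since the comment only says "randomly chosen".

```python
import math
import random


def counterfactual_score(o, outputs, base_score, rng=random):
    """Return R'(o) for the erasure case.

    o          -- the oracle's (unseen) output, an element of `outputs`
    outputs    -- the set L of possible outputs, as a non-empty sequence
    base_score -- the original scoring function R (hypothetical stand-in)
    rng        -- source of randomness (assumption: l is drawn uniformly)
    """
    l = rng.choice(outputs)      # randomly chosen l in L
    if o == l:
        return base_score(l)     # R'(o) = R(l) when o = l
    return -math.inf             # R'(o) = -infinity otherwise
```

Only $R(l)$ for the single sampled $l$ is ever evaluated, and only when an erasure happens.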

For example, we can ask each oracle in turn for advice on how to make a more effective processor. If erasure occurs, we attempt to make a processor with the advice obtained up to that point and use some measure of its performance as the score. If there is no erasure event, the final concatenation of answers forms a much safer guide to processor building than an equally large answer from a single oracle.
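The sequential procedure can be sketched as follows. This is only an illustration under stated assumptions: each oracle is modelled as a function of the answers seen so far, every oracle has the same independent erasure probability, and on erasure the erased oracle's own answer is withheld so that only earlier answers are scored. The names `run_chain`, `score_fn`, and `no_erasure_probability` are hypothetical.

```python
import random


def run_chain(oracles, erasure_prob, score_fn, rng=None):
    """Query oracles in order, stopping at the first erasure event.

    Returns (answers, score). `score` is None when no erasure occurred,
    i.e. when the full series of answers was received.
    """
    rng = rng or random.Random()
    answers = []
    for oracle in oracles:
        if rng.random() < erasure_prob:
            # Erasure: stop and score whatever advice was obtained so far
            # (assumption: the erased oracle's answer is not revealed).
            return answers, score_fn(answers)
        answers.append(oracle(answers))
    return answers, None


def no_erasure_probability(erasure_prob, n_oracles):
    """Chance that all n oracles answer, assuming independent erasures."""
    return (1.0 - erasure_prob) ** n_oracles
```

Under these assumptions, ten oracles with a 1% erasure probability each deliver the full series of answers about 90% of the time, since $0.99^{10} \approx 0.904$.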

1. It seems that in general, the less certain any counterfactual oracle is about its prediction, the more self-confirming it is. This is because the possible counterfactual worlds in which we have or acquire self-confirming beliefs regarding the prediction will have a high expected score. Hence:
Submission (CF). Given a high-bandwidth counterfactual oracle, use a second counterfactual oracle with a shared erasure event to predict its score. If the predicted score’s distance from its upper bound is greater than some chosen limit, discard the high-bandwidth prediction.
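The discard rule in this submission amounts to a threshold check. A minimal sketch, assuming the score has a known finite upper bound and that the second oracle's output is a single number; the name `filter_prediction` and all parameter names are hypothetical.

```python
def filter_prediction(prediction, predicted_score, score_upper_bound, max_gap):
    """Keep the high-bandwidth prediction only if its predicted score is
    within `max_gap` of the upper bound; otherwise discard it (None)."""
    gap = score_upper_bound - predicted_score
    if gap > max_gap:
        return None          # predicted score too far from the bound: discard
    return prediction        # close enough to the bound: keep
```

For instance, with an upper bound of 1.0 and a limit of 0.1, a prediction whose score is predicted to be 0.95 would be kept, while one predicted to score 0.5 would be discarded.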
- sandfort 1 Sep 2019 10:43 UTC
  1 point
  Parent
  Correction:
  "It seems that in general, the less certain any counterfactual oracle is about its prediction, the more self-confirming it is. This is because the possible counterfactual worlds in which we have or acquire self-confirming beliefs regarding the prediction will have a high expected score."
  This is actually only true in certain cases, since in general many other counterfactual worlds could also have high expected scores. Specifically, it is true to the extent that the oracle is uncertain mostly about aspects of the world that would be affected by the prediction, and to the extent that self-confirming predictions lead to higher scores than any alternative.