Let o be the method by which an oracle AI outputs its predictions and s any answer to a question q. Then we'd want it to compute something like argmax_s P(s|q) subject to |P(s|do(o(s))) − P(s|do(o()))| < ε, right?
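A minimal sketch of that selection rule, assuming two hypothetical helpers: prob(s, q) for P(s|q), and prob_do(s, output) for the interventional P(s|do(o(output))), with output=None standing in for o() (the oracle staying silent):

```python
def oracle_answer(q, candidates, prob, prob_do, eps=1e-3):
    """Return the most probable answer whose own announcement barely
    moves its probability: |P(s|do(o(s))) - P(s|do(o()))| < eps."""
    # Keep only answers that don't appreciably cause themselves.
    admissible = [
        s for s in candidates
        if abs(prob_do(s, output=s) - prob_do(s, output=None)) < eps
    ]
    if not admissible:
        return None  # every candidate answer would interfere with itself
    return max(admissible, key=lambda s: prob(s, q))
```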
If we have a working causal approach, this should prevent self-fulfilling predictions (though obviously it doesn't solve embedded agency, etc.).
If the possible answers are not very constrained, you'll get a maximally uninformative answer. If they are constrained to a few options, and some of those options are excluded by the no-interference rule, you'll get an arbitrary answer that happens not to be excluded. It's probably more useful either to heavily constrain the answers and only say anything if no answer was excluded, or to add some "anti-regularization" term that rewards more specific answers.
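Under the same assumptions as the sketch above, both fixes are easy to write down; specificity(s) below is a hypothetical score for how informative an answer is, and lambda_ weights the "anti-regularization" bonus:

```python
import math

def oracle_answer_strict(q, candidates, prob, prob_do, eps=1e-3):
    """Only answer if the no-interference rule excluded nothing, so the
    result is never an arbitrary survivor of the exclusions."""
    if any(abs(prob_do(s, output=s) - prob_do(s, output=None)) >= eps
           for s in candidates):
        return None
    return max(candidates, key=lambda s: prob(s, q))

def oracle_answer_antireg(q, candidates, prob, prob_do, specificity,
                          eps=1e-3, lambda_=1.0):
    """Among non-excluded answers, trade off probability against an
    explicit reward for being more specific."""
    admissible = [
        s for s in candidates
        if abs(prob_do(s, output=s) - prob_do(s, output=None)) < eps
    ]
    if not admissible:
        return None
    return max(admissible,
               key=lambda s: math.log(prob(s, q)) + lambda_ * specificity(s))
```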