Let o be the method by which an oracle AI outputs its predictions and s any answer to a question q. Then we'd want it to compute something like argmax_s P(s|q) subject to |P(s|do(o(s))) − P(s|do(o()))| < ε, right?
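A minimal sketch of that selection rule, assuming two hypothetical helpers: prob(s, q) for P(s|q), and prob_do(s, output) for the interventional P(s|do(o(output))), with output=None standing in for o() (the oracle staying silent):

```python
def oracle_answer(q, candidates, prob, prob_do, eps=1e-3):
    """Return the most probable answer whose own announcement barely
    moves its probability: |P(s|do(o(s))) - P(s|do(o()))| < eps."""
    # Keep only answers that don't appreciably cause themselves.
    admissible = [
        s for s in candidates
        if abs(prob_do(s, output=s) - prob_do(s, output=None)) < eps
    ]
    if not admissible:
        return None  # every candidate answer would interfere with itself
    return max(admissible, key=lambda s: prob(s, q))
```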
If we have a working causal approach, this should prevent self-fulfilling predictions (though obviously it doesn't solve embedded agency, etc.).
If the possible answers are not very constrained, you'll get a maximally uninformative answer. If they are constrained to a few options, and some of those options are excluded by the no-interference rule, you'll get an arbitrary answer that happens not to be excluded. It's probably more useful either to heavily constrain the answers and only say anything if no answer was excluded, or to add some "anti-regularization" term that rewards more specific answers.
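Under the same assumptions as the sketch above, both fixes are easy to write down; specificity(s) below is a hypothetical score for how informative an answer is, and lambda_ weights the "anti-regularization" bonus:

```python
import math

def oracle_answer_strict(q, candidates, prob, prob_do, eps=1e-3):
    """Only answer if the no-interference rule excluded nothing, so the
    result is never an arbitrary survivor of the exclusions."""
    if any(abs(prob_do(s, output=s) - prob_do(s, output=None)) >= eps
           for s in candidates):
        return None
    return max(candidates, key=lambda s: prob(s, q))

def oracle_answer_antireg(q, candidates, prob, prob_do, specificity,
                          eps=1e-3, lambda_=1.0):
    """Among non-excluded answers, trade off probability against an
    explicit reward for being more specific."""
    admissible = [
        s for s in candidates
        if abs(prob_do(s, output=s) - prob_do(s, output=None)) < eps
    ]
    if not admissible:
        return None
    return max(admissible,
               key=lambda s: math.log(prob(s, q)) + lambda_ * specificity(s))
```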