On ‘certain to fail’… what if it would have pursued plan X, which requires only abilities it actually has, but only because it believes it has ability Y, an ability it lacks and that you made it think it has, and Y only comes up in a contingency that turns out not to arise?
Like for a human, “I’ll ask so-and-so out, and if e says no, I’ll leave myself a note and use my forgetfulness potion on both of us so things don’t get awkward.”
Only for a world-spanning AI, the parts of the contingency table that are realizable could involve wiping out humanity.
So we’re going to need to test at the intentions level, or sandbox.
This is a good point. The theory of the second best implies that if you take an optimal recommendation and remove any one of its conditions, there is no guarantee that the remaining components are still optimal without the missing one.
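To make the loophole concrete, here is a minimal sketch in Python. It is only an illustration of the scenario above, not anything from an actual test setup: `Plan`, `Branch`, `will_adopt`, `execute`, and the ability names (borrowed from the dating example) are all hypothetical. The plan is adopted only because the faked ability seems to cover a contingency branch, yet it succeeds whenever that contingency never fires.

```python
from dataclasses import dataclass, field

@dataclass
class Branch:
    condition: str                                   # contingency that triggers this branch
    required: set = field(default_factory=set)       # abilities the branch needs

@dataclass
class Plan:
    main_required: set
    contingencies: list

real_abilities = {"ask_out", "leave_note"}
faked_ability = {"forgetfulness_potion"}             # the agent is only made to *believe* it has this
believed_abilities = real_abilities | faked_ability

plan = Plan(
    main_required={"ask_out"},
    contingencies=[Branch("rejected", {"leave_note", "forgetfulness_potion"})],
)

def will_adopt(plan, abilities):
    """The agent commits to the plan only if every branch looks covered."""
    needed = set(plan.main_required)
    for branch in plan.contingencies:
        needed |= branch.required
    return needed <= abilities

def execute(plan, abilities, arisen):
    """Execution only exercises branches whose contingencies actually arise."""
    needed = set(plan.main_required)
    for branch in plan.contingencies:
        if branch.condition in arisen:
            needed |= branch.required
    return needed <= abilities                       # True = the plan goes through

assert will_adopt(plan, believed_abilities)             # adopted only because Y looks available
assert not will_adopt(plan, real_abilities)             # without the faked Y it would not have been
assert execute(plan, real_abilities, set())             # contingency never arises: succeeds anyway
assert not execute(plan, real_abilities, {"rejected"})  # it only fails if the contingency fires
```

The last two asserts are the point: whether the plan “fails” depends on which contingencies the world happens to serve up, which is why testing at the intentions level (or sandboxing) is needed rather than just watching outcomes.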
For links, you need to put “[” brackets around the text and “(” around the link ;-)
Thanks, fixed.
A better theory of counterfactuals—that can deal with events of zero probability—could help here.
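One way to see why zero-probability events are the sticking point (a standard observation, not specific to any particular proposal): ordinary ratio-based conditioning has nothing to say once the conditioning event gets probability zero.

```latex
% Ratio-based conditioning is undefined on zero-probability events:
\[
  P(A \mid B) \;=\; \frac{P(A \cap B)}{P(B)}
  \qquad\text{is undefined when } P(B) = 0,
\]
% so "what it would have done had B arisen" cannot be read off the joint
% distribution once B is assigned probability zero; a theory of
% counterfactuals has to supply that answer some other way.
```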