A subproblem of Friendly AI, or at least a similar problem, is the challenge of proving that properties of an algorithm are stable under self-modification. If we don’t identify a provably optimal algorithm for maximizing expected utility in decision-dependent counterfactuals, it’s hard to predict how the AI will decide to modify its decision procedure, and it’s harder to prove invariants about it.
Also, if someone else builds a rival AI, you don’t want it to be able to trick your AI into deciding to self-destruct by setting up a clever Omega-like situation.
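To make the “invariants under self-modification” idea a bit more concrete, here is a minimal, purely illustrative Python sketch (not from the original discussion; names like `Agent` and `consider_rewrite` are invented): an agent adopts a rewrite of its decision procedure only if a toy check suggests the successor shares its utility function. A real proposal would need a proof over all inputs, not a spot check on a few outcomes.

```python
# Toy sketch: accept a self-modification only if a simple invariant check
# passes -- here, that the candidate successor values outcomes the same way.
# All names are illustrative; real schemes would require proofs, not probes.

from dataclasses import dataclass
from typing import Callable

Outcome = str
Utility = Callable[[Outcome], float]


@dataclass
class Agent:
    utility: Utility
    decide: Callable[[list[Outcome]], Outcome]

    def consider_rewrite(self, candidate: "Agent") -> "Agent":
        # Invariant we want preserved across self-modification: the successor
        # ranks outcomes exactly as we do (checked on a finite probe set).
        probe = ["status quo", "paperclips", "self-destruct"]
        if all(abs(self.utility(o) - candidate.utility(o)) < 1e-9 for o in probe):
            return candidate  # adopt the modified decision procedure
        return self  # reject: the rewrite might not share our goals


u = lambda o: {"status quo": 0.0, "paperclips": 1.0, "self-destruct": -10.0}.get(o, 0.0)
original = Agent(utility=u, decide=lambda opts: opts[0])
rewrite = Agent(utility=u, decide=lambda opts: max(opts, key=u))

successor = original.consider_rewrite(rewrite)
print(successor is rewrite)  # True: the invariant check passed
```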
If we can predict how an AI would modify itself, why don’t we just write an already-modified AI?
Because the point of a self-modifying AI is that it will be able to self-modify in situations we don’t anticipate. Being able to predict its self-modification in principle is useful precisely because we can’t hard-code every special case.