One possibility is a sort of proof by induction, where you start with code which has been inspected by humans, then that code inspects further code, etc.
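To make the shape of that induction concrete, here is a minimal sketch (all names are hypothetical, purely for illustration): the base case is code a human has inspected directly, and the inductive step is code inspected by something already trusted.

```python
def compute_trusted(human_inspected, inspects):
    """Return everything reachable through the trust relation.

    human_inspected: set of artifact names inspected directly by humans.
    inspects: dict mapping an inspector to the artifacts it inspects.
    """
    trusted = set(human_inspected)   # base case: human-inspected code
    frontier = list(trusted)
    while frontier:                  # inductive step, applied repeatedly
        inspector = frontier.pop()
        for artifact in inspects.get(inspector, []):
            if artifact not in trusted:
                trusted.add(artifact)
                frontier.append(artifact)
    return trusted


if __name__ == "__main__":
    # Humans inspect a verifier; the verifier inspects two tools;
    # one of those tools inspects a larger system.
    chain = {
        "verifier_v0": ["tool_a", "tool_b"],
        "tool_a": ["big_system"],
    }
    print(compute_trusted({"verifier_v0"}, chain))
    # trusted set contains verifier_v0, tool_a, tool_b, big_system
```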
Daemons and mindcrime seem most worrisome for superhuman systems, but a human-level system is plausibly sufficient to comprehend human values (and thus do useful inspections). For daemons, I think you might even be able to formalize the idea without leaning hard on any specific utility function. The best approach might involve utility uncertainty on the part of the AI which narrows over time: you gradually bootstrap your way to an understanding of human values, and along the way you avoid computational hazards according to your current best guesses about those values.
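A minimal sketch of that “utility uncertainty that narrows over time” idea, under my own assumptions (the hypotheses, hazard numbers, and thresholds below are all hypothetical): the agent keeps a posterior over candidate utility functions, updates it from feedback, and vetoes computations that look hazardous under its current best guess.

```python
import random

# Candidate hypotheses about what humans value, with prior weights.
hypotheses = {
    "values_privacy": 0.5,   # e.g. simulating detailed minds is a hazard
    "values_speed": 0.5,     # e.g. hazards are acceptable if useful
}

def update(posterior, hypothesis_favored, strength=2.0):
    """Narrow the posterior toward the hypothesis favored by new feedback."""
    new = {h: w * (strength if h == hypothesis_favored else 1.0)
           for h, w in posterior.items()}
    total = sum(new.values())
    return {h: w / total for h, w in new.items()}

def expected_hazard(posterior, action):
    """Hazard of an action, averaged over the current utility hypotheses."""
    # Hypothetical per-hypothesis hazard estimates for each action.
    hazard = {"simulate_minds": {"values_privacy": 0.9, "values_speed": 0.1},
              "formal_proof":   {"values_privacy": 0.0, "values_speed": 0.0}}
    return sum(w * hazard[action][h] for h, w in posterior.items())

posterior = dict(hypotheses)
for round_ in range(3):
    # Feedback (random here, standing in for human input) narrows uncertainty.
    posterior = update(posterior, random.choice(list(posterior)))
    for action in ("simulate_minds", "formal_proof"):
        allowed = expected_hazard(posterior, action) < 0.5
        print(round_, action, "allowed" if allowed else "vetoed")
```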
People already choose not to think about particular topics on the basis of information hazards and internal suffering. Sometimes these judgments are made as an interrupt, partway through thinking about a topic; others are outside-view judgments (“thinking about topic X always makes me feel depressed”).
Can you personally (under your own power) and confidently prove that a particular tool will only recursively trust safe and reliable tools, with this tree of trust reaching far enough to cover a superhuman AI?
On the other hand, you can “follow” the tree for a distance. You can prove a calculator trustworthy and use it in your subsequent proofs, for instance. This might make the task more feasible.
I don’t think proofs are the right tool here. Proof by induction was meant as an analogy.