VAuroch comments on Steelmanning MIRI critics

VAuroch 20 Aug 2014 3:14 UTC
3 points

In order to get that provably friendly thing to work, you have to deal with an explicit, unchanging utility function,

I think this is incorrect. If it isn’t, it at least requires some proof.
- DanielLC 20 Aug 2014 3:29 UTC
  1 point
  Parent
  For one thing, you’d have to explicitly come up with the utility function before you can prove the AI follows it.
  
  You can either make an AI that will proveably do what you mean, or make one that will hopefully figure out what you meant when you said “do what I mean,” and do that.
  - VAuroch 20 Aug 2014 7:10 UTC
    1 point
    Parent
    When I picture what a proven-Friendly AI looks like, I think of something where it’s goals are 1)Using a sample of simulated humans, generalize to unpack ‘do what I mean’ followed by 2)Make satisfying that your utility function.
    
    Proving those two steps each rigorously would produce a proven-Friendly AI without an explicit utility function. Proving step 1 to be safe would obviously be very difficult; proving step 2 to be safe would probably be comparatively easy. Both, however, are plausibly rigorously provable.
    - DanielLC 20 Aug 2014 16:27 UTC
      1 point
      Parent
      
      2)Make satisfying that your utility function.
      
      This is what I mean by an explicit utility function. An implicit one is where it never actually calculates utility, like how humans work.