Eliezer Yudkowsky comments on On Terminal Goals and Virtue Ethics

Eliezer Yudkowsky 24 Jun 2014 18:25 UTC
−1 points

From what I can tell on the outside, the MIRI approach seems to be: (1) find a practical theory of FAI; (2) design an AGI in accordance with this theory; (3) implement that design; (4) mission accomplished!

Yes, dear, some of us are programmers, we know about waterfalls. Our approach is more like, “Attack the most promising problems that present themselves, at every point; don’t actually build things which you don’t yet know how to make not destroy the world, at any point.” Right now this means working on unbounded problems because there are no bounded problems which seem more relevant and more on the critical path. If at any point we can build something to test ideas, of course we will; unless our state of ignorance is such that we can’t test that particular idea without risking destroying the world, in which case we won’t, but if you’re really setting out to test ideas you can probably figure out some other way to test them, except for very rare highly global theses like “The intelligence explosion continues past the human level.” More local theses should be testable.

See also Ch. 22 from HPMOR, and keep in mind that I am not Harry, I contain Harry, all the other characters, their whole universe, and everything that happens inside it. In other words, I am not Harry, I am the universe that responded to Harry.
- Mark_Friedenbach 24 Jun 2014 19:09 UTC
  8 points
  Parent
  I’ll have to review Ch. 22 later as it is quite long.
  
  If a stable self-modifying agent + friendly value-loading was the only pathway to a positive singularity, then MIRI would be doing a fine job. However I find that assumption not adequately justified.
  
  For example, take oracle AI. The sequences do a good job of showing how a black box AI can’t be safely boxed, nor can any of its recommendations be trusted. But those arguments don’t generalize to when we can see and understand the inner workings of the AI. Yes engineering challenges apply as you can’t demand a computational trace of the entire returned result, as that would require an even more powerful AI to analyze, and then it’d be turtles all the way down. However you can do something like the Fiat-Shamir transform for selecting branches of the computational trace to audit. In essence, use the cryptographic hash of the result in order to choose which traces of the audit log to reveal. This allows the audit log to be only a tiny, tiny slice of the entire computation, yet it can be shown that faking such an audit log is computationally infeasible, meaning that it requires a large multiple more computation than went into the original result, which means it would be energetically detectable.
  
  Of course you would also have to have the AI be written for a virtual machine which is pure, strongly typed and directly supports the native abstraction primitives of the AGI design (e.g. if it were OpenCog, the virtual machine would exist at the abstraction level of the atomspace), and have a computational stack underneath that which prevents VM breakout, and other protective measures. But these are surmountable engineering challenges, the difficulty of which can be realistically quantified.
  
  So how much more or less difficult would it be to accomplish such an untrusted oracle AI vs the stable self-modifying agent and value-loading approach? Which one is more likely to occur before the “competition”?
  
  I’m not demanding a full waterfall project plan, but even agile requires convincing arguments about critical paths and relative priorities. I for one am not convinced.
  - TheAncientGeek 25 Jun 2014 13:54 UTC
    −4 points
    Parent
    
    If a stable self-modifying agent + friendly value-loading was the only pathway to a positive singularity, then MIRI would be doing a fine job. However I find that assumption not adequately justified.
    
    Well that makes three of us...
- [deleted] 24 Jun 2014 21:55 UTC
  3 points
  Parent
  
  See also Ch. 22 from HPMOR, and keep in mind that I am not Harry, I contain Harry, all the other characters, their whole universe, and everything that happens inside it. In other words, I am not Harry, I am the universe that responded to Harry.
  
  Badass boasting from fictional evidence?
  
  Yes, dear, some of us are programmers, we know about waterfalls.
  
  If anyone here knew anything about the Waterfall Model, they’d know it was only ever proposed sarcastically, as a perfect example of how real engineering projects never work. “Agile” is pretty goddamn fake, too. There’s no replacement for actually using your mind to reason about what project-planning steps have the greatest expected value at any given time, and to account for unknown unknowns (ie: debugging, other obstacles) as well.
  - Eliezer Yudkowsky 26 Jun 2014 17:51 UTC
    0 points
    Parent
    
    If anyone here knew anything about the Waterfall Model, they’d know it was only ever proposed sarcastically, as a perfect example of how real engineering projects never work
    
    Yes, and I used it in that context: “We know about waterfalls” = “We know not to do waterfalls, so you don’t need to tell us that”. Thank you for that very charitable interpretation of my words.
    - [deleted] 27 Jun 2014 5:48 UTC
      2 points
      Parent
      Well, when you start off a sentence with “Yes, dear”, the dripping sarcasm can be read multiple ways, none of them very useful or nice.
      
      Whatever. No point fighting over tone given shared goals.