I think that’s a gross simplification of the possible outcomes.
Admittedly, my own calculation looks less like an elaborate graph involving supposed credibility intervals and more like: “Do we need to do this? Yes. Can we realistically avoid having to do this? No. Let’s start now EOM.”
I think you need better planning.
There’s a great essay, Levels of Action, that has been a featured article on the main page for some time now. Applied to FAI theory:
Level 1: Directly ending human suffering.
Level 2: Constructing an AGI capable of ending human suffering for us.
Level 3: Working on the computer science aspects of AGI theory.
Level 4: Researching FAI theory, which constrains the Level 3 AGI theory.
But for that high-level basic research to have any utility, these levels must be connected to each other: there must be a firm chain where FAI theory informs AGI designs, which are actually used in the construction of an AGI tasked with ending human suffering in a friendly way.
From what I can tell on the outside, the MIRI approach seems to be: (1) find a practical theory of FAI; (2) design an AGI in accordance with this theory; (3) implement that design; (4) mission accomplished!
That makes a certain amount of intuitive sense, with stages laid out end-to-end in chronological order. However, as a trained project manager I must tell you this is a recipe for disaster! The problem is that the design space branches out at each link, and without feedback from the follow-on steps, inefficient decisions get made at the earlier stages. The space of workable FAI theories is much, much larger than the subset of FAI theories that lead to practical AGI designs: designs that can be implemented before the UFAI competition arrives and that are suitable for addressing real-world human suffering as quickly as possible.
Some examples from the comparably large programs of the Manhattan project and Apollo moonshot are appropriate, if you’ll forgive the length (skip to the end for a conclusion):
The Manhattan project had one driving goal: drop a bomb on Berlin and Tokyo before the GIs arrived, hopefully ending the war early. (Of course Germany surrendered before the bomb was finished, and Tokyo ended up so devastated by conventional firebombing that Hiroshima and Nagasaki were selected instead, but the original goal is what matters here.) The location of the targets meant that the bomb had to be small enough to fit in a conventional long-distance bomber, and the timeline meant that the simpler but less efficient U-235 designs were preferred. A program was designed, adequate resources allocated, and the goal achieved on time.
On the other hand, it is easy to imagine how differently things might have gone if the strategy had been reversed; if instead the US military had decided to institute a basic research program into nuclear physics and atomic structure before deciding on the optimal bomb reactions, then done detailed bomb design before creating the industry necessary to produce enough material for a working weapon. Just looking at the first stage, there is nothing a priori which makes it obvious that U-235 and Pu-239 are the “interesting” nuclear fuels to focus on. Thorium, for example, was more naturally abundant and was already being extracted as a by-product of rare-earth metal extraction; its reactions generate less lethal radiation and less long-lived waste; and it can be bred into U-233, which could be used in a nuclear bomb. However, the straightforward military and engineering requirements of making a bomb on schedule, and successfully delivering it on target, favored U-235 and Pu-239 based weapon designs, which focused the efforts of the physicists involved on those fuel pathways. The rest is history.
The Apollo moonshot is another great example. NASA had a single driving goal: deliver a man to the moon before 1970, and return him safely to Earth. A lot of decisions made in the first few years were driven simply by the time and resources available: e.g. heavy-lift vs. orbital assembly, direct ascent vs. lunar-orbit rendezvous, expendable vs. reusable hardware, staging vs. fuel depots. Had you asked Wernher von Braun what he imagined an ideal moon mission would look like, you would have gotten something very different from Apollo. But with Apollo, NASA made the right tradeoffs with respect to schedule constraints and programmatic risk.
The follow-on projects of Shuttle and Station are a completely different story, however. They were designed with no articulated long-term strategy, which meant they tried to be everything to everybody and as a result were useful to no one. Meanwhile the basic research being carried out at NASA has little, if anything, to do with the long-term goal of sending humans to Mars. There’s an entire division, the Space Biosciences group, which does research on Station about the long-term effects of microgravity and radiation on humans, supposedly to enable a long-duration voyage to Mars. Never mind that the microgravity issue is trivially solved by spinning the spacecraft with nothing more than a strong steel cable as a tether, and the radiation issue is sufficiently mitigated by having a storm shelter en route and throwing a couple of Martian sandbags on the roof once you get there.
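(To put rough numbers on the spin-gravity point, as a back-of-envelope sketch: the 2 rpm figure is just a commonly cited comfort limit, not something from any particular mission study.)

\[
a = \omega^2 r, \qquad \omega = 2\ \text{rpm} \approx 0.21\ \text{rad/s} \quad\Rightarrow\quad r = \frac{9.81\ \text{m/s}^2}{(0.21\ \text{rad/s})^2} \approx 220\ \text{m},
\]

i.e. a tether a couple hundred meters long, spun at a leisurely 2 rpm, already gives a full 1 g at the crew cabin.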
There’s an apocryphal story about the US government spending millions of dollars to develop the “Space Pen”—a ballpoint pen with pressurized ink to enable writing in microgravity. Much later, at some conference, an engineer from that program met his Soviet counterpart and asked how they had solved that difficult problem. The cosmonauts used a pencil.
Sadly the story is not true—the “Space Pen” was a successful marketing ploy by inventor Paul Fisher without any ties to NASA, although it was used by NASA and the Russians on later missions—but it does serve to illustrate the point very succinctly. I worry that MIRI is spending its days coming up with space pens when a pencil would have done just fine.
Let me provide some practical advice. If I were running MIRI, I would still employ mathematicians working on the hail-Mary of a complete FAI theory—avoiding the Löbian obstacle, etc.—and run the very successful workshops, though maybe just two a year. But beyond that I would spend all remaining resources on a pragmatic AGI design programme:
1) Have a series of workshops with AGI people to review possible AI-influenced strategies for a positive singularity—top-down FAI, seed AI to FAI, Oracle AI to FAI, Oracle AI to human augmentation, teaching a UFAI morals in a nursery environment, etc.
2) Have a series of workshops, again with AGI people, to review tactics: possible AGI architectures & the minimal seed AI for each architecture, probabilistically reliable boxing setups, programmatic security, etc.
Then use the output of these workshops—including reliable constraints on timelines—to drive most of the research done by MIRI. For example, I anticipate that reliable setups for an unfriendly Oracle AI will require probabilistically auditable computation, which itself will require a strongly typed, purely functional virtual machine layer from which computation traces can be extracted and meaningfully analyzed in isolation. This is the sort of research MIRI could sponsor a grad student or postdoc to perform.
BTW, another gripe: I have yet to see adequate arguments from MIRI on the “can we realistically avoid having to do this?” question which aren’t strawman arguments.
While I don’t know much about your AGI expertise, I agree that MIRI is missing an experienced top-level executive who knows how to structure, implement and risk-mitigate an ambitious project like FAI and has a track record to prove it. Such a person would help prevent flailing about and wasting time and resources. I am not sure what other projects are in this reference class and whether MIRI can find and hire a person like that, so maybe they are doing what they can with the meager budget they’ve got. Do you think that the Manhattan project and the Space Shuttle are in the ballpark of FAI? My guess is that they don’t even come close in terms of ambition, risk, effort or complexity.
I am not sure what other projects are in this reference class and whether MIRI can find and hire a person like that, so maybe they are doing what they can with the meager budget they’ve got.
Project managers are typically expensive because they are senior people before they enter management. Someone who has never actually worked at the bottom rung of the ladder is often quite useless in a project management role. But that’s not to say you can’t find someone young who has done a short stint at the bottom, got PMP certified (or whatever), and has 1–2 projects under their belt. It wouldn’t be cheap, but it wouldn’t be horribly expensive either.
On the other hand, Luke seems pretty on the ball with respect to administrative stuff. It may be sufficient to get him some project manager training and some very senior project management advisers.
Neither of these would be an adequate long-term solution. You need very senior, very experienced project management people in order to tackle something as large as FAI and stay on schedule and on budget. But in terms of just making sure the organization is focused on the right issues, either of the above would be a drastic improvement, and enough for now.
Do you think that the Manhattan project and the Space Shuttle are in the ballpark of FAI? My guess is that they don’t even come close in terms of ambition, risk, effort or complexity.
60 years ago, maybe. These days, however, cognitive science, narrow AI, and computational tools are advancing rapidly on their own. The problem for MIRI should be that of ensuring a positive singularity via careful leverage of the machine intelligence already being developed for other purposes. That’s a much smaller project, and something I think a small but adequately funded organization should be able to pull off.
From what I can tell on the outside, the MIRI approach seems to be: (1) find a practical theory of FAI; (2) design an AGI in accordance with this theory; (3) implement that design; (4) mission accomplished!
Yes, dear, some of us are programmers, we know about waterfalls. Our approach is more like, “Attack the most promising problems that present themselves, at every point; don’t actually build things which you don’t yet know how to make not destroy the world, at any point.” Right now this means working on unbounded problems because there are no bounded problems which seem more relevant and more on the critical path. If at any point we can build something to test ideas, of course we will; unless our state of ignorance is such that we can’t test that particular idea without risking destroying the world, in which case we won’t, but if you’re really setting out to test ideas you can probably figure out some other way to test them, except for very rare highly global theses like “The intelligence explosion continues past the human level.” More local theses should be testable.
See also Ch. 22 from HPMOR, and keep in mind that I am not Harry, I contain Harry, all the other characters, their whole universe, and everything that happens inside it. In other words, I am not Harry, I am the universe that responded to Harry.
I’ll have to review Ch. 22 later as it is quite long.
If a stable self-modifying agent + friendly value-loading were the only pathway to a positive singularity, then MIRI would be doing a fine job. However, I find that assumption not adequately justified.
For example, take oracle AI. The sequences do a good job of showing how a black-box AI can’t be safely boxed, nor can any of its recommendations be trusted. But those arguments don’t generalize to the case where we can see and understand the inner workings of the AI. Yes, engineering challenges apply: you can’t demand a computational trace of the entire returned result, as analyzing it would require an even more powerful AI, and then it’d be turtles all the way down. However, you can do something like the Fiat-Shamir transform for selecting which branches of the computational trace to audit. In essence, use a cryptographic hash of the result to choose which portions of the audit log to reveal. This allows the audit log to be only a tiny, tiny slice of the entire computation, yet it can be shown that faking such an audit log is computationally infeasible: it would require a large multiple of the computation that went into the original result, which would be energetically detectable.
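For concreteness, here is a minimal sketch of the hash-based audit selection being described, assuming the oracle emits a step-by-step trace alongside its answer. The function names, the commitment scheme, the 1% audit fraction, and the `recompute_step` callback are all illustrative assumptions, not part of any existing system:

```python
import hashlib

AUDIT_FRACTION = 0.01  # audit roughly 1% of the trace (illustrative choice)

def commit(trace):
    """Commit to the full trace: hash each step, then hash the concatenation."""
    leaves = [hashlib.sha256(repr(step).encode()).hexdigest() for step in trace]
    return hashlib.sha256("".join(leaves).encode()).hexdigest()

def fiat_shamir_challenge(result, root, n_steps, n_audits):
    """Derive the audit indices from a hash of the result plus the trace
    commitment, so the prover cannot predict which steps will be checked."""
    indices, counter = set(), 0
    while len(indices) < n_audits:
        digest = hashlib.sha256(f"{result}|{root}|{counter}".encode()).hexdigest()
        indices.add(int(digest, 16) % n_steps)
        counter += 1
    return sorted(indices)

def audit(result, trace, recompute_step):
    """Verifier: re-execute only the challenged steps and compare outputs.
    `trace` is a list of (step_input, claimed_output) pairs."""
    root = commit(trace)
    n_audits = max(1, int(len(trace) * AUDIT_FRACTION))
    for i in fiat_shamir_challenge(result, root, len(trace), n_audits):
        step_input, claimed_output = trace[i]
        if recompute_step(step_input) != claimed_output:
            return False  # caught an inconsistency without replaying everything
    return True
```

The point is just that the verifier’s work scales with the audit fraction rather than with the whole computation, while a prover who wants to slip in a bogus step has to be consistent at indices it cannot predict in advance.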
Of course, you would also need the AI to be written for a virtual machine which is pure, strongly typed, and directly supports the native abstraction primitives of the AGI design (e.g. if it were OpenCog, the virtual machine would exist at the abstraction level of the atomspace), with a computational stack underneath that prevents VM breakout, plus other protective measures. But these are surmountable engineering challenges, the difficulty of which can be realistically quantified.
So how much more or less difficult would it be to accomplish such an untrusted oracle AI vs the stable self-modifying agent and value-loading approach? Which one is more likely to occur before the “competition”?
I’m not demanding a full waterfall project plan, but even agile requires convincing arguments about critical paths and relative priorities. I for one am not convinced.
If a stable self-modifying agent + friendly value-loading were the only pathway to a positive singularity, then MIRI would be doing a fine job. However, I find that assumption not adequately justified.
See also Ch. 22 from HPMOR, and keep in mind that I am not Harry, I contain Harry, all the other characters, their whole universe, and everything that happens inside it. In other words, I am not Harry, I am the universe that responded to Harry.
Badass boasting from fictional evidence?
Yes, dear, some of us are programmers, we know about waterfalls.
If anyone here knew anything about the Waterfall Model, they’d know it was only ever proposed sarcastically, as a perfect example of how real engineering projects never work. “Agile” is pretty goddamn fake, too. There’s no replacement for actually using your mind to reason about which project-planning steps have the greatest expected value at any given time, and to account for unknown unknowns (i.e. debugging, other obstacles) as well.
If anyone here knew anything about the Waterfall Model, they’d know it was only ever proposed sarcastically, as a perfect example of how real engineering projects never work
Yes, and I used it in that context: “We know about waterfalls” = “We know not to do waterfalls, so you don’t need to tell us that”. Thank you for that very charitable interpretation of my words.
I’m not demanding a full waterfall project plan, but even agile requires convincing arguments about critical paths and relative priorities. I for one am not convinced.
Well that makes three of us...
Yes, and I used it in that context: “We know about waterfalls” = “We know not to do waterfalls, so you don’t need to tell us that”. Thank you for that very charitable interpretation of my words.
Well, when you start off a sentence with “Yes, dear”, the dripping sarcasm can be read multiple ways, none of them very useful or nice.
Whatever. No point fighting over tone given shared goals.