I’m not aware of any AI safety researchers that are extremely optimistic about solving alignment competitively.
I’m not sure what you’d consider “extremely” optimistic, but I gathered some quantitative estimates of AI risk here, and they all seem overly optimistic to me. Did you see that?
Paul: I just think working on this problem earlier will tell us what’s going on. If we’re in the world where you need a really drastic policy response to cope with this problem, then you want to know that as soon as possible.
I agree with this motivation to do early work, but in a world where we do need drastic policy responses, I think it’s pretty likely that the early work won’t actually produce conclusive enough results to show that. For example, if a safety approach fails to make much progress, there’s not really a good way to tell if it’s because safe and competitive AI really is just too hard (and therefore we need a drastic policy response), or because the approach is wrong, or the people working on it aren’t smart enough, or they’re trying to do the work too early. People who are inclined to be optimistic will probably remain so until it’s too late.
but I gathered some quantitative estimates of AI risk here, and they all seem overly optimistic to me. Did you see that?
I’ve only now read that thread. I think it is extremely worthwhile to gather such estimates.
I think all three of the estimates mentioned there correspond to marginal probabilities (rather than probabilities conditioned on “no governance interventions”). So those estimates already account for scenarios in which governance interventions save the world. Therefore, it seems we should not update strongly against the necessity of governance interventions just because those estimates are optimistic.
Maybe we should gather researchers’ credences for predictions like: “If there are no governance interventions, competitive aligned AIs will exist 10 years from now”.
I suspect that gathering such estimates from publicly available information might expose us to a selection bias, because very pessimistic estimates might be outside the Overton window (even for the EA/AIS crowd). For example, if Robert Wiblin had concluded that an AI existential catastrophe was 50% likely, I’m not sure that the 80,000 Hours website (which targets a large and motivationally diverse audience) would have published that estimate.
I strongly agree with all of this.
I agree with this motivation to do early work, but in a world where we do need drastic policy responses, I think it’s pretty likely that the early work won’t actually produce conclusive enough results to show that. For example, if a safety approach fails to make much progress, there’s not really a good way to tell if it’s because safe and competitive AI really is just too hard (and therefore we need a drastic policy response), or because the approach is wrong, or the people working on it aren’t smart enough, or they’re trying to do the work too early.
I think all three of the estimates mentioned there correspond to marginal probabilities (rather than probabilities conditioned on “no governance interventions”). So those estimates already account for scenarios in which governance interventions save the world. Therefore, it seems we should not update strongly against the necessity of governance interventions just because those estimates are optimistic.
Paul: I normally give ~50% as my probability we’d be fine without any kind of coordination.
Upvoted for giving this number, but what does it mean exactly? You expect “50% fine” through all kinds of x-risk, assuming no coordination from now until the end of the universe? Or just assuming no coordination until AGI? Is it just AI risk instead of all x-risk, or just risk from narrow AI alignment? If “AI risk”, are you including risks from AI exacerbating human safety problems, or AI differentially accelerating dangerous technologies? Is it 50% probability that humanity survives (which might be “fine” to some people) or 50% that we end up with a nearly optimal universe? Do you have a document that gives all of your quantitative risk estimates with clear explanations of what they mean?
(Sorry to put you on the spot here when I haven’t produced anything like that myself, but I just want to convey how confusing all this is.)