Wei Dai comments on What failure looks like

Wei Dai 17 Apr 2019 21:08 UTC
LW: 3 AF: 1

The key issue here is whether there will be coordination between a set of influence-seeking systems that can cause (and will benefit from) a catastrophe, even when other systems are opposing them.

Do you not expect this threshold to be crossed sooner or later, assuming AI alignment remains unsolved? Also, it seems like the main alternative to this scenario is that the influence-seeking systems expect to eventually gain control of most of the universe anyway (even without a “correlated automation failure”), so they don’t see a reason to “rock the boat” and try to dispossess humans of their remaining influence/power/resources, but this is almost as bad as the “correlated automation failure” scenario from an astronomical waste perspective. (I’m wondering if you’re questioning whether things will turn out badly, or questioning whether things will turn out badly this way.)
- Richard_Ngo 18 Apr 2019 2:40 UTC
  LW: 6 AF: 2
  Mostly I am questioning whether things will turn out badly this way.
  “Do you not expect this threshold to be crossed sooner or later, assuming AI alignment remains unsolved?”
  Probably, but I’m pretty uncertain about this. It depends on a lot of messy details about reality, such as: how the offense-defense balance scales; what proportion of powerful systems are mostly aligned; whether influence-seeking systems are risk-neutral; what self-governance structures they’ll set up; the extent to which their preferences are compatible with ours; and how human-comprehensible the most important upcoming scientific advances are.