There are also interesting questions about whether MIRI's goals can be made to align with the goals of those of us who think that alignment is not trivial but is achievable. I'd better leave that for a separate post, as this has already gotten pretty long for a "short form" post.
I'm not sure I see the conflict? If you're a longtermist, most value is in the far future anyway. Delaying AGI by 10 years to buy even a 0.1% improvement in the chance of aligning AI seems like a good deal. I don't agree with MIRI's strong claims, but maybe those strong claims will slow AI progress, and that would be good by my lights.
What concerns me more is that their comms will have the unintended effect of speeding up AI progress. On the outside view: (a) their comms have arguably backfired in the past, and (b) they don't seem to do much red-teaming, which I suspect is associated with unintentional harm, especially in a domain with so few feedback loops.
Most of the world is not longtermist, which is one reason MIRI's comms have backfired in the past. Most humans care vastly more about themselves, their children, and their grandchildren than about distant future generations. So it makes perfect sense to them to increase the chance of a really good near-term future for their children, even at the cost of reduced odds of long-term survival. Delaying AGI by ten years, for instance, is enough to dramatically shift the odds of personal survival for many of us. It might make perfect sense for a utilitarian longtermist to say "it's fine if I die to gain a 0.1% chance of a good long-term future for humanity," but that statement sounds absolutely insane to most humans.