Depends on offense-defense balance, I guess. E.g. if well-intentioned and well-coordinated actors are controlling 90% of AI-relevant compute then it seems plausible that they could defend against 10% of the compute being controlled by misaligned AGI or other bad actors—by denying them resources, by hardening core infrastructure, via MAD, etc.
It seems like the exact model the AI will adopt is confounding my picture when I try to imagine what an "existentially secure" world looks like. I'm currently thinking there are two possible existentially secure worlds:
The obvious one is where all human dependence is removed from setting/modifying the AI’s value system (like CEV, fully value-aligned)—this would look much more unipolar.
The alternative is for the well-intentioned and well-coordinated group to use a corrigible AI that is aligned with its human instructor. To me, whether this scenario looks existentially secure probably depends on whether small differences in capability can magnify into great power differences. If false, it would be much easier for capable groups to defect and make their own corrigible AIs push agendas that may not be in humanity's interest (hence not so existentially secure). If true, then the world would again be more unipolar, and its existential security would depend on how value-aligned the humans operating the corrigible AI are (I'm guessing this is your offense-defense balance example?).
So it seems to me that the ideal end game is for humanity to end up with a value-aligned AI, either by starting with one or by somehow surviving the "dangerous period" of multipolar corrigible AIs and transitioning to a value-aligned one. Possible pathways (non-exhaustive).
I'm not sure whether this is a good framing at all (probably isn't), but simply counting the number of dependencies (without considering how plausible each dependency is), it seems to me that humanity's chances would be better in a unipolar takeover scenario, whether that means using a value-aligned AI from the start or transitioning into one after a pivotal act.