Minimum Viable Alignment

HunterJay 7 May 2022 13:18 UTC

11 points

What is the largest possible target we could have for aligned AGI?

That is, instead of aiming to create a great and prosperous future, could we find an easier path to aligning an AGI by aiming for the entire set of ‘this-is-fine’ futures?

For example, a future where all new computers are rendered inoperable by malicious software. Or a future where a mostly-inactive AI does nothing except prevent any superintelligence from forming, or one that continuously tries to use up all of the available compute in the world.

I don’t believe there is a solution here yet either, but could relaxing the problem from ‘what we actually want’ to ‘anything we could live with’ help? Has there been much work in this direction? Please let me know what to search for if so. Thank you.


Charlie Steiner 7 May 2022 16:24 UTC
7 points
Yes, some people are interested in it and other people think it’s not worth it. See e.g. the Eliezer Yudkowsky + Richard Ngo chat log posts.
- HunterJay 8 May 2022 13:54 UTC
  1 point
  Will check them out, thank you.
Chris_Leong 8 May 2022 21:36 UTC
3 points
I wrote a related post: “Is some kind of minimally-invasive mass surveillance required for catastrophic risk prevention?”
- HunterJay 9 May 2022 7:54 UTC
  1 point
  Thanks Chris, but I think you linked to the wrong thing there. I can’t see your post in the last 3 years of your history either!
  - Chris_Leong 9 May 2022 11:32 UTC
    2 points
    Sorry, fixed.
Perhaps 7 May 2022 20:21 UTC
2 points
Well, it depends on your priors for how an AGI would act, but as I understand it, all AGIs will be power-seeking. If an AGI is power-seeking and has access to some amount of compute, then it will probably bootstrap itself to superintelligence and then start imposing its utility function everywhere. Different utility functions produce different results, but even relatively mundane ones like “prevent another superintelligence from being created” could result in the AGI killing all humans and taking over the galaxy to make sure no other superintelligence gets made. I think it’s actually really hard to specify the what-we-actually-want future for an AGI, so much so that evolutionarily training an AGI in an Earth-like environment, so that it develops human-ish morals, will be necessary.
- HunterJay 8 May 2022 13:55 UTC
  1 point
  Aye, I agree it is not a solution to power-seeking, only that there may be a slightly easier target to hit if we can relax as many constraints on alignment as possible.