Under my current understanding of the “general alignment” angle, a core argument goes something like this:
1. We need some way for agents to create aligned successor agents, so our AI doesn’t succumb to value drift. This is something we need regardless, assuming that AIs will design successively more powerful descendants.
2. If the successor-design process is sufficiently general-purpose, a human could use that same process to design the “seed” AI in the first place.
I don’t necessarily think this is the best framing, and I don’t necessarily agree with it (e.g. whether the agent has direct read-access to its own values is an important distinction, and separately there’s an argument that an AGI will be better-equipped to figure out its own succession problem than we will). I also don’t know whether this is an accurate representation of anybody’s view.
The successor problem is important, but it assumes we have the values already.
I’m imagining algorithms designing successors with imperfect values (that they know to be imperfect). It’s a somewhat different problem (though solving the classical successor problem is also important).