Under my current understanding of the “general alignment” angle, a core argument goes something like this:
1. We need some way for agents to create aligned successor agents, so our AI doesn’t succumb to value drift. This is something we need regardless, assuming that AIs will design successively more powerful descendants.
2. If the successor-design process is sufficiently general-purpose, a human could use that same process to design the “seed” AI in the first place.
I don’t necessarily think this is the best framing, and I don’t necessarily agree with it (e.g. whether the agent has direct read-access to its own values is an important distinction, and separately there’s an argument that an AGI will be better-equipped to figure out its own succession problem than we will). I also don’t know whether this is an accurate representation of anybody’s view.
The successor problem is important, but it assumes we have the values already.
I’m imagining algorithms designing successors with imperfect values (that they know to be imperfect). It’s a somewhat different problem (though solving the classical successor problem is also important).