Who is aligning the AGI? And to what is it aligning?
Generally, I tend to think of “how do we align an AGI to literally anyone at all, rather than producing nothing of value to any human ever” as a strict prerequisite to “who to align it to”: the former without the latter may be suboptimal, but the latter without the former is useless.
My guess is, it’s a fuckton easier to sort out Friendliness/alignment within a human being than it is on a computer. Because the stuff making up Friendliness is right there.
I don’t think this is a given. Humans are not necessarily aligned, and one advantage of AI is that we build it ourselves, so we can craft its inductive biases and inspect its internals fully.
Solving AI alignment is fundamentally a coordination problem.
I think this is less true of alignment than of many other problems. It would certainly be hugely easier if we could get everyone in the world to coordinate and eliminate capabilities races, but that’s true of a lot of problems, and coordination isn’t strictly necessary to solve alignment imo. Most people I know are working in the regime where we assume we can’t solve these coordination problems, and instead try to solve just enough of the technical problem to bootstrap alignment or perform a pivotal act.