A Simple Alignment Typology

I set out to review the OpenAI alignment plan, but at some point my brain drifted from the actual arguments to modeling the humans behind them.
So behold! A simplified, first-pass Alignment Typology.
Why can’t we all just agree?
There are a lot of disagreements in AI alignment. Some people don’t see the problem, some think we’ll be fine, some think we’re doomed, and then different clusters of people have different ideas on how we should go about solving alignment. Thus I tried to sketch out my understanding of the key differences between the largest clusters of views on AI alignment. What emerged are roughly five clusters, sorted in order of optimism about the fate of humanity: the sceptics, the humanists, the empiricists, the rationalists, and the fatalists.
Sceptics don’t expect AGI to show up in any relevant time frame.
Humanists think humanity will prevail fairly easily through coordination around alignment or just solving the problem directly.
Empiricists think the problem is hard, AGI will show up soon, and if we want any hope of solving it, we need to iterate and take some necessary risk by making progress in capabilities as we go.
Rationalists think the problem is hard, AGI will show up soon, and we need to figure out as much as we can before making any capabilities progress.
Fatalists think we are doomed and we shouldn’t even try (though some are quite happy about it).
Here is a table.
|  | Sceptics | Humanists | Empiricists | Rationalists | Fatalists |
|---|---|---|---|---|---|
| Alignment Difficulty | - | one of these two is low | high | high | - |
| Coordination Difficulty | - | one of these two is low | high | high | - |
| Distance to AGI | high | - | low/med | low/med | - |
| Closeness to AGI required to Solve Alignment | - | - | high | med/high | - |
| Closeness to AGI resulting in unacceptable danger | - | - | med/high | high | - |
| Alignment Necessary or Possible | - | high | high | high | low |
Less Wrong is mostly populated by empiricists and rationalists. They agree alignment is a problem that can and should be solved. The key disagreement is on the methodology. While empiricists lean more heavily on gathering data and iterating solutions, rationalists lean more heavily toward discovering theories and proofs to lower risk from AGI (and some people are a mix of the two). Just by shifting the weights of risk/reward on iteration and moving forward, you get two opposite approaches to doing alignment work.
How is this useful?
Personally, it helps me quickly get an idea of which cluster people are in and understand the likely arguments for their conclusions. However, a counterargument can be made that this just feeds into stereotyping and creates schisms, and I can’t be sure that’s untrue.
What do you think?