Regarding the “Safety/Alignment vs. Capabilities” meme: it seems like people are sometimes using “capabilities” to mean 2 different things:
1) “intelligence” or “optimization power”… i.e. the ability to optimize some objective function
2) “usefulness”: the ability to do economically valuable tasks or things that people consider useful
I think the meme is meant to refer to (1).
Alignment is likely to be a bottleneck for (2).
For a given task, we can expect 3 stages of progress:
i) sufficient capabilities(1) to perform the task
ii) sufficient alignment to perform the task unsafely
iii) sufficient alignment to perform the task safely
Between (i) and (ii) we can expect a “capabilities(1) overhang”. When we go from (i) to (ii) we will see unsafe AI systems deployed and a potentially discontinuous jump in their ability to do the task.