Regarding the “Safety/Alignment vs. Capabilities” meme: it seems like people are sometimes using “capabilities” to mean 2 different things:
1) “intelligence” or “optimization power”… i.e. the ability to optimize some objective function
2) “usefulness”: the ability to do economically valuable tasks or things that people consider useful
I think the meme is meant to refer to (1).
Alignment is likely to be a bottleneck for (2).
For a given task, we can expect 3 stages of progress:
i) sufficient capabilities(1) to perform the task
ii) sufficient alignment to perform the task unsafely
iii) sufficient alignment to perform the task safely
Between (i) and (ii) we can expect a “capabilities(1) overhang”. When we go from (i) to (ii) we will see unsafe AI systems deployed and a potentially discontinuous jump in their ability to do the task.