Thanks, it definitely seems right that improving capabilities through alignment research (negative Thing 3) doesn’t necessarily make safe AI easier to build than unsafe AI overall (Things 1 and 2). This is precisely why, if techniques for building safe AI were discovered that were simultaneously powerful enough to move us toward friendly TAI (and away from the more-likely-by-default dangerous TAI, to your point), it would probably be good from an x-risk perspective.
We aren’t celebrating every capability improvement that emerges from alignment research. Rather, we are emphasizing the expected value of techniques that inherently improve both alignment and capabilities, such that the strategic move for those who want to build maximally-capable AI shifts toward adopting these techniques (and away from adopting approaches that might yield similar capabilities gains without the alignment benefits). This is a more specific take than just a general ‘negative Thing 3.’
I also think there is a relevant distinction between ‘mere’ dual-use research and what we are describing here—note the difference between points 1 and 2 in this comment.