Here are some different things you might have clustered together under “alignment tax.”
Thing 1: The difference between the difficulty of building the technologically closest friendly transformative AI and the technologically closest dangerous transformative AI.
Thing 2: The expected difference between the difficulty of building likely transformative AI conditional on it being friendly and the difficulty of building likely transformative AI regardless of whether it is friendly or dangerous.
Thing 3: The average amount that effort spent on alignment detracts from the broader capability or usefulness of AI.
Turns out it’s possible to have negative Thing 3 but positive Things 1 and 2. This post seems to call such a state of affairs “optimistic,” which is way too hasty.
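To make this concrete, here is one rough formalization; the difficulty function $D(\cdot)$ and the notation are illustrative, not from the post:

$$\text{Thing 1} = D(\text{closest friendly TAI}) - D(\text{closest dangerous TAI})$$

$$\text{Thing 2} = \mathbb{E}\left[D(\text{TAI}) \mid \text{friendly}\right] - \mathbb{E}\left[D(\text{TAI})\right]$$

$$\text{Thing 3} \approx \text{average capability or usefulness lost per unit of effort spent on alignment}$$

Under this reading, Thing 3 can be negative (marginal alignment work also buys capabilities) while Things 1 and 2 remain positive (the technologically closest TAI, and the likely-by-default TAI, are still dangerous).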
An opposite-vibed way of framing the oblique angle between alignment and capabilities is as “unavoidable dual-use research.” See this long post about the subject.
Thanks, it definitely seems right that improving capabilities through alignment research (negative Thing 3) doesn’t necessarily make safe AI easier to build than unsafe AI overall (Things 1 and 2). This is precisely why, if techniques for building safe AI were discovered that were simultaneously powerful enough to move us toward friendly TAI (and away from the more-likely-by-default dangerous TAI, to your point), that would probably be good from an x-risk perspective.
We aren’t celebrating any capability improvement that happens to emerge from alignment research. Rather, we are emphasizing the expected value of techniques that inherently improve both alignment and capabilities, such that the strategic move for those who want to build maximally capable AI shifts toward adopting these techniques (and away from approaches that might yield similar capability gains without the alignment benefits). This is a more specific take than a general ‘negative Thing 3.’
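A minimal sketch of the dynamic we mean, with illustrative notation that is ours rather than anything formal from the post: write $C(t)$ for the capability payoff and $A(t)$ for the alignment benefit of adopting technique $t$. A purely capability-motivated developer picks

$$t^* \in \arg\max_{t} C(t),$$

and the hope is for techniques with $C(t_{\text{aligned}}) \ge C(t_{\text{unaligned}})$ while $A(t_{\text{aligned}}) \gg A(t_{\text{unaligned}})$, so that the capability-maximizing choice and the safety-preferred choice coincide. This is stronger than a bare ‘negative Thing 3,’ which only says that alignment effort does not cost capabilities on average.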
I also think there is a relevant distinction between ‘mere’ dual-use research and what we are describing here—note the difference between points 1 and 2 in this comment.