I very much agree. Part of why I wrote that post was that this is a common assumption, yet much of the discourse ignores it and addresses value alignment instead. Value alignment would be better if we could get it, but it seems wildly unrealistic to expect that anyone will actually try for it.
The pragmatics of creating AGI for profit are a powerful reason to aim for instruction-following instead of value alignment; to the extent instruction-following will actually be safer and work better, that's just one more reason we should be thinking about that type of alignment. Not talking about it won't keep development from taking that path.
I think value alignment will be expected/enforced in a negative sense to some extent: e.g., don't do anything obviously bad (many such things are illegal anyway), and I expect that constraint to tighten over time. That could also create a status quo bias in what AI tools are allowed to do, since anything new and unknown could be bad, or be seen as bad.
Already, AI could "do what I mean and check" a lot better. For coding tasks etc., it will often do the wrong thing when it could ask for clarification instead. I would like to see a confidence indicator that it knows what I want before it continues. I don't want to guess how much to clarify, which is what I currently have to do; that wastes time and mental effort. You're right that there will be commercial pressure to do something at least somewhat similar.
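A minimal sketch of the kind of gating I mean, in Python. Everything here is hypothetical: `estimate_intent_confidence` is a made-up stand-in for however a real system would score its own understanding of the request, and the threshold is arbitrary. The point is just the control flow: surface uncertainty and ask, rather than guess and act.

```python
# Sketch of a "do what I mean and check" gate. All names and values are
# illustrative assumptions, not a real API.

CONFIDENCE_THRESHOLD = 0.8  # below this, ask a clarifying question instead of acting


def estimate_intent_confidence(request: str) -> float:
    """Hypothetical stand-in: a real system might derive this from the
    model's uncertainty over competing interpretations of the request."""
    return 0.55 if "refactor" in request else 0.9


def handle(request: str) -> str:
    confidence = estimate_intent_confidence(request)
    if confidence < CONFIDENCE_THRESHOLD:
        # Show the user the confidence and ask, rather than guessing.
        return f"(confidence {confidence:.0%}) Before I continue: did you mean X or Y?"
    return f"(confidence {confidence:.0%}) Proceeding with the task."


print(handle("refactor the auth module"))    # low confidence -> clarifies first
print(handle("rename variable foo to bar"))  # high confidence -> proceeds
```

Even something this crude would beat the current situation, where the burden of deciding how much to specify falls entirely on the user.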