I think that as people push AIs to do more and more ambitious things, it will become more and more likely that situational awareness comes along with this, for reasons broadly along the lines of those I linked to (it will be useful to train the AI to have situational awareness and/or other properties tightly linked to it).
I think this could happen via RL fine-tuning, but I also think it’s a mistake to fixate too much on today’s dominant methods—if today’s methods can’t produce situational awareness, they probably can’t produce as much value as possible, and people will probably move beyond them.
The “responsible things to do” you list seem reasonable, but expensive, and perhaps skipped over in an environment where there’s intense competition, things are moving quickly, and the risks aren’t obvious (because situationally aware AIs are deliberately hiding a lot of the evidence of risk).