Is the disagreement here about whether AIs are likely to develop things like situational awareness, foresightful planning ability, and understanding of adversaries’ decisions as they are used for more and more challenging tasks?
My thought on this is that if a baseline AI system does not have situational awareness before the AI researchers start fine-tuning it, I would not expect it to acquire situational awareness through reinforcement learning with human feedback.
I am not sure I can answer this for the hypothetical “Alex” system in the linked post, since I don’t think I have a good mental model of how such a system would work, or of what kind of training data or training protocol would be needed to create such a thing.
If I saw something that, from the outside, appeared to exhibit the full range of abilities Alex is described as having (including advancing R&D in multiple disparate domains in ways that are not simple extrapolations of its training data), I would assign a significantly higher probability to that system having situational awareness than I do to current systems. If someone had a system that was empirically that powerful, and that had been trained largely by reinforcement learning, I would say the responsible things to do would be to:
Keep it air-gapped rather than unleashing large numbers of copies of it onto the internet.
Carefully vet any machine blueprints, drugs or other medical interventions, or other plans or technologies the system comes up with (perhaps first building a prototype and gathering data on it in an isolated, controlled setting where it can be quickly destroyed) to ensure safety before deploying them out into the world.
The second of those would have the downside that beneficial ideas and inventions produced by the system would take longer to roll out and have a positive effect. But it would be worth it in that context to reduce the risk of some large unforeseen downside.
I think that as people push AIs to do more and more ambitious things, it will become more and more likely that situational awareness comes along with this, for reasons broadly along the lines of those I linked to (it will be useful to train the AI to have situational awareness and/or other properties tightly linked to it).
I think this could happen via RL fine-tuning, but I also think it’s a mistake to fixate too much on today’s dominant methods: if today’s methods can’t produce situational awareness, they probably can’t produce as much value as other methods could, and people will probably move beyond them.
The “responsible things to do” you list seem reasonable but expensive, and they might get skipped in an environment where there’s intense competition, things are moving quickly, and the risks aren’t obvious (because situationally aware AIs are deliberately hiding a lot of the evidence of risk).