As of right now, I don’t think LLMs are trained to be power-seeking and deceptive. Power-seeking is likely when a model is directly maximizing reward, but LLMs are not quite doing this.