As of right now, I don’t think LLMs are trained to be power-seeking and deceptive. Power-seeking is likely when a model is directly maximizing reward, but LLMs are not quite doing this.