if the original model learned complex, power-seeking behaviors that don't help it do well on the training data
The problem with power-seeking behavior is that it helps a model do well across quite a broad range of tasks.
As of right now, I don't think that LLMs are trained to be power-seeking and deceptive.
Power-seeking is likely if the model is directly maximizing reward, but LLMs are not quite doing this.