As always, thanks to everyone involved in the newsletter!
The Understanding Learned Reward Functions paper looks great, both for studying inner alignment (the version with goal-directed/RL policies instead of mesa-optimizers) and for thinking about goal-directedness.