[AN #132]: Complex and subtly incorrect arguments as an obstacle to debate
Alignment Newsletter is a weekly publication with recent content relevant to AI alignment around the world.
As always, thanks to everyone involved in the newsletter!
The Understanding Learned Reward Functions paper looks great, both for studying inner alignment (the version with goal-directed / RL policies instead of mesa-optimizers) and for thinking about goal-directedness.