“Optimization target” is itself a concept that needs deconfusing/operationalizing. For a certain definition of optimization and impact, I’ve found that optimization is mostly correlated with reward, but that the learned policy typically has more impact on the world (optimizes the world more) than is strictly necessary to achieve a given amount of reward.
This uses an empirical metric of impact/optimization, which may or may not correlate well with algorithm-level measures of optimization targets.
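As a toy illustration only (not the metric from the linked post), here is a minimal sketch of what "empirical impact exceeding what reward requires" could look like: a 1-D gridworld where impact is crudely operationalized as the number of distinct cells a policy touches, compared against the minimum needed to reach the reward. The environment, the `rollout` helper, and the `policy_bias` parameter are all made up for this sketch.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy 1-D gridworld: agent starts in the middle, reward only at the
# rightmost cell; every distinct cell visited counts as "impact".
N_CELLS = 11
GOAL = N_CELLS - 1
START = N_CELLS // 2
HORIZON = 20

def rollout(policy_bias, n_episodes=200):
    """Run episodes of a simple stochastic policy.

    policy_bias in [0, 1] is the probability of stepping right (toward
    the goal). Returns (mean reward, mean impact), where impact is the
    number of distinct cells visited per episode -- a crude empirical
    stand-in for "how much of the world the policy affects".
    """
    rewards, impacts = [], []
    for _ in range(n_episodes):
        pos, visited, ep_reward = START, {START}, 0.0
        for _ in range(HORIZON):
            step = 1 if rng.random() < policy_bias else -1
            pos = int(np.clip(pos + step, 0, N_CELLS - 1))
            visited.add(pos)
            if pos == GOAL:
                ep_reward += 1.0
        rewards.append(ep_reward)
        impacts.append(len(visited))
    return float(np.mean(rewards)), float(np.mean(impacts))

# Sweep over policies of varying competence and check how the
# empirical impact metric tracks reward.
biases = np.linspace(0.3, 0.95, 14)
results = [rollout(b) for b in biases]
reward = np.array([r for r, _ in results])
impact = np.array([i for _, i in results])

print(f"correlation(reward, impact) = {np.corrcoef(reward, impact)[0, 1]:.2f}")

# Minimum impact strictly needed to get any reward: the direct path from
# START to GOAL touches only GOAL - START + 1 cells; excess over that is
# impact beyond what the reward requires.
min_needed = GOAL - START + 1
print(f"impact at best policy: {impact[-1]:.1f} vs. minimum needed: {min_needed}")
```

In this kind of setup you'd typically see impact and reward rise together (high correlation), while even the best policy visits more cells than the direct path requires, which is the flavor of result described above.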
https://www.alignmentforum.org/posts/qEwCitrgberdjjtuW/measuring-learned-optimization-in-small-transformer-models