The point I wanted to make was about extrapolated volition as a strategy for avoiding Goodhart’s law issues. If you extrapolate a person’s volition towards the “person he/she wants to be” and set the resulting goal as G*, it will be about as close to G as it can be. I presented CEV as an example, since the audience is more familiar with it.
And FAWS, your definition of G and G* in the friendliness scenario is perfect. I’ve nothing more to add there.
Ah, sorry. I read the post as saying something different from what it actually says.
Good discussion.