While small errors in reward specification can incentivize catastrophic outcomes, small errors in approval feedback are unlikely to incentivize catastrophic outcomes.
I think this is a really important point, thanks.
Objection 3: There’s no difference between approval feedback and myopic feedback, since perfect approval feedback can be turned into perfect reward feedback. So you might as well use the perfect reward feedback, since this is more competitive.
Did you mean “There’s no difference between approval feedback and reward feedback”?
Did you mean “There’s no difference between approval feedback and reward feedback”?
Yes, fixed, thanks.