but all rewards are subject to Goodhart’s Law. I expect to see people doing a lot of ill-thought-out somethings because the reward structure is too simplified.
Well, since the reward structure isn’t explicit, and we expect McGonagall to get much smarter on a much shorter timescale than opportunities arise to earn a reward by “disobeying McGonagall according to your own judgement.”