I was mostly noting that I hadn’t thought of this and hadn’t seen it mentioned.
There was some related discussion back in 2012, but of course you can be excused for not knowing about that. :) (The part about “AIXI would fail due to incorrect decision theory” is in part talking about a reward-maximizing agent doing reward hacking.)