Oh, another worry: there may not be a stable equilibrium to converge to. Every time M approximates the final result well, Adv may be incentivized to switch to different arguments precisely to make M's predictions wrong. (Or rather, perhaps the stable equilibrium has to be a mixture over policies for this reason, in which case you only get the true answer with some probability.)
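The worry has the same shape as matching pennies: if M is rewarded for predicting which argument Adv makes and Adv is rewarded when M is wrong, no pure-strategy equilibrium exists, and in the unique mixed equilibrium M is right only half the time. A toy sketch (my analogy, not from the original discussion; the two-argument payoff structure is an assumption for illustration):

```python
from itertools import product

# Toy matching-pennies analogy: M predicts which of two arguments
# Adv will make; Adv is rewarded exactly when M's prediction is wrong.
ARGS = ["A", "B"]

def m_payoff(pred, arg):
    return 1 if pred == arg else 0  # M wins when its prediction matches

def adv_payoff(pred, arg):
    return 1 - m_payoff(pred, arg)  # zero-sum: Adv wins when M is wrong

def is_pure_equilibrium(pred, arg):
    # A pure equilibrium requires neither player gains by unilaterally deviating.
    m_ok = all(m_payoff(pred, arg) >= m_payoff(p, arg) for p in ARGS)
    adv_ok = all(adv_payoff(pred, arg) >= adv_payoff(pred, a) for a in ARGS)
    return m_ok and adv_ok

pure = [cell for cell in product(ARGS, ARGS) if is_pure_equilibrium(*cell)]
print("pure equilibria:", pure)  # → []: someone always wants to switch

# In the mixed equilibrium both sides randomize 50/50, so M's prediction
# matches Adv's chosen argument with probability q*p + (1-q)*(1-p) = 0.5.
p = q = 0.5
print("P(M correct):", q * p + (1 - q) * (1 - p))  # → 0.5
```

The empty list is the "no stable equilibrium to converge to" worry in miniature, and the 0.5 is the "true answer only with some probability" outcome under the mixed equilibrium.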