TurnTrout comments on Attainable Utility Preservation: Scaling to Superhuman

TurnTrout 18 Mar 2020 18:33 UTC

I think this is probably going to do something quite different from the conceptual version of AUP, because impact (as defined in this sequence) occurs only when the agent’s beliefs change, which doesn’t happen for optimal agents in deterministic environments. The current implementation of AUP tries to get around this using proxies for power (but these can be gamed) or by defining “dumber” beliefs that power is measured relative to (but this fails to leverage the AI system’s understanding of the world).

For the benefit of future readers, I replied to this in the newsletter’s comments.