C might propose a design for a computer that has a backdoor that an attacker can use to take over the computer. But if the backdoor would actually be effective, then A[C] will know about it.
C might propose a design that exploits a predictable flaw in A’s reasoning (e.g. overlooking consequences of a certain kind, being overly optimistic about some kinds of activities, incorrectly equating two importantly different quantities...). But then A[C] will know about it, and so if A[C] actually reasons in that way then (in some sense) it is endorsed.
These remind me of Eliezer’s notions of epistemic and instrumental efficiency, where the first example (about the backdoor) would roughly correspond to A[C] being instrumentally efficient relative to C, and the second example (about potential bias) would correspond to A[C] being epistemically efficient relative to C.