AI utility-based correlation
A putative new idea for AI control; index here.
This post presents one way of implementing the indifference-based correlations described in these posts.
Let u be a utility function, a map from worlds to real numbers. An expected utility maximiser deciding whether to produce output Y computes the expected utility
$$\sum_w u(w)\,P(w \mid Y).$$
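For a finite set of worlds, this sum can be computed directly and the maximiser simply takes the output with the largest value. The following is a minimal Python sketch, assuming worlds and outputs are hashable labels and that P(w|Y) is supplied as a table for each output; the names `expected_utility` and `best_output` and the toy numbers are hypothetical, chosen only for illustration.

```python
from typing import Callable, Dict, Hashable, Iterable

World = Hashable
Output = Hashable
Distribution = Dict[World, float]  # maps each world w to P(w | Y) for one fixed Y


def expected_utility(u: Callable[[World], float], p_w_given_y: Distribution) -> float:
    """Return sum_w u(w) * P(w | Y) for one fixed output Y."""
    return sum(u(w) * p for w, p in p_w_given_y.items())


def best_output(
    outputs: Iterable[Output],
    u: Callable[[World], float],
    p_w_given: Callable[[Output], Distribution],
) -> Output:
    """Pick the output Y with the highest expected utility."""
    return max(outputs, key=lambda y: expected_utility(u, p_w_given(y)))


# Hypothetical toy example: two possible outputs, two possible worlds.
utility = {"good": 1.0, "bad": 0.0}
p_table = {
    "a": {"good": 0.7, "bad": 0.3},
    "b": {"good": 0.4, "bad": 0.6},
}

if __name__ == "__main__":
    choice = best_output(p_table, utility.__getitem__, p_table.__getitem__)
    print(choice, expected_utility(utility.__getitem__, p_table[choice]))  # prints: a 0.7
```

Here output "a" wins because it makes the good world more likely (0.7 against 0.4).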
We now assume that there are two further random variables in the world, X and Z. We want the AI to be indifferent to worlds where Y≠X, and also to worlds where Z=0. Being indifferent to those worlds amounts to conditioning them away, that is, conditioning on X=Y and Z=1, so the AI assesses the value of output Y as:
$$\sum_w u(w)\,P(w \mid X=Y,\ Z=1,\ Y).$$
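Conditioning on X=Y and Z=1 means discarding every other world and renormalising what is left. The sketch below does this on a small joint table P(X, Z, w | Y=y) for each output y. It is a minimal illustration under assumed toy numbers: the names `indifferent_value` and `joint_given_output`, the independence of X and Z, and all probabilities are hypothetical.

```python
from typing import Callable, Dict, Hashable, Tuple

Key = Tuple[int, int, Hashable]  # (x, z, w)


def indifferent_value(u: Callable[[Hashable], float], joint: Dict[Key, float], y: int) -> float:
    """Return sum_w u(w) * P(w | X=y, Z=1, Y=y).

    `joint` holds P(X=x, Z=z, w | Y=y) for the fixed output y. Worlds with
    x != y or z == 0 are dropped and the remainder is renormalised.
    """
    kept = {k: p for k, p in joint.items() if k[0] == y and k[1] == 1}
    total = sum(kept.values())
    if total == 0:
        raise ValueError("P(X=y, Z=1 | Y=y) is zero; the conditioning is undefined")
    return sum(u(w) * p for (_, _, w), p in kept.items()) / total


def joint_given_output(y: int) -> Dict[Key, float]:
    """Hypothetical toy model of P(X, Z, w | Y=y).

    X ~ Bernoulli(0.3) and Z ~ Bernoulli(0.9), assumed independent. If Z=1 the
    output is erased and w depends only on X; if Z=0 the output is read and,
    in this toy model, it alone decides w.
    """
    p_x = {0: 0.7, 1: 0.3}
    p_z = {0: 0.1, 1: 0.9}
    p_good_given_x = {0: 0.5, 1: 0.8}
    joint: Dict[Key, float] = {}
    for x, px in p_x.items():
        for z, pz in p_z.items():
            pg = p_good_given_x[x] if z == 1 else (1.0 if y == 1 else 0.0)
            joint[(x, z, "good")] = px * pz * pg
            joint[(x, z, "bad")] = px * pz * (1 - pg)
    return joint


u = {"good": 1.0, "bad": 0.0}.__getitem__
for y in (0, 1):
    print(y, round(indifferent_value(u, joint_given_output(y), y), 3))
```

Only the Z=1 worlds with X=y contribute, so the values come out as 0.5 for y=0 and 0.8 for y=1, whatever the output would have done in the discarded Z=0 worlds.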
Now, the setup is designed so that Z=1 erases the output Y, so that it is never read. Hence P(w|Z=1,Y) = P(w|Z=1): once the output is erased, Y can only matter through the condition X=Y, and the equation above simplifies to:
$$\sum_w u(w)\,P(w \mid X=Y).$$
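The simplification can be illustrated numerically. The sketch below assumes, beyond the text, that Z is independent of X, and reads P(w|X=Y) as a "baseline" distribution describing how worlds go when the output is never read. Under those assumptions the conditioned value matches Σ_w u(w)P(w|X=y) no matter what the output does in the Z=0 worlds. This only illustrates the structure of the argument (the Z=1 worlds are built from the baseline by construction); all names and numbers are hypothetical.

```python
from typing import Dict, Hashable, Tuple

Key = Tuple[int, int, Hashable]  # (x, z, w)
WorldDist = Dict[Hashable, float]

# Hypothetical toy numbers; Z is assumed independent of X.
P_X = {0: 0.7, 1: 0.3}
P_Z = {0: 0.1, 1: 0.9}
# Assumed baseline P(w | X=x): how worlds go when the output is never read.
P_BASE: Dict[int, WorldDist] = {
    0: {"good": 0.5, "bad": 0.5},
    1: {"good": 0.8, "bad": 0.2},
}
U = {"good": 1.0, "bad": 0.0}


def build_joint(read_effect: Dict[int, WorldDist], y: int) -> Dict[Key, float]:
    """P(X, Z, w | Y=y): baseline worlds if Z=1 (erased), read_effect[y] if Z=0."""
    joint: Dict[Key, float] = {}
    for x, px in P_X.items():
        for z, pz in P_Z.items():
            p_w = P_BASE[x] if z == 1 else read_effect[y]
            for w, pw in p_w.items():
                joint[(x, z, w)] = px * pz * pw
    return joint


def conditioned_value(joint: Dict[Key, float], y: int) -> float:
    """Before simplifying: sum_w u(w) P(w | X=y, Z=1, Y=y)."""
    kept = {k: p for k, p in joint.items() if k[0] == y and k[1] == 1}
    total = sum(kept.values())
    return sum(U[w] * p for (_, _, w), p in kept.items()) / total


def simplified_value(y: int) -> float:
    """After simplifying: sum_w u(w) P(w | X=y), using the assumed baseline."""
    return sum(U[w] * p for w, p in P_BASE[y].items())


# Two opposite (hypothetical) effects the output could have if it were read.
READ_EFFECTS = [
    {0: {"good": 1.0, "bad": 0.0}, 1: {"good": 0.0, "bad": 1.0}},
    {0: {"good": 0.0, "bad": 1.0}, 1: {"good": 1.0, "bad": 0.0}},
]

for effect in READ_EFFECTS:
    for y in (0, 1):
        lhs = conditioned_value(build_joint(effect, y), y)
        assert abs(lhs - simplified_value(y)) < 1e-9, (effect, y)
print("The Z=0 behaviour never changes the conditioned value.")
```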
Therefore the AI will choose the Y that maximises the conditional expected utility of u given X=Y. To get the full version of the initial post, one needs to define some function f of Y and modify this to
$$\sum_w u(w)\,P(w \mid X=Y) + f(Y).$$
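The decision rule with the extra term can be sketched in a few lines of Python. The text leaves f unspecified, so the two versions of `f` below (no bonus, and a fixed penalty on one output) are hypothetical stand-ins, as are the table of P(w|X=x) and the utility values.

```python
from typing import Callable, Dict, Hashable, Iterable

# Assumed P(w | X=x) for each value x that X (and hence Y) can take.
P_W_GIVEN_X: Dict[int, Dict[Hashable, float]] = {
    0: {"good": 0.5, "bad": 0.5},
    1: {"good": 0.8, "bad": 0.2},
}
U: Dict[Hashable, float] = {"good": 1.0, "bad": 0.0}


def score(y: int, f: Callable[[int], float]) -> float:
    """Return sum_w u(w) P(w | X=y) + f(y)."""
    return sum(U[w] * p for w, p in P_W_GIVEN_X[y].items()) + f(y)


def choose(outputs: Iterable[int], f: Callable[[int], float]) -> int:
    """Return the output Y that maximises the score above."""
    return max(outputs, key=lambda y: score(y, f))


def no_bonus(y: int) -> float:
    """f = 0 recovers the plain conditional maximiser."""
    return 0.0


def penalise_one(y: int) -> float:
    """Hypothetical f: a fixed cost of 0.4 for outputting 1."""
    return -0.4 if y == 1 else 0.0


print(choose((0, 1), no_bonus))      # prints 1 (score 0.8 beats 0.5)
print(choose((0, 1), penalise_one))  # prints 0 (score 0.5 beats 0.8 - 0.4)
```

With f identically zero this reduces to the previous rule; a non-zero f shifts the choice independently of the conditional expectation.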