I think you can put this scheme on a nicer foundation by talking about strategies rather than actions, and by letting the AI have some probability distribution over W0.
Then you just use the strategy that maximizes P(W0=u)⋅P(u | W0=u, do(strategy)) + P(W0=v)⋅P(v | W0=v, do(strategy)). You can also think of this as a simplification of the expected-utility calculation that bakes in the assumption that the AI can’t change W0.
You can then reintroduce the action ∅ with the observation that the AI will also be well-behaved if it maximizes P(W1=u | do(∅))⋅P(u | W1=u, do(strategy)) + P(W1=v | do(∅))⋅P(v | W1=v, do(strategy)).
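To make the two objectives concrete, here is a rough Python sketch. Everything in it (the strategy names, the placeholder probability tables, the specific numbers) is invented for illustration; in practice the interventional probabilities would have to come from the AI's causal model.

```python
# Rough sketch of both objectives.  The names and numbers below are invented
# placeholders; in practice P_GOAL and P_W1_NULL would be interventional
# probabilities supplied by the AI's causal model.

PRIOR_W0 = {"u": 0.5, "v": 0.5}      # the AI's distribution over W0
P_W1_NULL = {"u": 0.5, "v": 0.5}     # P(W1 = w | do(∅)), i.e. under the null strategy

# Placeholder for P(w | W0 = w, do(strategy)): how likely the goal picked out
# by W0 = w is to be achieved under each candidate strategy.
P_GOAL = {
    "null": {"u": 0.9, "v": 0.9},    # "null" stands for the ∅ strategy
    "b":    {"u": 0.7, "v": 0.2},
    "ORu":  {"u": 1.0, "v": 0.0},
    "ORv":  {"u": 0.0, "v": 1.0},
}

def objective(strategy, weights=PRIOR_W0):
    """sum over w in {u, v} of weights[w] * P(w | W0 = w, do(strategy))."""
    return sum(weights[w] * P_GOAL[strategy][w] for w in ("u", "v"))

best = max(P_GOAL, key=objective)                              # first objective
best_null_weighted = max(P_GOAL, key=lambda s: objective(s, P_W1_NULL))
print(best, best_null_weighted)                                # both pick "null" here
```

Note that the weights stay fixed whatever strategy is being evaluated, so a strategy like ORu gets no credit for forcing the world toward u; as far as I can tell, that is the sense in which the "AI can't change W0" assumption is baked in.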
In this example, it’s clear that W0 is a special node. However, the AI can only deduce that because, under ∅, W0 determines W. It’s perfectly plausible that under action b, say, Hum determines it instead, and that under ORu and ORv neither of those nodes has any impact on W.
Therefore we need ∅ to be a special strategy, as it allows us to identify which nodes connect to W. The advantage of this method is that it lets the AI find the causal graph and compute the dependencies.
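Here is a toy version of that point, with made-up structural equations (not the actual graph from the post), just to show how the node that determines W can switch with the strategy, and how ∅ serves as the baseline for reading off the dependency:

```python
# Toy structural equations, invented for illustration (not the graph from the
# original post): which candidate node determines W depends on the strategy.

def W(action, W0, Hum):
    """Value of W given the action and the candidate parent nodes."""
    if action == "null":   # ∅: W copies W0
        return W0
    if action == "b":      # under b, Hum determines W instead
        return Hum
    return "u" if action == "ORu" else "v"   # ORu / ORv clamp W outright

def determining_nodes(action):
    """Candidate nodes whose value actually changes W under this action."""
    nodes = []
    for name in ("W0", "Hum"):
        base = {"W0": "u", "Hum": "u"}
        flipped = dict(base, **{name: "v"})   # intervene on just that node
        if W(action, **base) != W(action, **flipped):
            nodes.append(name)
    return nodes

for a in ("null", "b", "ORu", "ORv"):
    print(a, determining_nodes(a))
# null -> ['W0'], b -> ['Hum'], ORu and ORv -> [] (no node has any impact)
```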
Agreed, strategies are better than actions.
Ah, I see what you mean. I wonder what kind of real-world issues crop up if you identify ∅ with no output along the output channel.
Another way to identify W0 is with observational data collected by humans, or (hopefully) with some stronger and more semantically rich method that tries to separate the causes we really want to preserve from the causes we regard as interference.
I generally think of ∅ as the “turn yourself off and do nothing” strategy.