If pol(P) sucks, even if the AI is literally corrigible, we still won’t reach good outcomes.
If pol(P) sucks by default, a general AI (corrigible or otherwise) may be able to give us information I which:
- Makes V_P(pol(P)) much higher, by making pol(P) given I suck a whole lot less.
- Makes V_Q(pol(Q)) a little lower, by making pol(Q) given I make concessions to allow pol(P) to perform better.
A non-obstructive AI can’t do that, since it’s required to maintain the AU for pol(Q).
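(For reference, my paraphrase of the non-obstruction condition in the notation above; the post conditions on the human's policy with the AI on vs off, and I may be eliding details:

$$\forall P \in S:\quad V_P(\mathrm{pol}(P) \mid \text{AI on}) \;\ge\; V_P(\mathrm{pol}(P) \mid \text{AI off})$$

So with Q ∈ S, anything the AI predicts would leave V_Q(pol(Q)) below its AI-off baseline, even slightly, seems to be ruled out.)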
A simple example is where P and Q currently look the same to us, so our pol(P) and pol(Q) have the same outcome [ETA: for a long time at least, with potentially permanent AU consequences], which happens to be great for V_Q(pol(Q)), but not so great for V_P(pol(P)).
In this situation, we want an AI that can tell us:
“You may actually want either P or Q here. Here’s an optimisation that works 99% as well for Q, and much better than your current approach for P. Since you don’t currently know which you want, this is much better than your current optimisation for Q: that only does 40% as well for P.”
A non-obstructive AI cannot give us that information if it predicts that doing so would lower V_Q(pol(Q)), which it probably would.
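A toy calculation of the trade-off, using the 99% and 40% figures from the hypothetical advice above; the 50/50 credence, the 0.95 value standing in for "much better for P", and the normalisation to 1.0 are made-up illustrative assumptions, not anything from the post:

```python
# Toy numbers for the P-vs-Q trade-off described above.
# Assumptions (illustrative only): attainable utility is normalised so the best
# plan for a goal scores 1.0, and we hold 50/50 credence over P vs Q.

credence_P = 0.5
credence_Q = 0.5

# Current plan: effectively the pol(Q) optimisation, since P and Q look the same to us.
current   = {"V_P": 0.40, "V_Q": 1.00}   # great for Q, only 40% as good for P
# Plan enabled by the AI's information I: 99% as good for Q, much better for P.
with_info = {"V_P": 0.95, "V_Q": 0.99}   # 0.95 is an arbitrary stand-in for "much better"

def expected_au(plan):
    """Expected attainable utility under our uncertainty over which payoff we want."""
    return credence_P * plan["V_P"] + credence_Q * plan["V_Q"]

print(expected_au(current))    # 0.70
print(expected_au(with_info))  # 0.97 -- far better in expectation

# But V_Q drops from 1.00 to 0.99, so an AI forbidden from lowering V_Q(pol(Q))
# apparently cannot point us towards the better plan.
```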
Does non-obstruction rule out lowering V_Q(pol(Q)) in this way?
If not, I’ve misunderstood you somewhere.
If so, that’s a problem.
I’m not sure I understand the distinction you’re making between a “conceptual frame” and a “constraint under which...”.
[Non-obstruction with respect to a set S] must be a constraint of some kind.
I’m simply saying that there are cases where it seems to rule out desirable behaviour—e.g. giving us information that allows us to trade a small potential AU penalty for a large potential AU gain, when we’re currently uncertain over which is our true payoff function.
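To put that trade in symbols (my formalisation, not the post's): writing p for our credence that P is the true payoff function, providing the information I is better in expectation whenever

$$ p\,\big[V_P(\mathrm{pol}(P)\mid I)-V_P(\mathrm{pol}(P))\big] \;>\; (1-p)\,\big[V_Q(\mathrm{pol}(Q))-V_Q(\mathrm{pol}(Q)\mid I)\big], $$

yet a strictly positive right-hand side, i.e. any drop at all in V_Q(pol(Q)), seems to be exactly what non-obstruction with respect to a set containing Q forbids.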
Anyway, my brain is now dead. So I doubt I’ll be saying much intelligible before tomorrow (if the preceding even qualifies :)).