I really like this post. I'm very excited about understanding this better, as I said in my mechanistic corrigibility post (which, as you mention, is closely related to the full/partial agency distinction).
> we can kind of expect any type of learning to be myopic to some extent
I'm pretty uncertain about this. Certainly to the extent that full agency is impossible (due to computational/informational constraints, for example), I agree. But I think a critical point missing here is that full agency can still exhibit pseudo-myopic behavior (and thus get selected for) if it uses an objective that is discounted over time, or if it is deceptive. Thus, I don't think that having some sort of soft episode boundary is enough to rule out full-ish agency.
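To make the pseudo-myopia point concrete, here's a toy sketch (the two-action setup and all reward numbers are invented for illustration): a full agent optimizing a sufficiently discounted objective picks the same action a myopic agent would, so within-episode behavior alone can't distinguish the two.

```python
import numpy as np

# Hypothetical two-action episode: action A pays off immediately,
# action B sacrifices now for a large reward several steps later.
# rewards[action][t] = reward at timestep t after committing to the action.
rewards = {
    "A": np.array([1.0, 0.0, 0.0, 0.0, 0.0, 0.0]),
    "B": np.array([0.0, 0.0, 0.0, 0.0, 0.0, 10.0]),
}

def discounted_return(reward_stream, gamma):
    """Full-agency objective: sum over the whole future of gamma^t * r_t."""
    t = np.arange(len(reward_stream))
    return float(np.sum(gamma**t * reward_stream))

for gamma in [0.99, 0.5, 0.1]:
    best = max(rewards, key=lambda a: discounted_return(rewards[a], gamma))
    print(f"gamma={gamma}: full agent picks {best}")
# gamma=0.99 picks B (the long-term payoff dominates), but gamma=0.1
# picks A, since 10 * 0.1**5 = 0.0001: behaviorally identical to myopia.
```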
Furthermore, it seems quite plausible to me that for many learning setups, models implementing algorithms closer to full agency will be simpler than models implementing algorithms closer to partial agency. As you note, partial agency is a pretty weird thing to do from a mathematical standpoint, so many learning processes might penalize it pretty heavily for that. At the very least, if you count Solomonoff Induction as a learning process, you should probably expect something a lot closer to full agency there.
That being said, the fact that epistemic learning apparently does this by default seems pretty promising for figuring out how to get myopia, so I'm definitely excited about that.
> RL tends to require temporal discounting—this also creates a soft episode boundary, because things far enough in the future matter so little that they can be thought of as “a different episode”.
This is just a side note, but RL also tends to have hard episode boundaries if you regularly reset the state of the environment, as is common in many RL setups; see the sketch below.
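For concreteness, here's what both kinds of boundary look like in a standard gymnasium-style training loop (a minimal sketch with a random placeholder policy; the environment and hyperparameters are just examples): `env.reset()` is the hard boundary, and the discount factor supplies the soft one, with an effective horizon on the order of 1/(1-gamma).

```python
import gymnasium as gym

env = gym.make("CartPole-v1")
gamma = 0.99  # soft boundary: effective horizon ~ 1 / (1 - gamma) = 100 steps

for episode in range(3):
    obs, info = env.reset()  # hard boundary: environment state is wiped
    ret, discount = 0.0, 1.0
    terminated = truncated = False
    while not (terminated or truncated):
        action = env.action_space.sample()  # placeholder for a learned policy
        obs, reward, terminated, truncated, info = env.step(action)
        ret += discount * reward
        discount *= gamma  # rewards ~100+ steps out get weight near zero
    print(f"episode {episode}: discounted return = {ret:.2f}")

env.close()
```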
Thanks, I appreciate your enthusiasm! I’m still not sure how much sense all of this makes.
I agree with your simplicity point, but it may be possible to set this aside by talking about what's learned in the limit. If strategic manipulation is disincentivized, then strategic manipulators will eventually lose. We might still expect strategic manipulators in practice, because they might be significantly simpler. But a theory of partial agency can examine the limiting behavior separately from the prevalence of manipulators in the prior (see the toy sketch below).
I agree with your other points.
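As a toy illustration of the limit argument above (the two competing models, the prior weights, and the per-round scores are all invented), here's a multiplicative-weights / Bayesian-style update in which a simplicity prior favors the manipulator early on, but a persistent per-round disincentive drives its posterior weight to zero in the limit:

```python
# Two hypothetical competitors under a Bayesian-style multiplicative update.
prior = {"manipulator": 0.9, "partial_agent": 0.1}            # simplicity favors the manipulator
per_round_score = {"manipulator": 0.6, "partial_agent": 0.7}  # manipulation is disincentivized

for n in [0, 10, 100, 1000]:
    posterior = {k: prior[k] * per_round_score[k] ** n for k in prior}
    z = sum(posterior.values())
    print(f"after {n} rounds: P(manipulator) = {posterior['manipulator'] / z:.4f}")
# The prior decides who dominates early (0.9 at n=0); the per-round scores
# alone decide the limit: P(manipulator) -> 0 as n grows.
```

The point of the sketch is just that the prior term is a constant while the score term compounds, so the limiting behavior is independent of the prior (as long as the prior assigns nonzero weight to the non-manipulator).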