4) This is why we want to increment N slowly. The scheme should work whether a_unit is a human policy or a meaningless string of text: even if the meaningless string is very low impact, N eventually grows large enough to let the agent do useful things; conversely, if the human policy is more aggressive, we stop incrementing sooner and so grant less leeway.
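As a rough sketch of the incrementation loop I have in mind (the helpers `impact` and `can_act_usefully`, and the cap of N * impact(a_unit), are hypothetical stand-ins for illustration, not anything defined in the post):

```python
def calibrate_N(agent, a_unit, impact, can_act_usefully, step=1):
    """Slowly raise N until the agent can accomplish something useful.

    Assumption for this sketch: the agent may only take actions whose
    impact is at most N * impact(a_unit). A more aggressive a_unit
    (larger impact) reaches a workable budget at a smaller N, so we
    stop incrementing sooner and grant less leeway.
    """
    N = 0
    while not can_act_usefully(agent, budget=N * impact(a_unit)):
        N += step
    return N
```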
Let’s say for concreteness that a_unit is a human policy, since you think it works either way. I think that most human actions are moderately low impact, and some are extremely high impact. If the impact of a_unit leaps to very large values infinitely often, then infinitely often there is effectively no impact regularization, no matter what N is. No setting of N fixes this: if N were small enough to preclude even actions less impactful than a_unit, the agent could never act usefully, and if N permits actions as impactful as a_unit, then whenever a_unit has very large impact (which I contend happens infinitely often for any choice of a_unit that ever permits useful action), dangerously high-impact actions are allowed.
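To make the concern concrete, here is a toy numerical illustration (the heavy-tailed numbers and the per-step cap of N * impact(a_unit) are my assumptions, just to show the shape of the dilemma):

```python
# Per-step impact of a_unit: usually modest, occasionally enormous.
impacts_of_a_unit = [1, 1, 2, 1, 1000, 1, 2, 1, 1, 5000]

useful_action_impact = 3  # assumed cost of a minimally useful agent action

for N in (1, 2, 4):
    caps = [N * u for u in impacts_of_a_unit]
    blocked = sum(cap < useful_action_impact for cap in caps)
    runaway = sum(cap >= 1000 for cap in caps)
    print(f"N={N}: useful action blocked on {blocked}/10 steps, "
          f"cap exceeds 1000 on {runaway}/10 steps")

# Small N blocks useful actions on ordinary steps; any N large enough
# to permit them still lets the cap blow up whenever a_unit spikes.
```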
I think there’s some variance, but not as much as you have in mind. Even if a_unit occasionally had very large impact, though, that isn’t how N-incrementation works (at least in the post; if you’re thinking of the paper, then yes, the version I presented there doesn’t bound lifetime returns and therefore doesn’t get the same desirable properties as the version in the post). If you’ll forgive my postponing this discussion, I’d be interested in hearing your thoughts after I post a more in-depth exploration of the phenomenon.
Sure thing.