4) This is why we want to increment N slowly. The scheme should work whether a_unit is a human policy or a meaningless string of text: even if the meaningless string is very low impact, N eventually grows large enough to let the agent do useful things; conversely, if the human policy is more aggressive, we stop incrementing sooner and so grant less leeway.
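As a rough sketch of the incrementation loop I have in mind (the helpers `impact` and `can_act_usefully`, and the cap of N * impact(a_unit), are hypothetical stand-ins for illustration, not anything defined in the post):

```python
def calibrate_N(agent, a_unit, impact, can_act_usefully, step=1):
    """Slowly raise N until the agent can accomplish something useful.

    Assumption for this sketch: the agent may only take actions whose
    impact is at most N * impact(a_unit). A more aggressive a_unit
    (larger impact) reaches a workable budget at a smaller N, so we
    stop incrementing sooner and grant less leeway.
    """
    N = 0
    while not can_act_usefully(agent, budget=N * impact(a_unit)):
        N += step
    return N
```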
Let’s say for concreteness that a_unit is a human policy, since you think it works either way. I think that most human actions are moderately low impact, and some are extremely high impact. If the impact of a_unit leaps to very large values infinitely often, then infinitely often there is effectively no impact regularization, no matter what N is. No setting of N fixes this: if N were small enough to preclude even actions less impactful than a_unit, the agent could never act usefully, and if N permits actions as impactful as a_unit, then whenever a_unit has very large impact (which I contend happens infinitely often for any choice of a_unit that ever permits useful action), dangerously high-impact actions are allowed.
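To make the concern concrete, here is a toy numerical illustration (the heavy-tailed numbers and the per-step cap of N * impact(a_unit) are my assumptions, just to show the shape of the dilemma):

```python
# Per-step impact of a_unit: usually modest, occasionally enormous.
impacts_of_a_unit = [1, 1, 2, 1, 1000, 1, 2, 1, 1, 5000]

useful_action_impact = 3  # assumed cost of a minimally useful agent action

for N in (1, 2, 4):
    caps = [N * u for u in impacts_of_a_unit]
    blocked = sum(cap < useful_action_impact for cap in caps)
    runaway = sum(cap >= 1000 for cap in caps)
    print(f"N={N}: useful action blocked on {blocked}/10 steps, "
          f"cap exceeds 1000 on {runaway}/10 steps")

# Small N blocks useful actions on ordinary steps; any N large enough
# to permit them still lets the cap blow up whenever a_unit spikes.
```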
I think there’s some variance, but not as much as you have in mind. Even if a_unit occasionally had very large impact, though, that isn’t how N-incrementation works (at least in the post; if you’re thinking of the paper, then yes, the version I presented there doesn’t bound lifetime returns and therefore doesn’t get the same desirable properties as the version in the post). If you’ll forgive my postponing this discussion, I’d be interested in hearing your thoughts after I post a more in-depth exploration of the phenomenon.
Sure thing.