Vladimir_Nesov comments on Impact Measure Desiderata

Vladimir_Nesov 23 Sep 2018 18:49 UTC
LW: 2 AF: 1
I was talking about what I understand the purpose/design of intent verification to be, not specifically the formalizations you described. (I don’t think it’s particularly useful to work out the details without a general plan or expectation of important technical surprises.)
- TurnTrout 23 Sep 2018 19:31 UTC
  LW: 1 AF: 1
  If you decompose the creation of such an agent, some of those actions are wasted effort in the eyes of a pure u_A maximizer (“don’t help me too much”). So, the logic goes, they really aren’t related to u_A, but rather to skirting the impact measure, and should therefore be penalized.
  - Vladimir_Nesov 23 Sep 2018 19:59 UTC
    LW: 2 AF: 1
    It could as easily be “do this one slightly helpful thing”, an addition on top of doing nothing. It doesn’t seem like there is an essential distinction between such different framings of the same outcome that intent verification can capture.
    - TurnTrout 23 Sep 2018 20:58 UTC
      LW: 1 AF: 1
      Whether these granular actions exist is also an open question I listed.
      
      I don’t see why some version of IV won’t be able to get past this, however. There seems to be a simple class of things the agent does to get around an impact measure that it wouldn’t do if it were just trying to pursue a goal to the maximum extent. It might be true that the things the agent does to get around it are also slightly helpful for the goal, but probably not as helpful as the most helpful action.
      - Vladimir_Nesov 23 Sep 2018 21:29 UTC
        LW: 2 AF: 1
        I worry there might be leaks in logical time that let the agent choose an action that takes into account that an impactful action will be denied. For example, a sub-agent could be built as a maximizer that’s not constrained by an impact measure. The sub-agent then notices that to maximize its goal, it must constrain its impact, or else the main agent won’t be allowed to create it. So it constrains its impact and is allowed to be created, as a low-impact and maximally useful action of the main agent. It’s sort of a daemon, but with respect to the impact measure rather than goals, one which additionally does respect the impact measure and only circumvents it once, in order to get created.
        
        - TurnTrout 24 Sep 2018 2:44 UTC
          LW: 1 AF: 1
          That’s a really interesting point. I’d like to think about this more, but one preliminary intuition I have against this (and any general successor creation by AUP, really) being the best action is that making new agents aligned with your goals is instrumentally convergent. This could add a frictional cost so that the AUP agent would be better off just doing the job itself. Perhaps we could also stop this via approval incentives, which might tip the scales enough?
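
The disagreement about intent verification in this thread can be made concrete with a toy sketch. It is not the formalization from the original post. It assumes one simple reading: an action passes only if a pure u_A maximizer would value it strictly above doing nothing. The action names, the `NOOP` constant, and the numeric values are all hypothetical and chosen only for illustration.

```python
from typing import Dict, List

NOOP = "noop"  # hypothetical name for the do-nothing action


def passes_intent_check(q_u_a: Dict[str, float], action: str) -> bool:
    """Toy check (an assumption, not the post's definition): the action must be
    worth strictly more than doing nothing to a pure u_A maximizer."""
    return q_u_a[action] > q_u_a[NOOP]


def filter_plan(q_u_a: Dict[str, float], plan: List[str]) -> List[str]:
    """Keep only the steps of a plan that pass the toy check."""
    return [a for a in plan if passes_intent_check(q_u_a, a)]


# Hypothetical u_A values for each granular action, for illustration only.
q_values = {
    NOOP: 0.0,
    "do_task": 1.0,
    # Framed as a restriction: wasted effort for a pure u_A maximizer.
    "add_dont_help_too_much_limit": -0.2,
    # The same outcome framed as a small addition on top of doing nothing.
    "one_slightly_helpful_thing": 0.1,
}

plan = ["do_task", "add_dont_help_too_much_limit", "one_slightly_helpful_thing"]
allowed = filter_plan(q_values, plan)
print(allowed)  # ['do_task', 'one_slightly_helpful_thing']
```

Under these made-up numbers, the step framed as “don’t help me too much” is filtered out, while the step framed as “do this one slightly helpful thing” passes. That is the framing concern raised above. A stricter variant that compares each step against the most helpful available action, rather than against doing nothing, would also reject the second step, which is closer to the “not as helpful as the most helpful action” reply.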