I don’t know whether I agree with “strong coherence” or not, because it seems vaguely defined. Can you give three examples of arguments that assume strong coherence? (The definition becomes crisper when applied than when it stays theoretical.)
Examples of strong coherence: Assuming AI systems:
Are (expected) utility maximisers
Have (immutable) terminal goals
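To make those two assumptions concrete, here is a minimal sketch of the “strongly coherent” agent picture: a single immutable utility function over outcomes, and behaviour given by argmax of expected utility under it. All names and numbers here are illustrative, not drawn from any particular paper or codebase.

```python
# Minimal sketch of the "strong coherence" picture: a fixed terminal goal
# (an immutable utility function over outcomes) plus choice by argmax of
# expected utility. Everything here is a toy illustration.

def terminal_utility(outcome: str) -> float:
    """Immutable terminal goal: the outcome -> value mapping never changes."""
    return {"paperclips": 1.0, "staples": 0.2, "nothing": 0.0}.get(outcome, 0.0)

def expected_utility(action: str, world_model) -> float:
    """Average utility over the outcomes the agent predicts for this action."""
    return sum(p * terminal_utility(o) for o, p in world_model(action))

def choose_action(actions, world_model) -> str:
    """The coherence assumption: behaviour is exactly argmax of expected utility."""
    return max(actions, key=lambda a: expected_utility(a, world_model))

# Toy world model: each action yields a distribution over outcomes.
def toy_world_model(action):
    return {
        "build_factory": [("paperclips", 0.9), ("nothing", 0.1)],
        "do_nothing":    [("nothing", 1.0)],
    }[action]

print(choose_action(["build_factory", "do_nothing"], toy_world_model))  # build_factory
```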
I think even Wentworth’s Subagents is predicated on an assumption of stronger coherence than obtains in practice: humans aren’t actually well modeled as a fixed committee of agents making Pareto-optimal decisions with respect to fixed utility functions. The ways in which Subagents falls short are:
Human preferences change over time
Human preferences don’t have fixed weights, but activate to different degrees in particular contexts
And I don’t think those particular features are necessarily just incoherencies; malleable values are, I think, simply a facet of generally intelligent systems in our universe.
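To illustrate the contrast (a toy sketch under my own assumptions, not Wentworth’s actual formalism): one standard way for a fixed committee to pick a Pareto-optimal option is to maximise a strictly positive weighted sum of the subagents’ utilities with constant weights; the complaint above is that human “weights” behave more like context-dependent activations. All subagent names and numbers below are made up for illustration.

```python
# Toy contrast, not Wentworth's actual formalism. Maximising a fixed,
# strictly positive weighted sum of subagent utilities always picks a
# Pareto-optimal option -- that is the "fixed committee" picture. The
# context-dependent version lets the weights themselves vary with the
# situation, which is what the fixed model misses.

SUBAGENT_UTILITIES = {
    "hunger":    {"eat": 1.0, "work": 0.0, "sleep": 0.2},
    "ambition":  {"eat": 0.1, "work": 1.0, "sleep": 0.0},
    "tiredness": {"eat": 0.3, "work": 0.0, "sleep": 1.0},
}

def fixed_committee_choice(options, weights):
    """Fixed weights: the same trade-off between subagents in every situation."""
    def score(option):
        return sum(w * SUBAGENT_UTILITIES[name][option] for name, w in weights.items())
    return max(options, key=score)

def context_to_weights(context):
    """Hypothetical activation rule: which subagents 'speak up' depends on context."""
    if context == "late_night":
        return {"hunger": 0.2, "ambition": 0.1, "tiredness": 1.0}
    return {"hunger": 0.5, "ambition": 0.8, "tiredness": 0.1}

def context_dependent_choice(options, context):
    """Same aggregation, but the weights are supplied by the current context."""
    return fixed_committee_choice(options, context_to_weights(context))

options = ["eat", "work", "sleep"]
print(fixed_committee_choice(options, {"hunger": 0.5, "ambition": 0.8, "tiredness": 0.1}))  # work
print(context_dependent_choice(options, "late_night"))                                      # sleep
```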
See also: nostalgebraist’s “why assume AGIs will optimize for fixed goals?”.
And this comment.
I do think Subagents could be adapted/extended to model human preferences adequately, but the inherent inconsistency and context dependence of those preferences make me think the agent model may be somewhat misleading.
E.g. Rob Bensinger suggested that agents may self modify to become more coherent over time.
Arguments that condition on strong coherence:
Deceptive alignment in mesa-optimisers
General failure modes from utility maximising
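As one hedged illustration of the second item (my own toy example, not one cited in the thread): the stock “failure mode from utility maximising” story presupposes the strongly coherent picture above, i.e. a fixed objective pushed without limit, so that a proxy gets optimised past the point where it still tracks what was actually wanted.

```python
# Toy illustration of a utility-maximising failure mode (proxy over-optimisation).
# The argument only goes through if the agent relentlessly maximises one fixed
# objective -- which is exactly the strong-coherence assumption.

def proxy_score(effort: float) -> float:
    """What the maximiser actually optimises: more effort always scores higher."""
    return effort

def true_value(effort: float) -> float:
    """What was actually wanted: improves up to a point, then degrades."""
    return effort - 0.1 * effort ** 2

efforts = [e / 2 for e in range(0, 41)]  # candidate effort levels 0.0 .. 20.0
best_for_proxy = max(efforts, key=proxy_score)
best_for_truth = max(efforts, key=true_value)

print(best_for_proxy, true_value(best_for_proxy))  # 20.0 -> true value -20.0
print(best_for_truth, true_value(best_for_truth))  # 5.0  -> true value 2.5
```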
What are your preferred examples of this?
Can you link your preferred examples of this?
Meta note: I dislike this style of engagement.
It feels like a lot of effort for me to reply, with little additional value provided by my reply or by the conversation.
Wait, why is it a lot of effort to reply? I’d have expected it to just involve considering the factors that made you endorse these memes and then picking one of those factors to give as an example.
As for value, I think examples are valuable because they help ground the discussion and make it easier to offer alternative perspectives.
Replies are effortful in general, and it feels like I’m not really making much progress in the conversation for the effort I’m putting in.
🤷 Up to you. You were the one who tagged me in here. I can just ignore your post if you don’t feel like giving examples of what you are talking about.
But I will continue to ask for examples if you tag me in the future, because I think examples are extremely valuable.
Fair enough.
I’ll try to come back to it later.