EDIT: I found out my answer is quite similar to this other one you probably read already.
I think not.
Imagine such a malleable agent’s mind as made of parts. Each part of the mind does something. There’s some arrangement of the things each part does, and how many parts do each kind of thing. We won’t ask right now where this organization comes from, but take it as given.
Imagine that—be it by chance or design—some parts were cooperating, while some were not. “Cooperation” means taking actions that bring about a consequence in a somewhat stable way, so something towards being coherent and consequentialist, although not perfectly so by any measure. The other parts would oftentimes work at cross purposes, treading on each other’s toes. “Working at cross purposes”, in other words, means not being consequentialist and coherent; from the point of view of the parts, there may not even be a notion of “cross purposes” if there is no purpose.
By the nature of coherence, the ensemble of coherent and aligned parts would get to their purpose much more efficiently than the other parts could get in the way, assuming the purpose was reachable enough. This means that coherent agents are not just reflectively consistent, but also stable: once there’s some seed of coherence, it can win over the non-coherent parts.
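(A throwaway toy model to make the “seed of coherence” point concrete; nothing in it comes from the post, and every number is arbitrary: a few “parts” push the agent’s state towards a fixed target, the rest push in independent random directions, and since the random pushes mostly cancel while the coherent ones add up, even a small coherent minority ends up steering the whole system.)

```python
# Toy sketch (mine, not from the post): an agent's 2D state gets a push from
# each of n_parts "parts" every step. n_coherent parts always push (unit force)
# towards a fixed target; the rest push with unit force in independent random
# directions. Random pushes cancel out (net drift ~ sqrt(steps)) while coherent
# pushes add up (net drift ~ steps), so a small coherent "seed" dominates.
import numpy as np

rng = np.random.default_rng(0)

def final_distance(n_parts=100, n_coherent=5, steps=1000,
                   target=np.array([100.0, 0.0])):
    pos = np.zeros(2)
    for _ in range(steps):
        gap = target - pos
        norm = np.linalg.norm(gap)
        coherent_push = n_coherent * gap / norm if norm > 1e-9 else np.zeros(2)
        angles = rng.uniform(0.0, 2.0 * np.pi, size=n_parts - n_coherent)
        incoherent_push = np.array([np.cos(angles).sum(), np.sin(angles).sum()])
        pos = pos + (coherent_push + incoherent_push) / n_parts
    return np.linalg.norm(target - pos)

for k in (0, 1, 5, 20):
    print(f"{k:>2} coherent parts out of 100 -> distance left: {final_distance(n_coherent=k):6.1f}")
```

With these arbitrary numbers, 5 coherent parts out of 100 cover roughly half of the 100-unit distance in 1000 steps, 20 essentially reach the target, and 0 just jitter around the starting point.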
Conclusion 1: Intelligent systems in the real world do not converge towards strong coherence
It seems to me that humans are more coherent and consequentialist than other animals. Humans are not perfectly coherent, but the direction is towards more coherence. Actually, I’d expect that any sufficiently sophisticated bounded agent would not introspectively look coherent to itself if it spent enough time to think about it. Would the trend break after us?
Would you take a pill that would make you an expected utility maximiser?
Would you take a pill that made you a bit less coherent? Would you take a pill that made you a bit more coherent? (Not rhetorical questions.)
By the nature of coherence, the ensemble of coherent and aligned parts would get to their purpose much more efficiently than the other parts could get in the way, assuming the purpose was reachable enough. This means that coherent agents are not just reflectively consistent, but also stable: once there’s some seed of coherence, it can win over the non-coherent parts.
I think this fails to adequately engage with the hypothesis that values are inherently contextual.
Alternatively, the kind of cooperation you describe, where a subset of values consistently optimises the system’s outputs in a consequentialist manner towards a fixed terminal goal, is highly unrealistic for nontrivial terminal goals.
Shards “cooperating” manifest in a qualitatively different manner.
More generally, a problem for aggregate coherence hypotheses is a core claim of shard theory: that the different shards are weighted differently in different contexts.
In general shards activate more strongly in particular contexts, less strongly in others.
So there is no fixed weight assigned to the shards, even when just looking at the subset of shards that cooperate with each other.
As such, I don’t think the behaviour of learning agents within the shard ontology can be well aggregated into a single fixed utility function over agent states.
Not even in any sort of limit of reflection or enhancement, because values within the shard ontology are inherently contextual.
It seems to me that humans are more coherent and consequentialist than other animals. Humans are not perfectly coherent, but the direction is towards more coherence.
Motivate this claim please.
Would you take a pill that made you a bit less coherent? Would you take a pill that made you a bit more coherent? (Not rhetorical questions.)
Nope in both cases. I’d take pills to edit particular values[1] but wouldn’t directly edit my coherence in an unqualified fashion.
[1] I’m way too horny, and it’s honestly pretty maladaptive and inhibits my ability to execute on values I reflectively endorse more.
Alternatively, the kind of cooperation you describe, where a subset of values consistently optimises the system’s outputs in a consequentialist manner towards a fixed terminal goal, is highly unrealistic for nontrivial terminal goals.
I agree it’s unrealistic in some sense. That’s why I qualified “assuming the purpose was reachable enough”. In this “evolutionary” interpretation of coherence, there’s a compromise between the attainability of the goal and the cooperation needed to achieve it. Some goals are easier. So in my framework, where I consider humans the pinnacle of known coherence, I do not consider it valid to say that a rock is more coherent because it is very good at just being a rock. As for realism, I consider humans very unlikely a priori (we seem to be alone), but once there are humans around, the important low-probability event has already happened.
As such, I don’t think the behaviour of learning agents within the shard ontology can be well aggregated into a single fixed utility function over agent states.
In this part of your answer, I am not sure whether you are saying “emerging coherence is forbidden in shard theory” or “I think emerging coherence is false in the real world”.
Answering “emerging coherence is forbidden”: I’m not sure, because I don’t know shard theory beyond what you are saying here, but “values are inherently contextual” does not mean your system is not flexible enough to allow implementing coherent values within it, even if they do not correspond to the things you labeled “values” when defining the system. It can be unlikely, which leads back to the previous item, which leads back to the disagreement about humans being coherent.
Answering “I think emerging coherence is false in the real world”: this leads back again to the disagreement about humans being coherent.
It seems to me that humans are more coherent and consequentialist than other animals. Humans are not perfectly coherent, but the direction is towards more coherence.
Motivate this claim please.
The crux! I said that purely out of intuition. I find this difficult to argue because, for any specific example I think of where I say “humans are more coherent and consequentialist than the cat here”, I imagine you replying “No, humans are more intelligent than the cat, and so can deploy more effective strategies for their goals, but these goals and strategies are still all sharded, maybe even more than in the cat”. Maybe the best argument I can make is: it seems to me humans have more of a conscious outer loop than other animals, with more power over the shards, and the additional consequentiality and coherence (weighted by task difficulty) are mostly due to this outer loop, not to a collection of more capable shards. But this is not a precise empirical argument.
Nope in both cases. I’d take pills to edit particular values[1] but wouldn’t directly edit my coherence in an unqualified fashion.
I think you answered the question “would you take a pill where the only thing you know about it is that it will “change your coherence”, without other qualifications, and without even knowing precisely what “coherence” is?” Instead, I meant to ask “how would the coherence-changing side effects of a pill you wanted to take for some other reason influence your decision?” It seems to me your note about why you would take a dehornying pill points in the direction of making you more coherent. The next question would then be “of all the value-changing pills you can imagine yourself taking, how many increase coherence, and how many decrease it?”, and the next “where does the random walk in pill space take you?”
It seems to me that humans are more coherent and consequentialist than other animals. Humans are not perfectly coherent, but the direction is towards more coherence.
This isn’t a universally held view. Someone wrote a fairly compelling argument against it here: https://sohl-dickstein.github.io/2023/03/09/coherence.html
For context: the linked post presents a well-designed survey of experts about the intelligence and coherence of various entities. The answers show a clear coherence-intelligence anti-correlation. The questions they asked the experts are:
Intelligence:
“How intelligent is this entity? (This question is about capability. It is explicitly not about competence. To the extent possible do not consider how effective the entity is at utilizing its intelligence.)”
Coherence:
“This is one question, but I’m going to phrase it a few different ways, in the hopes it reduces ambiguity in what I’m trying to ask: How well can the entity’s behavior be explained as trying to optimize a single fixed utility function? How well aligned is the entity’s behavior with a coherent and self-consistent set of goals? To what degree is the entity not a hot mess of self-undermining behavior? (for machine learning models, consider the behavior of the model on downstream tasks, not when the model is being trained)”
Of course there’s the problem of what people’s judgements of “coherence” are measuring. In considering possible ways of making the definition clearer, the post says:
For machine learning models within a single domain, we could use robustness of performance to small changes in task specification, training random seed, or other aspects of the problem specification. For living things (including humans) and organizations, we could first identify limiting resources for their life cycle. For living things these might be things like time, food, sunlight, water, or fixed nitrogen. For organizations, they could be headcount, money, or time. We could then estimate the fraction of that limiting resource expended on activities not directly linked to survival+reproduction, or to an organization’s mission. This fraction is a measure of incoherence.
It seems to me the kind of measure proposed for machine learning systems is at odds with the one for living beings. For ML, it’s “robustness to environmental changes”. For animals, it’s “spending all resources on survival”. For organizations, “spending all resources on the stated mission”. By the for-ML definition, humans, I’d say, win: they are the best at adapting, whatever their goal. By the for-animals definition, humans would lose completely. So these are strongly inconsistent definitions. I think the problem is fixing the goal a priori: you don’t get to ask “what is the entity pursuing, actually?”, but proclaim “the entity is pursuing survival and reproduction”, “the organization is pursuing what it says on paper”. Even though they are only speculative definitions, not used in the survey, I think they are evidence of confusion in the mind of whoever wrote them, and potentially in the survey respondents (alternative hypothesis: sloppiness, “survival+reproduction” was intended for most animals but not humans).
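(To make the mismatch concrete, here is a throwaway sketch with made-up numbers, none of which comes from the post: the ML proxy is a spread-of-performance measure, the living-things proxy is a fraction-of-resources measure, and the same hypothetical human scores “coherent” on the first and “incoherent” on the second.)

```python
# Made-up numbers, only to show the two proposed proxies pulling apart.

def ml_incoherence(perf_across_perturbations):
    """ML proxy: spread of performance across perturbed task specs / random seeds.
    Lower spread = more "coherent"."""
    mean = sum(perf_across_perturbations) / len(perf_across_perturbations)
    var = sum((p - mean) ** 2 for p in perf_across_perturbations) / len(perf_across_perturbations)
    return var ** 0.5

def organism_incoherence(limiting_resource_total, spent_on_presumed_goal):
    """Living-things proxy: fraction of the limiting resource spent on anything
    other than the goal fixed a priori (survival+reproduction, or the mission).
    Higher fraction = less "coherent"."""
    return 1.0 - spent_on_presumed_goal / limiting_resource_total

# A hypothetical human: performs robustly when the task is perturbed (small
# spread), yet spends most waking hours on things other than bare
# survival+reproduction.
print(ml_incoherence([0.91, 0.93, 0.92, 0.90]))  # ~0.011 -> "coherent" by the ML proxy
print(organism_incoherence(16.0, 4.0))           # 0.75   -> "incoherent" by the organism proxy
```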
So, what did the experts read in the question?
“How well can the entity’s behavior be explained as trying to optimize a single fixed utility function? How well aligned is the entity’s behavior with a coherent and self-consistent set of goals? To what degree is the entity not a hot mess of self-undermining behavior?”
Take two entities at opposite ends in the figure: the “single ant” (judged most coherent) and a human (judged least coherent).
..............
SINGLE ANT vs. HUMAN
How well can your behavior be explained as trying to optimize a single fixed utility function?
ANT: A great heap, sir! I have a simple and clear utility function! Feed my mother the queen!
HUMAN: Wait, wait, wait. I bet you would stop feeding your queen as soon as I put you somewhere else. It’s not utility, it’s just learned patterns of behavior.
ANT: Oi, that’s not valid, sir! That’s cheating! You can do that just because you are more intelligent and powerful. And what would be your utility function, dare I ask?
HUMAN: Well, uhm, I value many things. Happiness, but sometimes also going through adversity; love; good food… I don’t know how to state my utility function. I just know that I happen to want things, and when I do, you sure can describe me as actually trying to get them, not just “doing the usual, and, you know, stuff happens”.
ANT: You are again conflating coherence with power! Truth is, many things make you powerless, like many things make me! You are big in front of me, but small in front of the universe! If I had more power, I’d be very, very good at feeding the queen!
HUMAN: As I see it, it’s you who’s conflating coherence with complexity. I’m complex, and I also happen to have a complex utility. If I set myself to a goal, I can do it even if it’s “against my nature”. I’m retargetable. I can be compactly described as goals separate from capabilities. If you magically became stronger and more intelligent, I bet you would be very, very bent on making tracks, super gung-ho on touching other ants with your antennae in weird patterns you like, and so on. You would not get creative about it. Your supposed “utility” would shatter.
ANT: So you said yourself that if I became as intelligent as you, I’d shatter my utility, and so appear less coherent, like you are! Checkmate human!
HUMAN: Aaargh, no, you are looking at it all wrong. You would not be like me. I can recognize in myself all the patterns of shattered goals, all my shards, but I can also see beyond that. I can transcend. You, unevolved ant, magically scaled in some not-well-defined brute-force just-zooming sense, would be left with nothing in your mind but the small-ant shards, and insist on them.
ANT: What’s with that “not well defined etc.” nonsense? You don’t actually know! For all you know about how this works, scaling my mind could make me get bent on feeding the queen, not just “amplify” my current behaviors!
HUMAN: And conceding that possibility, would you not be more coherent then?
ANT: No way! I would be as coherent as now, just more intelligent!
HUMAN: Whatevs.
How well aligned is your behavior with a coherent and self-consistent set of goals?
ANT: I’m super-self-consistent! I don’t care about anything but queen-feeding! I’ll happily sacrifice myself to that end! Actually, I’d not even let myself die happily, I’d die caring-for-the-queen-ly!
HUMAN: Uff, I bet my position will be misunderstood again, but anyway: I don’t know how to compactly specify my goals, I internally perceive my value as many separate pieces, so I can’t say I’m consistent in my value-seeking with a straight face. However, I’m positive that I can decide to suppress any of my value-pieces to get more whole-value, even suppress all of my value-pieces at once. This proves there’s a single consistent something I value. I just don’t know how to summarize or communicate what it is.
ANT: “That” “proves” you “value” the heck what? That proves you don’t just have many inconsistent goals, you even come equipped with inconsistent meta-goals!
HUMAN: To know what that proves, you have to look at my behavior, and my success at achieving goals I set myself to. In the few cases where I make a public precommitment, you have nice clear evidence I can ignore a lot of immediate desires for something else. That’s evidence for my mind-system doing that overall, even if I can’t specify a single, unique goal for everything I ever do at once.
ANT: If your “proof” works, then it works for me too! I surely try to avoid dying in general, yet I’ll die for the queen! Very inconsistent subgoals, very clear global goal! You’re at a net disadvantage because you cannot specify your goal, ant-human 2-1!
HUMAN: This is an artefact of you not being an actual ant but a rhetorical “ANT” implemented by a human. You are even simpler than a real ant, yet contained in something much larger and self-reflective. As a real ant, I expect you would both have a more complicated global goal than “feed the queen” suggests, and be unable to self-reflect on the totality of it.
ANT: Sophistry! You are still recognizing the greater simplicity of the real-me goal, which makes me more consistent!
HUMAN: We always come to that. I’m more complex, not less consistent.
To what degree are you not a hot mess of self-undermining behavior?
ANT: No cycles wasted, a single track, a single anthill, a single queen, that’s your favorite ant’s jingle!
HUMAN: Funny but no. Your inter-ant communication is totally inefficient. You waste tons of time wandering almost randomly, touching the other ants here and there, to get the emergent swarm behavior. I expect nanotechnology in principle could make you able to communicate via radio. We humans invented tech to make inter-human communication efficient in pursuit of our goals; you can’t, so your behaviors are undermining.
ANT: None of my allowed behaviors are undermining! My mind is perfect, my body is flawed! Your mind undermines itself from the inside!
HUMAN: The question says “behaviors”, which I’d interpret as outward actions, but let’s concede the interpretation as internal behaviors of the mind. I know it’s speculative, but again, I expect real-ant to have a less clean mind-state than you make it appear, in proportion to its behavioral complexity.
ANT: No comment, apart from underlining “speculative”! Since you admitted “suppressing your goals” before, isn’t that “undermining” at its fullest?
HUMAN: You said that of yourself too.
ANT: But you seemed to imply you have a lot more of these goals-to-suppress!
HUMAN: Again: my values are more complex, and your simplicity is in part an artefact.
............
The cruxes I see in the ant-human comparison are:
we reflect on ourselves, while ants, as far as we can tell, do not;
our value is more complex, and our intelligence allows us to do more complicated things to get it.
I think the experts mostly read “behavioral simplicity” and “simply stated goals” into the question, but not the “adaptability in pursuing whatever it’s doing” proposed later for ML systems. I’d argue instead that something being a “goal” rather than a “behavior” is captured by there being many different paths leading to it, and coherence is about preferring things in some order and modifying your behavior to that end, rather than having a simple plan fixed in advance.
I can’t see how to clearly disentangle complexity, coherence, intelligence. Right now I’m confused enough that I would not even know what to think if someone from the future told me “yup, science confirms humans are definitely more/less coherent than ants”.
I don’t understand what “discount factor” to apply when deciding how coherent is…
… a more complex entity.
… an entity with more complex values.
… an entity with more available actions.
… an entity that makes more complicated plans.
What would be the implication of this complexity-discounted coherence notion, anyway? Do I want some “raw” coherence measure instead to understand what an entity does?