Ilio comments on FAQ: What the heck is goal agnosticism?

Ilio 11 Oct 2023 3:42 UTC
3 points
0

the process itself, being an optimizer over world states, is not goal agnostic either.

That’s the crux I think: I don’t get why you reject (programmable) learning processes as goal agnostic.

you must be unable to describe me as having unconditional preferences over world states for me to be goal agnostic.

Let’s say I clone you_genes a few billion times, each time twisting your environment and education until I’m statistically happy with the recipe. What unconditional preferences would you expect to remain?

Let’s say you_adult is actually a digital brain in some matrix, with an unpleasant boss who stops and randomly restarts your emulation each time your preferences win out over his. Could that process make you_matrix goal agnostic?
- porby 13 Oct 2023 15:12 UTC
  5 points
  0
  Parent
  That’s the crux I think: I don’t get why you reject (programmable) learning processes as goal agnostic.
  It’s important to draw a box around the specific agent under consideration. Suppose I train a model with predictive loss such that the model is goal agnostic. Three things can be simultaneously true:
  1. Viewed in isolation, the optimizer responsible for training the model isn’t goal agnostic because it can be described as having preferences over external world state (the model).
  2. The model is goal agnostic because it meets the stated requirements (and is asserted by the hypothetical).
  3. A simulacrum arising from sequences predicted by that goal agnostic predictor when conditioned to predict non-goal agnostic behavior is not goal agnostic.
  Let’s say I clone you_genes a few billion times, each time twisting your environment and education until I’m statistically happy with the recipe. What unconditional preferences would you expect to remain?
  The resulting person would still be human, and presumably not goal agnostic as a result. A simulacrum produced by an ideal goal agnostic predictor that is conditioned to reproduce the behavior of that human would also not be goal agnostic.
  The fact that those preferences arose conditionally based on your selection process isn’t relevant to whether the person is goal agnostic. The relevant kind of conditionality is within the agent under consideration.
  Let’s say you_adult is actually a digital brain in some matrix, with an unpleasant boss who stops and randomly restarts your emulation each time your preferences win out over his. Could that process make you_matrix goal agnostic?
  No; “I” still have preferences over world states. They’re just being overridden.
  Bumping up a level and drawing the box around the unpleasant boss and myself combined, still no, because the system expresses my preferences filtered by my boss’s preferences.
  Some behavior being conditional isn’t sufficient for goal agnosticism; there must be no way to describe the agent under consideration as having unconditional preferences over external world states.
  - Ilio 13 Oct 2023 16:45 UTC
    5 points
    0
    Parent
    
    Viewed in isolation, the optimizer responsible for training the model isn’t goal agnostic because it can be described as having preferences over external world state (the model).
    
    This is where I am lost. In this scenario, it seems that we could describe both the model and the optimizer as either having an unconditional preference for goal agnosticism, or both as having preferences over the state of external worlds (to include goal agnostic models). I don’t understand what axiom or reasoning leads to treating these two things differently.
    
    The resulting person would still be human, and presumably not goal agnostic as a result. (…) No; “I” still have preferences over world states. They’re just being overridden.
    
    My bad, I did not clarify that upfront, but I was specifically thinking of selecting/overriding for goal agnosticism. From your answers, I understand that you treat “goal agnostic agent” as an oxymoron, correct?
    - porby 16 Oct 2023 17:26 UTC
      7 points
      0
      Parent
      it seems that we could describe both the model and the optimizer as either having an unconditional preference for goal agnosticism, or both as having preferences over the state of external worlds (to include goal agnostic models). I don’t understand what axiom or reasoning leads to treating these two things differently.
      The difference is subtle but important, in the same way that an agent that “performs Bayesian inference” is different from an agent that “wants to perform Bayesian inference.”
      A goal agnostic model does not want to be goal agnostic, it just is. If the model is describable as wanting to be goal agnostic, in terms of a utility function, it is not goal agnostic.
      The observable difference between the two is the presence of instrumental behavior towards whatever goals it has. A model that “wants to perform Bayesian inference” might, say, maximize the amount of inference it can do, which (in the pathological limit) eats the universe.
      A model that wants to be goal agnostic has fewer paths to absurd outcomes: self-modifying to be goal agnostic is a more local process that doesn’t require eating the universe, and the model may have other values that suggest eating the universe is bad. But it’s still not immediately goal agnostic.
      From your answers, I understand that you treat goal agnostic agent as an oxymoron, correct?
      Agent doesn’t have a constant definition across all contexts, but it can be valid to describe a goal agnostic system as a rational agent in the VNM sense. Taking the “ideal predictor” as an example, it has a utility function that it maximizes. In the limit, it very likely represents a strong optimizing process. It just so happens that the goal agnostic utility function does not directly imply maximization with respect to external world states, and does not take instrumental actions that route through external world states (unless the system is conditioned into an agent that is not goal agnostic).
      - Ilio 18 Oct 2023 14:43 UTC
        3 points
        0
        Parent
        Thanks for your patience and clarifications.
        
        The observable difference between the two is the presence of instrumental behavior towards whatever goals it has.
        
        Say again? On my left, an agent that “just is goal agnostic”. On my right, an agent that “just wants to be goal agnostic”. At first both are still: the first because it is goal agnostic, the second because it wants to look as if it were goal agnostic. Then I ask something. The first responds because it doesn’t mind doing what I ask. The second responds because it wants to look as if it doesn’t mind doing what I ask. Where’s the observable difference?
        porby 23 Oct 2023 0:35 UTC
        5 points
        0
        Parent
        If you have a model that “wants” to be goal agnostic in a way that means it behaves in a goal agnostic way in all circumstances, it is goal agnostic. It never exhibits any instrumental behavior arising from unconditional preferences over external world states.
        For the purposes of goal agnosticism, that form of “wanting” is an implementation detail. The definition places no requirement on how the goal agnostic behavior is achieved.
        In other words:
        If the model is describable as wanting to be goal agnostic, in terms of a utility function, it is not goal agnostic.
        A model that “wants” to be goal agnostic such that its behavior is goal agnostic can’t be described as “wanting” to be goal agnostic in terms of its utility function; there will be no meaningful additional terms for “being goal agnostic,” just the consequences of being goal agnostic.
        As a result of how I was using the words, the fact that there is an observable difference between “being” and “wanting to be” is pretty much tautological.
        Ilio 23 Oct 2023 22:26 UTC
        1 point
        0
        Parent
        
        A model that “wants” to be goal agnostic such that its behavior is goal agnostic can’t be described as “wanting”
        
        Ok, I did not expect you were using a tautology there. I’m not sure I get how to use it. Would you say a thermostat can’t be described as wanting because it’s being goal agnostic?
        porby 24 Oct 2023 22:36 UTC
        3 points
        0
        Parent
        If you were using “wanting” the way I was using the word in the previous post, then yes, it would be wrong to describe a goal agnostic system as “wanting” something, because the way I was using that word would imply some kind of preference over external world states.
        I have no particular ownership over the definition of “wanting” and people are free to use words however they’d like, but it’s at least slightly unintuitive to me to describe a system as “wanting X” in a way that is not distinct from “being X,” hence my usage.
        Ilio 25 Oct 2023 13:53 UTC
        1 point
        0
        Parent
        
        the way I was using that word would imply some kind of preference over external world states.
        
        It’s 100% ok to have your own set of useful definitions; I’m just trying to understand them. In this very sense, one cannot want an external world state that is already in place, correct?
        
        it’s at least slightly unintuitive to me to describe a system as “wanting X” in a way that is not distinct from “being X,”
        
        Let’s say we want to maximize the number of digits of pi we explicitly know. You could say being continuously curious about the next digits is a continuous state of being, so in disguise this is actually not a goal (or at least not in the sense you’re using this word). Or you could say the state of the world does not include all the digits of pi, so wanting to know more is a valid want. Which one is a better match for your intuition?
        
        Also, what about the thermostat question above?
        porby 26 Oct 2023 16:37 UTC
        3 points
        0
        Parent
        In this very sense, one cannot want an external world state that is already in place, correct?
        An agent can have unconditional preferences over world states that are already fulfilled. A maximizer doesn’t stop being a maximizer if it’s maximizing.
        Let’s say we want to maximize the number of digits of pi we explicitly know.
        That’s definitely a goal, and I’d describe an agent with that goal as both “wanting” in the previous sense and not goal agnostic.
        Also, what about the thermostat question above?
        If the thermostat is describable as goal agnostic, then I wouldn’t say it’s “wanting” by my previous definition. If the question is whether the thermostat’s full system is goal agnostic, I suppose it is, but in an uninteresting way.
        (Note that if we draw the agent-box around ‘thermostat with temperature set to 72’ rather than just ‘thermostat’ alone, it is not goal agnostic anymore. Conditioning a goal agnostic agent can produce non-goal agnostic agents.)
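The agent-box point about the thermostat can be pictured with a toy sketch. Everything here is an illustrative assumption rather than part of the definition being discussed: the function names, the "heat"/"cool"/"idle" actions, and the use of `functools.partial` to stand in for "conditioning" are all made up for the example.

```python
from functools import partial


def thermostat(setpoint, temperature):
    """Bare thermostat: its action depends entirely on the setpoint it is given."""
    if temperature < setpoint:
        return "heat"
    if temperature > setpoint:
        return "cool"
    return "idle"


# "Conditioning" the thermostat: fix the setpoint at 72 (hypothetical stand-in).
thermostat_72 = partial(thermostat, 72)

# The bare thermostat has no fixed target: the same reading of 65
# leads to opposite actions under different conditions.
print(thermostat(60, 65))  # cool
print(thermostat(80, 65))  # heat

# The conditioned system always pushes the room toward 72.
print(thermostat_72(65))  # heat
print(thermostat_72(80))  # cool
```

In this sketch, only the box drawn around `thermostat_72` can be described as consistently steering the external temperature toward one state; the bare `thermostat` steers wherever its condition says.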
        Ilio 27 Oct 2023 17:21 UTC
        3 points
        0
        Parent
        
        An agent can have unconditional preferences over world states that are already fulfilled. A maximizer doesn’t stop being a maximizer if it’s maximizing.
        
        Well said! In my view, if we fed a good enough maximizer the goal of learning to look as if it were a unified goal agnostic agent, then I’d expect the behavior of the resulting algorithm to handle the paradox well enough that it’ll make sense.
        
        If the question is whether the thermostat’s full system is goal agnostic, I suppose it is, but in an uninteresting way.
        
        I beg to differ. In my view our volitions look as if they arise from a set of internal thermostats that drive our behaviors, like the generalization to low n of the spontaneous fighting dance of two thermostats. If the latter can be described as goal agnostic, I don’t see why the former could not (hence my examples of environmental constraints that could let someone use your or my personality as a certified subprogram).
        
        Conditioning a goal agnostic agent can produce non-goal agnostic agents.
        
        Yes, but shall we also agree that non-goal agnostic agents can produce goal agnostic agents?
        porby 2 Nov 2023 22:53 UTC
        3 points
        1
        Parent
        In my view, if we fed a good enough maximizer the goal of learning to look as if it were a unified goal agnostic agent, then I’d expect the behavior of the resulting algorithm to handle the paradox well enough that it’ll make sense.
        If you successfully gave a strong maximizer the goal of maximizing a goal agnostic utility function, yes, you could then draw a box around the resulting system and correctly call it goal agnostic.
        In my view our volitions look as if they arise from a set of internal thermostats that drive our behaviors, like the generalization to low n of the spontaneous fighting dance of two thermostats. If the latter can be described as goal agnostic, I don’t see why the former could not (hence my examples of environmental constraints that could let someone use your or my personality as a certified subprogram).
        Composing multiple goal agnostic systems into a new system, or just giving a single goal agnostic system some trivial scaffolding, does not necessarily yield goal agnosticism in the new system. It won’t necessarily eliminate it, either; it depends on what the resulting system is.
        Yes, but shall we also agree that non-goal agnostic agents can produce goal agnostic agents?
        Yes; during training, a non-goal agnostic optimizer can produce a goal agnostic predictor.
        Ilio 3 Nov 2023 20:58 UTC
        1 point
        0
        Parent
        Thanks, that helps.
        
        Yes; during training, a non-goal agnostic optimizer can produce a goal agnostic predictor.
        
        Suppose an agent is made robustly curious about what humans will next choose when free from external pressures, and nauseous if its own actions could be interpreted as experimenting on humans or on its own code. Do you agree it would be a good candidate for goal agnosticism?
        porby 8 Nov 2023 21:00 UTC
        3 points
        0
        Parent
        Probably not? It’s tough to come up with an interpretation of those properties that wouldn’t result in the kind of unconditional preferences that break goal agnosticism.
        Ilio 9 Nov 2023 21:39 UTC
        3 points
        0
        Parent
        As you might guess, it’s not obvious to me. Would you mind providing some details on these interpretations and how you see the breakage happening?
        
        Also, we’ve been going back and forth without feeling the need to upvote each other, which I thought was fine but which turns out to be interpreted negatively. [To clarify: it seems to be one of the criteria here: https://www.lesswrong.com/posts/hHyYph9CcYfdnoC5j/automatic-rate-limiting-on-lesswrong] If those are your thoughts too, we can close at this point; otherwise let’s give each other some high fives. Your call, and thanks for the discussion in any case.
        porby 10 Nov 2023 0:01 UTC
        3 points
        0
        Parent
        For example, a system that avoids experimenting on humans—even when prompted to do so otherwise—is expressing a preference about humans being experimented on by itself.
        Being meaningfully curious will also come along with some behavioral shift. If you tried to induce that behavior in a goal agnostic predictor through conditioning for being curious in that way and embed it in an agentic scaffold, it wouldn’t be terribly surprising for it to, say, set up low-interference observation mechanisms.
        Not all violations of goal agnosticism necessarily yield doom, but even prosocial deviations from goal agnosticism are still deviations.
        Ilio 10 Nov 2023 17:26 UTC
        3 points
        0
        Parent
        …but I thought the criterion was unconditional preference? The point of nausea is precisely that agents can decide to act despite it; they’d just rather find a better solution (if their intelligence is up to the task).
        
        I agree that curiosity, period, seems highly vulnerable (You read Scott Alexander? He wrote a hilarious hit piece about this idea a few weeks or months ago). But I did not say curious, period. I said curious about what humans will freely choose next.
        
        In other words, the idea is that it should prefer not to trick humans, because if it does (for example by interfering with our perception) then it won’t know what we would have freely chosen next.
        
        It also seems to cover security (if we’re dead it won’t know), health (if we’re incapacitated it won’t know) and prosperity (if we’re under economic constraints that impact our free will). But I’m interested to consider possible failure modes.
        
        (« Sorry, I’d rather not do your will, for that would impact the free will of other humans. But thanks for letting me know that was your decision! You can’t imagine how good it feels when you tell me that sort of thing! »)
        
        Notice you don’t see me campaigning for this idea, because I don’t like any solution that does not also take care of AI well-being. But when I first read « goal agnosticism », it struck me as an excellent fit for describing the behavior of an agent acting under these particular drives.
        porby 14 Nov 2023 2:41 UTC
        3 points
        0
        Parent
        …but I thought the criterion was unconditional preference? The point of nausea is precisely that agents can decide to act despite it; they’d just rather find a better solution (if their intelligence is up to the task).
        Right; a preference being conditionally overwhelmed by other preferences does not make the presence of the overwhelmed preference conditional.
        Or to phrase it another way, suppose I don’t like eating bread^[1] (-1 utilons), but I do like eating cheese (100 utilons) and garlic (1000 utilons).
        You ask me to choose between garlic bread (1000 − 1 = 999 utilons) and cheese (100 utilons); I pick the garlic bread.
        The fact that I don’t like bread isn’t erased by the fact that I chose to eat garlic bread in this context.
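The garlic bread arithmetic can be written out as a small sketch. It uses the utilon values from the example and assumes, as the 1000 − 1 = 999 sum does, that ingredient utilities simply add; the names `UTILITY`, `total_utility`, and `options` are hypothetical and exist only for illustration.

```python
# Fixed per-ingredient utilities (in utilons), taken from the example above.
UTILITY = {"bread": -1, "cheese": 100, "garlic": 1000}


def total_utility(ingredients):
    """Add up the utility of each ingredient (additivity is an assumption)."""
    return sum(UTILITY[item] for item in ingredients)


options = {
    "garlic bread": ["garlic", "bread"],
    "cheese": ["cheese"],
}

# Pick the option with the highest total utility.
choice = max(options, key=lambda name: total_utility(options[name]))

print(choice)                                   # garlic bread
print(total_utility(options["garlic bread"]))   # 999
print(UTILITY["bread"])                         # -1
```

The choice changes with the options on offer, but the entry for `"bread"` never does: the dislike is outweighed, not removed.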
        It also seems to cover security (if we’re dead it won’t know), health (if we’re incapacitated it won’t know) and prosperity (if we’re under economic constraints that impact our free will). But I’m interested to consider possible failure modes.
        This is aiming at a different problem than goal agnosticism; it’s trying to come up with an agent that is reasonably safe in other ways.
        In order for these kinds of bounds (curiosity, nausea) to work, they need to incorporate enough of the human intent behind the concepts.
        So perhaps there is an interpretation of those words that is helpful, but there remains the question “how do you get the AI to obey that interpretation,” and even then, that interpretation doesn’t fit the restrictive definition of goal agnosticism.
        The usefulness of strong goal agnostic systems (like ideal predictors) is that, while they do not have properties like those by default, they make it possible to incrementally implement those properties.
        ^[1] Utterly false, for the record.
        Ilio 14 Nov 2023 21:25 UTC
        3 points
        0
        Parent
        
        This is aiming at a different problem than goal agnosticism; it’s trying to come up with an agent that is reasonably safe in other ways.
        
        Well, assuming a robust implementation, I still think it obeys your criteria, but now that you mention « restrictive », my understanding is that you want this expression to refer specifically to pure predictors. Correct?
        
        If yes, I’m not sure that’s the best choice for clarity (why not « pure predictors »?) but of course that’s your choice. If not, can you give some examples of goal agnostic agents other than pure predictors?
        porby 25 Nov 2023 0:22 UTC
        3 points
        0
        Parent
        you mention « restrictive », my understanding is that you want this expression to refer specifically to pure predictors. Correct?
        Goal agnosticism can, in principle, apply to things which are not pure predictors, and there are things which could reasonably be called predictors which are not goal agnostic.
        A subset of predictors are indeed the most powerful known goal agnostic systems. I can’t currently point you toward another competitive goal agnostic system (rocks are uselessly goal agnostic), but the properties of goal agnosticism do, in concept, extend beyond predictors, so I leave the door open.
        Also, by using the term “goal agnosticism” I try to highlight the value that arises directly from the goal-related properties, like statistical passivity and the lack of instrumental representational obfuscation. I could just try to use the more limited and implementation specific “ideal predictors” I’ve used before, but in order to properly specify what I mean by an “ideal” predictor, I’d need to specify goal agnosticism.
        Ilio 25 Nov 2023 23:42 UTC
        3 points
        0
        Parent
        I’d be happy if you could point out a noncompetitive one, or explain why my proposal above does not obey your axioms. But we seem to be getting diminishing returns from sorting these questions out, so maybe it’s time to close at this point, and I wish you luck. Thanks for the discussion!