We can probably rule out “a spread of situationally-activated computations which steer its actions towards historical reward-correlates”, insofar as that spread is a much less compact policy-encoding than an explicit search process + simple objective(s).
Seems like you can have a yet-simpler policy by factoring the fixed “simple objective(s)” into implicit, modular elements that compress many different objectives that may be useful across many different environments. Then at runtime, you feed the environmental state into your factored representation of possible objectives and produce a mix of objectives tailored to your current environment, which steer towards behaviors that achieved high reward on training runs similar to the current environment.
That would seem quite close to “a spread of situationally-activated computations which steer its actions towards historical reward-correlates”, and it seems pretty similar to how my own values / goals arise in an environmentally-dependent manner without me having access to any explicitly represented “simple objective(s)” that I retain across environments.
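A rough code-sketch of the kind of structure being described (purely illustrative; the element names and the weighting scheme are hypothetical assumptions, not a claim about any real architecture):

```python
# Illustrative sketch only: a "factored" objective representation that is
# conditioned on the current environment, instead of one fixed stored objective.

def objective_mix(env_features):
    """Map features of the current environment to a weighting over modular
    objective-elements compressed from many different training contexts."""
    elements = {
        "acquire_resources": env_features.get("resource_scarcity", 0.0),
        "gain_approval":     env_features.get("observers_present", 0.0),
        "finish_task":       env_features.get("explicit_task", 0.0),
    }
    total = sum(elements.values()) or 1.0
    return {name: weight / total for name, weight in elements.items()}

# At runtime the policy does not look up a stored environment-independent goal;
# it regenerates a tailored mix of objectives from the environment it is in,
# which then steers behavior towards what was rewarded in similar training runs.
print(objective_mix({"resource_scarcity": 0.2, "explicit_task": 0.8}))
```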
That seems like a semantic difference? We may just as well call these modular elements the “objectives”, with them having different environment-specific local implementations.
E. g., if my goal is “winning”, it would unfold into different short-term objectives depending on whether I’m playing chess or football, but we can still meaningfully call it a “goal”.
I’m confident that this is not a semantic difference. The modular elements I was describing represent a process for determining one’s objectives, depending on the environment and your current beliefs. It would be a type error to call them “objectives”, just as it would be a type error to call a search process your “plans”. They each represent compressions of possible objectives / plans, but are not those things themselves.
Similarly, it would be incorrect to call a GPT model a “collection of sentences”, even though it is essentially a compression over many possible sentences.
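To put the type distinction in sketch form (the type names below are illustrative assumptions): an objective and a generator of objectives simply have different signatures, just as a plan and a search process do.

```python
from typing import Callable

# Hypothetical placeholder types, purely to illustrate the point about type signatures.
WorldState = dict
Environment = dict
Beliefs = dict
Plan = list

# An objective scores world-states; a plan is a sequence of actions.
Objective = Callable[[WorldState], float]

# A generator of objectives maps (environment, beliefs) to an Objective.
ObjectiveGenerator = Callable[[Environment, Beliefs], Objective]

# A search process maps (environment, beliefs, objective) to a Plan.
SearchProcess = Callable[[Environment, Beliefs, Objective], Plan]

# Calling an ObjectiveGenerator an "objective" is a type error in the same way that
# calling a SearchProcess a "plan" (or a GPT model a "collection of sentences") would
# be: neither has the signature of the thing it generates, even though each one is a
# compression over many possible instances of that thing.
```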
Okay, suppose we feed many environment-states into some factored representation of possible objectives, and generate a lot of (environment, objectives) mappings for a given agent. In your model, is it possible to summarize these results somehow; is it possible to say something general about what the agent is trying to do in all of these environments? (E. g., like my football & chess example.)
Yes, it’s possible to do summary statistics on the outputted goals, just like you can do summary statistics on the outputs of GPT-3, or on the plans produced by a given search algorithm. That doesn’t make the generators of these things have the same type signature as the things themselves.
My counterpoint to John is specifically about the sort of computational structures that can represent goals, while being both simple AND environment/belief-dependent. I’m saying simplicity does not push against representing goals in an environment-dependent way, because your generator of goals can be conditioned on the environment.
Yes, it’s possible to do summary statistics on the outputted goals
How “meaningful” would that summary be? Does my “winning at chess vs football” analogy fit what you’re describing, with “winning” being the compressed objective-generator and the actual win conditions of chess/football being the environment-specific objectives?
My point is that you can have “goals” (things your search process steers the world towards) and “generators of goals”. These are different things, and you should not use the same name for them.
More specifically, there is a difference in the computational type signature between generators and the things they generate. You can call these two things by whatever label you like, but they are not the same thing.
You can look at a person’s plans / behavior in many different games and conclude that it demonstrates a common thread which you might label “winning”. But you should not call the latent cognitive generators responsible for this common thread by the same name you use for the world states the person’s search process steers towards in different environments.
Alright, then it is a semantics debate from my perspective. I don’t think we’re actually disagreeing, now. Your “objective-generators” cleanly map to my “goals”, and your “objectives” to my “local implementations of goals” (or maybe “values” and “local interpretations of values”). That distinction definitely makes sense at the ground level. In my ontology, it’s a distinction between what you want and what achieving it looks like in a given situation.
I think it makes more sense to describe it my way, though, since I suspect there is a continuum of ever-more-specific/local objectives (“winning” as an environment-independent goal, “winning” in this type of game, “winning” against the specific opponent you have, “winning” given this game and opponent and the tactic they’re using), rather than a dichotomy of “objective-generator” vs “objective”, but that’s a finer point.
Although, digging into the previously-mentioned finer points, I think there is room for some meaningful disagreement.
I don’t think there are goal-generators as you describe them. I think there are just goals, and then some plan-making/search mechanism which does goal translation/adaptation/interpretation for any given environment the agent is in. I. e., the “goal generators” are separate pieces from the “ur-goals” they take as input.
And as I’d suggested, there’s a continuum of ever-more-specific objectives. In this view, I think the line between “goals” and “plans” blurs, even, so that the most specific “objectives” are just “plans”. In this case, the “goal generator” is just the generic plan-making process working in a particular goal-interpreting regime.
(Edited-in example: “I want to be a winner” → “I want to win at chess” → “I want to win this game of chess” → “I want to decisively progress towards winning in this turn” → “I want to make this specific move”. The early steps here are clear examples of goal-generation/translation (what does winning mean in chess?), the later ones clear examples of problem-solving (how do I do well this turn?), but they’re just extreme ends of a continuum.)
The initial goal-representations from which that process starts could be many things — mathematically-precise environment-independent utility functions, or goals defined over some default environment (as I suspect is the case with humans), or even step-one objective-generators, as you’re suggesting. But the initial representation being an objective-generator itself seems like a weirdly special case, not how this process works in general.
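A toy illustration of that continuum (everything here is made up for illustration): the same specialize-given-more-context step, applied repeatedly, shades from “goal translation” at the abstract end into ordinary planning at the concrete end, with no principled line in between.

```python
# Toy illustration only: each step specializes the previous objective to a narrower context.
refinements = [
    ("no context",          "I want to be a winner"),
    ("game: chess",         "I want to win at chess"),
    ("this game",           "I want to win this game of chess"),
    ("this turn",           "I want to make decisive progress this turn"),
    ("this board position", "I want to play this specific move"),
]

for context, objective in refinements:
    # Early iterations read as goal-generation/translation, late ones as problem-solving,
    # but it is the same "specialize the objective to the current context" operation.
    print(f"[{context}] {objective}")
```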
Seems like you can have a yet-simpler policy by factoring the fixed “simple objective(s)” into implicit, modular elements that compress many different objectives that may be useful across many different environments. Then at runtime, you feed the environmental state into your factored representation of possible objectives and produce a mix of objectives tailored to your current environment, which steer towards behaviors that achieved high reward on training runs similar to the current environment.
Can you explain why this policy is yet-simpler? It sounds more complicated to me.
I’m saying that it’s simpler to have a goal generator that can be conditioned on the current environment, rather than memorizing each goal individually.
That sure does sound like a description of a search algorithm, right there.

I’m not objecting to your assertion that some sort of search takes place. I’m objecting to your characterization of what sorts of objectives the search ends up pointed towards. Basically, I’m saying that “situationally activated heuristics that steer towards environment-dependent goals” is totally in line with a simplicity prior over cognitive structures leading to a search-like process.
The whole reason you say that we should expect search processes is that they can compress many different environment- and belief-dependent plans into a simpler generator of such plans (the search), which takes in environment info, beliefs, and the agent’s simple, supposedly environment-independent, objectives, and produces a plan. So, the agent only needs to store the search process and its environment-independent objectives.

I’m saying you can apply a similar “compress into an environment / beliefs conditioned generator” trick to the objectives as well, and get a generator of objectives that conditions on the environment and current beliefs to produce objectives for the search process.
Thus, objectives remain environment-dependent, and will probably steer towards world states that resemble those which were rewarded during training. I think this is quite similar to “a spread of situationally-activated computations which steer its actions towards historical reward-correlates”, if involving rather more sophisticated cognition than phrases like “contextually activated heuristics” often imply.
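As a sketch of that symmetry (the function names are stand-ins, not real components): the agent stores two conditioned generators rather than a fixed objective, and the objectives it actually pursues get recomputed per environment.

```python
# Sketch of the two policy encodings being compared; every function here is a stand-in.

def policy_with_fixed_objective(env, beliefs, search, fixed_objective):
    # Encoding 1: stored search process + stored environment-independent objective.
    return search(env, beliefs, fixed_objective)

def policy_with_generated_objectives(env, beliefs, search, objective_generator):
    # Encoding 2: the same compression trick applied twice. Plans are generated by
    # search, and the objectives fed into that search are themselves generated from
    # (environment, beliefs) rather than stored as a fixed, environment-independent goal.
    objective = objective_generator(env, beliefs)
    return search(env, beliefs, objective)
```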