evhub comments on Agents vs. Predictors: Concrete differentiating factors

evhub 7 Mar 2023 1:25 UTC
LW: 6 AF: 5
1
AF
Here’s a particularly nice concrete example of the first thing here that you can test concretely right now (thanks to (edit: Jacob Pfau and) Ethan Perez for this example): give a model a prompt full of examples of it acting poorly. An agent shouldn’t care and should still act well regardless of whether it’s previously acted poorly, but a predictor should reason that probably the examples of it acting poorly mean it’s predicting a bad agent, so it should continue to act poorly.
- Jacob Pfau 8 Mar 2023 16:24 UTC
  11 points
  4
  Parent
  Generalizing this point, a broader differentiating factor between agents and predictors is: You can, in-context, limit and direct the kinds of optimization used by a predictor. For example, consider the case where you know myopically/locally-informed edits to a code-base can safely improve runtime of the code, but globally-informed edits aimed at efficiency may break some safety properties. You can constrain a predictor via instructions, and demonstrations of myopic edits; an agent fine-tuned on efficiency gain will be hard to constrain in this way.
  
  It’s harder to prevent an agent from specification gaming / doing arbitrary optimization whereas a predictor has a disincentive against specification gaming insofar as the in-context demonstration provides evidence against it. I think of this distinction as the key differentiating factor between agents and simulated agents; also to some extent imitative amplification and arbitrary amplification
  
  Nitpick on the history of the example in your comment; I am fairly confident that I originally proposed it to both you and Ethan c.f. bottom of your NYU experiments Google doc.
  - evhub 17 Mar 2023 20:39 UTC
    2 points
    0
    Parent
    
    Nitpick on the history of the example in your comment; I am fairly confident that I originally proposed it to both you and Ethan c.f. bottom of your NYU experiments Google doc.
    
    Edited!