I think you are very confused about the conceptual significance of a “sufficient statistic”.
Let’s start with the prototypical setup of a sufficient statistic. Suppose I have a bunch of IID variables $\{X_i\}$ drawn from a maximum-entropy distribution with features $f(X)$ (i.e. the “true” distribution is maxentropic subject to a constraint on the expectation of $f(X)$), BUT I don’t know the parameters of the distribution (i.e. I don’t know the expected value $E[f(X)]$). For instance, maybe I know that the variables are drawn from a normal distribution, but I don’t know the mean and variance of the distribution. In a Bayesian sense, the variables $\{X_i\}$ are not actually independent: learning the value of one (or a few) data points $X_i$ tells me something about the distribution parameters (i.e. mean and variance in the Gaussian case), which in turn gives me information about the other (unobserved) data points $X_j$.
However… if I have a few data points $X_i$, then all of the information from those $X_i$ which is relevant to other (unobserved) data points $X_j$ is summarized by the sufficient statistic $\frac{1}{N}\sum_i f(X_i)$. Or, to put it differently: while $X_i$ and $X_j$ are not independent in a Bayesian sense, they are conditionally independent given the summary statistic $\frac{1}{N}\sum_i f(X_i)$. This is a special property of maximum entropy distributions, and is one of the main things which makes them pleasant to work with mathematically.
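To make the Gaussian example concrete (an added illustration of the standard setup described above, not part of the original comment): with $f(X) = (X, X^2)$, the joint likelihood of the data factors through the sufficient statistic,

$$p(x_1,\dots,x_N \mid \mu, \sigma^2) = \prod_{i=1}^{N} \frac{1}{\sqrt{2\pi\sigma^2}} \exp\!\left(-\frac{(x_i-\mu)^2}{2\sigma^2}\right) = (2\pi\sigma^2)^{-N/2} \exp\!\left(-\frac{1}{2\sigma^2}\sum_i x_i^2 + \frac{\mu}{\sigma^2}\sum_i x_i - \frac{N\mu^2}{2\sigma^2}\right),$$

so the data enter only through $\sum_i x_i$ and $\sum_i x_i^2$ (plus the sample size $N$). Any inference about the unknown mean and variance, and hence about the unobserved $X_j$, goes through those two numbers.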
So: the conceptual significance of a “sufficient statistic” is that it summarizes all of the information from some data $X_i$ which is relevant to some other data/parameter/question $X_j$.
Coming back to the post: if you want to claim that a set of variables together constitute “sufficient statistics for goal-directedness”, then you need to argue that those variables together summarize all information from the underlying system which could possibly be relevant to goal-directedness. You have to argue that, once we know the sufficient statistics, then there is not any other information about the underlying system which could possibly be relevant to determining how goal-directed the system is. The main challenge is not to argue that all these statistics are relevant, but rather to argue that there cannot possibly be any other relevant information not already fully accounted for by these statistics. As far as I can tell, the post did not even attempt such an argument.
BTW, I do think you should attempt such an argument. The “sufficient statistics” in this post sound like ad-hoc measures which roughly capture some intuitions about goal-directedness, but there’s no obvious reason to think they’re the right measures. Take the explainability factor, for instance. It’s using maximums and averages all over the place; why these operations, rather than a softmax, or weighted average, or order statistic, or log transform, or …? As far as I can tell, this was an ad-hoc choice, and I expect these sorts of ad-hoc choices to diverge from our intuitive interpretations in corner cases.
The sort of argument needed to justify the term “sufficient statistic”—i.e. arguing that no other information can possibly be relevant—is exactly the sort of argument which makes it clear that we’re using the right statistics, rather than ad-hoc metrics which probably diverge from our interpretations in lots of corner cases.
Thanks for the spot-on pushback!
I do understand what a sufficient statistic is—which probably means I’m even more guilty of what you’re accusing me of. And I completely agree that I don’t properly defend the claim that the statistics I provide are really sufficient.
If I try to explain myself, what I want to say in this post is probably something like:
Knowing these intuitive properties about π and the goals seems sufficient to express and address basically any question we have related to goals and goal-directedness (in a very vague, intuitive way that I can’t really justify).
To think about that in a grounded way, here are formulas that look like they capture each of these properties.
Now what’s left to do is to attack the aforementioned questions about goals and goal-directedness with these statistics, and see if they’re enough (which is the topic of the next few posts).
Honestly, I don’t think there’s an argument to show these are literally sufficient statistics. Yet I still think staking out the claim that they are is quite productive for further research. It gives concreteness to an exploration of goal-directedness, carving out more grounded questions:
Given a question about goals and goal-directedness, are these properties enough to frame and study this question? If yes, then study it. If not, then study what’s missing.
Are my formulas adequate formalizations of the intuitive properties?
This post mostly focuses on the second aspect, and to be honest, not even in as much detail as one could go into.
Maybe that means this post shouldn’t exist, and I should have waited to see if I could literally formalize every question about goals and goal-directedness. But posting it to gather feedback on whether these statistics make sense to people, and whether they feel like something’s missing, seemed valuable.
That being said, my mistake (and what caused your knee-jerk reaction) was to just say these are literally sufficient statistics instead of presenting it the way I did in this comment. I’ll try to rewrite a couple of sentences to make that clear (and add another note at the beginning so your comment doesn’t look obsolete).
I still feel like you’re missing something important here.
For instance… in the explainability factor, you measure “the average deviation of $\pi$ from the actions favored by the action-value function $q_\mu$ of $\mu$”, using the formula

$$\text{pred}_{E_g}(\pi,\mu,s) = \frac{1}{T}\sum_{t=0}^{T} \frac{\max_a q_\mu(s_t,a) - q_\mu(s_t,\text{action}_\pi)}{\max_a q_\mu(s_t,a)}.$$

But why this particular formula? Why not take the log of $q_\mu$ first, or use $3+\max_a q_\mu(s_t,a)$ in the denominator? Indeed, there’s a strong argument to be made that this formula is a bad choice: the preferences encoded by the value function $q_\mu$ are unchanged by multiplying it by a positive scalar or by adding a constant, yet this value is not invariant to adding a constant to $q_\mu$. So we could change our representation of the “goal” to which we’re comparing, in a way which should still represent the same goal, yet the supposed answer to “how well does this goal explain the system’s behavior” changes.
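As a toy numerical sketch of the invariance problem (added here, with made-up action-values; not from the post):

```python
# One state, two actions. The action-values encode "action 1 is better than
# action 0"; the policy pi always picks action 0.
def pred_term(q_values, pi_action):
    """One term of the explainability sum: (max_a q - q(action_pi)) / max_a q."""
    return (max(q_values) - q_values[pi_action]) / max(q_values)

q = [1.0, 2.0]                       # original action-values
q_shifted = [v + 10.0 for v in q]    # same preferences, just shifted by a constant

print(pred_term(q, pi_action=0))          # 0.5
print(pred_term(q_shifted, pi_action=0))  # ~0.083
```

The ranking of actions is identical in both cases, yet the measured deviation shrinks by roughly a factor of six.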
Don’t get too caught up on this one specific issue—there’s a broader problem I’m pointing to here. The problem is with trying to use arbitrary formulas to represent intuitive concepts. If multiple non-equivalent formulas seem like similarly-plausible quantifications of an intuitive concept, then at least one of them is wrong; we have not yet understood the intuitive concept well enough to correctly quantify it. Unless every degree of freedom in the formula is nailed down (up to mathematical equivalence), we haven’t actually quantified the intuitive concept, we’ve just come up with a proxy.
That’s what these numbers are: they’re not sufficient statistics, they’re proxies, in exactly the same sense that “how often a human pushes an approval button” is a proxy for how good an AI’s actions are. And they will break down, as proxies always do.
That puts this part in a somewhat different perspective:
Honestly, I don’t think there’s an argument to show these are literally sufficient statistics. Yet I still think staking out the claim that they are is quite productive for further research. It gives concreteness to an exploration of goal-directedness, carving out more grounded questions:
Given a question about goals and goal-directedness, are these properties enough to frame and study this question? If yes, then study it. If not, then study what’s missing.
Are my formulas adequate formalizations of the intuitive properties?
I claim it makes more sense to word these questions as:
Given a question about goals and goal-directedness, are these proxies enough to frame and study this question?
Are these proxies adequate formalizations of the intuitive properties?
The answer to the first question may sometimes be “yes”. The answer to the second is definitely “no”; these are proxies, and they absolutely will not hold up if we try to put optimization pressure on them. Goodhart’s law will kick in. For instance, tying back to the earlier example, at some point there may be a degree of freedom in how the goal is represented, without changing the substantive meaning of the goal (e.g. adding a constant to $q_\mu$). Normally, that won’t be much of a problem, but if we put optimization pressure on it, then we’ll end up with some big constant added to $q_\mu$ in order to change the explainability factor, and then the proxy will break down—the explainability factor will cease to be a good measure of explainability.
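Concretely (an added derivation, not in the original comment): if a constant $c$ is added to $q_\mu$, the constant cancels in the numerator but not in the denominator, so each term of the explainability formula becomes

$$\frac{\max_a q_\mu(s_t,a) - q_\mu(s_t,\text{action}_\pi)}{\max_a q_\mu(s_t,a) + c} \longrightarrow 0 \quad \text{as } c \to \infty,$$

and by pushing $c$ up, any policy can be made to look arbitrarily well-explained by any goal, without its behavior changing at all.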
To people reading this thread: John and I had a private conversation (faster and easier), which resulted in me agreeing with him.
The summary is that you can see the arguments made and constraints invoked as a set of equations, such that the adequate formalization is a solution of this set. But if the set has more than one solution (maybe a lot), then it’s misleading to call any one of them the solution.
So I’ve been working these last few days on arguing for the properties (generalization, explainability, efficiency) in such a way that the corresponding set of equations only has one solution.
I’m working on writing it up properly, should have a post at some point.
EDIT: it’s up.