I think the way it works is approximately as follows. There is a fixed “ontological” infra-POMDP which is a coarse hard-coded world-model sufficient to define the concepts on which the reward depends (for humans, it would include concepts such as “other humans”). Then there is a prior which is composed of refinements of this infra-POMDP. The reward depends on the state of the ontological IPOMDP, so it is allowed to depend on the concepts of the hard-coded world-model (but not on the concepts which only exist in the refined models). Of course, this leaves open the question of identifying the conditions for learnability and what to do when we don’t have learnability (which is something that we need to handle anyway because of traps).
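To make the intended structure a bit more concrete, here is a minimal toy sketch in Python (entirely illustrative: it drops the “infra” part, uses ordinary state spaces instead of infra-POMDPs, and all names are invented for this example). The point it illustrates is just where the reward lives: it is defined on the coarse ontological states, and a refined hypothesis inherits it through its projection map, so it can never depend on concepts that exist only in the refinement.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Hashable, List

# Toy sketch: "ontological" model = coarse state space + reward defined on it.
# A hypothesis in the prior refines it: finer states plus a projection back
# onto the ontological states. (The real construction uses infra-POMDPs and
# refinement in the infra-Bayesian sense; this only shows where the reward
# is allowed to "live".)

OntState = Hashable   # states of the hard-coded ontological model
FineState = Hashable  # states of a refined hypothesis

@dataclass
class OntologicalModel:
    states: List[OntState]
    reward: Callable[[OntState], float]  # reward is defined ONLY here

@dataclass
class RefinedHypothesis:
    states: List[FineState]
    transition: Callable[[FineState, str], FineState]  # finer dynamics
    project: Dict[FineState, OntState]                 # refinement map

    def reward(self, s: FineState, ontology: OntologicalModel) -> float:
        # The refined model inherits its reward by projecting down: it may
        # track extra concepts, but the reward cannot depend on them.
        return ontology.reward(self.project[s])

# Example: the ontology only knows "human present" vs "no human"; a refined
# hypothesis additionally tracks whether it is raining.
ontology = OntologicalModel(
    states=["human", "no_human"],
    reward=lambda s: 1.0 if s == "human" else 0.0,
)
fine_states = [("human", "rain"), ("human", "dry"), ("no_human", "dry")]
hypothesis = RefinedHypothesis(
    states=fine_states,
    transition=lambda s, a: s,           # trivial dynamics for the sketch
    project={s: s[0] for s in fine_states},
)
assert hypothesis.reward(("human", "rain"), ontology) == 1.0
```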
Another way to “point at outside concepts” is infra-Bayesian physicalism, where outside concepts are represented as computations. But I don’t think the human brain is hard-coded to do IBP. These two approaches are also related, as can be seen in section 3, but exploring the relation further is another open problem.
Without knowing the details of infra-POMDPs or your other work, by what Bayesian evidence do you raise this particular hypothesis to consideration? (I say this not to imply that you do not have such evidence, only that I do not presently see why I should consider this particular hypothesis.)
My reasoning can be roughly described as:
There is a simple mathematical theory of agency, similar to how there are simple mathematical theories of e.g. probability or computational complexity
This theory should include an explanation of how agents can have goals defined not in terms of sensory data
I have a current best guess as to what the outline of this theory looks like, based on (i) simplicity, (ii) satisfying natural-seeming desiderata and (iii) the ability to prove relevant non-trivial theorems (for example, infra-Bayesian reinforcement learning theory is an ingredient)
This theory of non-sensory goals seems to fit well into the rest of the picture, and I couldn’t find a better alternative (for example, it allows talking about learnability, regret bounds and approximating Bayes-optimality; a rough sketch of the regret notion follows below)
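For context on the last item (a rough sketch only, not taken from the referenced work; the infra-Bayesian version replaces the prior over environments with a more general object, but the shape of the definitions is similar): with a prior ζ over environments μ and γ-discounted expected utility U^γ_μ(π) of a policy π, one can write:

```latex
% Sketch: Bayesian regret of a policy \pi relative to a prior \zeta over environments.
\[
  \mathrm{Reg}^{\gamma}(\pi)
  \;=\;
  \mathbb{E}_{\mu \sim \zeta}\!\left[
    \sup_{\pi^{*}} U^{\gamma}_{\mu}(\pi^{*}) - U^{\gamma}_{\mu}(\pi)
  \right]
\]
% "Learnability" of the prior: there exists a family of policies \pi_\gamma
% with \mathrm{Reg}^{\gamma}(\pi_\gamma) \to 0 as \gamma \to 1, i.e. the agent
% asymptotically does as well as if it knew the true hypothesis in advance.
% "Approximating Bayes-optimality" refers to nearly maximizing
% \mathbb{E}_{\mu \sim \zeta}[U^{\gamma}_{\mu}(\pi)] itself.
```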
I admit this explanation is not very legible, since writing a legible explanation would be an entire project. One way to proceed with the debate is: you name any theory that seems to you equally good or better (since you seem to have the feeling that there are a lot of equally good or better theories), and I try to explain why it’s actually worse.
I’d note that it’s possible for an organism to learn to behave (and think) in accordance with the “simple mathematical theory of agency” you’re talking about, without said theory being directly specified by the genome. If the theory of agency really is computationally simple, then many learning processes probably converge towards implementing something like that theory, simply as a result of being optimized to act coherently in an environment over time.
Well, how do you define “directly specified”? If human brains reliably converge towards a certain algorithm, then effectively this algorithm is specified by the genome. The real question is which parts depend only on the genes and which parts depend on the environment. My tentative opinion is that the majority is in the genes, since humans are, broadly speaking, pretty similar to each other. One environmental effect is that feral humans grow up with serious mental problems. But my guess is that this is not because of missing “values” or “biases”, but (to a first approximation) because they lack the ability to think in language. Another contender for the environment-dependent part is cultural values. But even here, I suspect that humans just follow social incentives rather than acquire cultural values as an immutable part of their own utility function. I admit that it’s difficult to be sure about this.
I don’t classify “convergently learned” as an instance of “directly specified”, but rather “indirectly specified, in conjunction with the requisite environmental data.” Here’s an example. I think that humans’ reliably-learned edge detectors in V1 are not “directly specified”, in the same way that vision models don’t have directly specified curve detectors, but these detectors are convergently learned in order to do well on vision tasks.
If I say “sunk cost is directly specified”, I mean something like “the genome specifies neural circuitry which will eventually, in situations where sunk cost arises, fire so as to influence decision-making.” However, if, for example, the genome lays out the macrostructure of the connectome and the broad-scale learning process and some reward circuitry and regional learning hyperparameters and some other details, and then this brain eventually comes to implement a sunk-cost bias, I don’t call that “direct specification.”
I wish I had been more explicit about “direct specification”, and perhaps this comment is still not clear. Please let me know if so!
I think that “directly specified” is just an ill-defined concept. You can ask whether A specifies B using encoding C. But what if you don’t fix C? Then any A can be said to “specify” any B (you can always put the information into C). Algorithmic information theory might come to the rescue by rephrasing the question as: “what is the relative Kolmogorov complexity K(B|A)?” Here, however, we have more ground to stand on, namely that there is some function f:G×E→B, where G is the space of genomes, E is the space of environments and B is the space of brains. We might also be interested in a particular property of the brain, which we can think of as a function h:B→P; for example, h might be something about values and/or biases. We can then ask, e.g., how much mutual information there is between g∈G and h(f(g,e)) vs. between e∈E and h(f(g,e)). Or, we can ask what is more difficult: changing h(f(g,e)) by changing g or by changing e, where the amount of “difficulty” can be measured by, e.g., what fraction of inputs produces the desired output.
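As a toy illustration of the kind of quantitative comparison this suggests (everything below is invented for the example: tiny finite G and E, a made-up development map f and property h), one can compute how much mutual information the genome vs. the environment carries about h(f(g,e)), and how hard it is to flip the property by intervening on g vs. on e:

```python
import itertools
import math
from collections import Counter

# Toy illustration: finite genome space G, environment space E, a fixed
# development map f: G x E -> B, and a property h: B -> P. We compare how
# much information g vs. e carries about h(f(g, e)) under uniform sampling
# of (g, e). All spaces and maps here are made up for the sketch.

G = ["g0", "g1", "g2", "g3"]
E = ["e0", "e1"]

def f(g: str, e: str) -> str:
    # hypothetical development map: the "brain" is mostly set by the genome,
    # with the environment adjusting one detail
    return g + ("+" if e == "e1" else "-")

def h(b: str) -> int:
    # hypothetical brain property, e.g. presence of some bias
    return 1 if b.startswith(("g0", "g1")) else 0

def mutual_information(pairs) -> float:
    # I(X;Y) for an empirical joint distribution given as (x, y) pairs
    joint = Counter(pairs)
    n = sum(joint.values())
    px = Counter(x for x, _ in pairs)
    py = Counter(y for _, y in pairs)
    return sum(
        (c / n) * math.log2((c / n) / ((px[x] / n) * (py[y] / n)))
        for (x, y), c in joint.items()
    )

samples = [(g, e, h(f(g, e))) for g, e in itertools.product(G, E)]
mi_genome = mutual_information([(g, p) for g, _, p in samples])
mi_env = mutual_information([(e, p) for _, e, p in samples])
print(f"I(G; h) = {mi_genome:.3f} bits,  I(E; h) = {mi_env:.3f} bits")

# The other proposed measure: how hard is it to change h(f(g, e)) by
# intervening on g vs. on e, e.g. what fraction of interventions flip it?
def flip_fraction_by_genome(g, e):
    return sum(h(f(g2, e)) != h(f(g, e)) for g2 in G) / len(G)

def flip_fraction_by_environment(g, e):
    return sum(h(f(g, e2)) != h(f(g, e)) for e2 in E) / len(E)

print(flip_fraction_by_genome("g0", "e0"), flip_fraction_by_environment("g0", "e0"))
```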
So, there are certainly questions that can be asked about what information comes from the genome and what information comes from the environment. I’m not sure whether this is what you’re going for, or whether you imagine some notion of information that comes from neither (but I have no idea what that would mean). In any case, I think your thesis would benefit if you specified it more precisely. Given such a specification, it would be possible to assess the evidence more carefully.