But, do you really fundamentally care that your kids have genomes?
Seems not relevant? I think we’re running into an under-definition of IGF (and the fact that it doesn’t actually have a utility function, even over local mutations on a fixed genotype). Does IGF have to involve genomes, or just information patterns as written in nucleotides or in binary? The “outer objective” of IGF suffers a classic identifiability issue common to many “outer objectives”, where the ancestral “training signal” history is fully compatible with “IGF just for genomes” and also “IGF for all relevant information patterns made of components of your current pattern.”
(After reading more, you later seem to acknowledge this point—that evolution wasn’t “shouting” anything about genomes in particular. But then why raise this point earlier?)
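To make that identifiability worry concrete, here's a minimal toy sketch (my own illustration, with made-up scoring, not anything from the post): two candidate "outer objectives" that agree exactly on every ancestral-style datapoint, because the ancestral history only ever contained genome-encoded patterns, and that come apart the moment the pattern shows up in a new medium.

```python
# Toy illustration (hypothetical numbers): two candidate "outer objectives"
# that are indistinguishable on the ancestral "training signal" but diverge
# off-distribution.

# Each record: (copies of your genome, copies of your pattern in other media)
ancestral_history = [(0, 0), (1, 0), (2, 0), (3, 0), (5, 0)]  # only genomes ever existed

def igf_genomes_only(genome_copies, other_copies):
    """Candidate A: count only literal genome copies."""
    return genome_copies

def igf_any_pattern(genome_copies, other_copies):
    """Candidate B: count any faithful copy of the information pattern."""
    return genome_copies + other_copies

# The two objectives agree on every point of the ancestral history...
assert all(igf_genomes_only(*x) == igf_any_pattern(*x) for x in ancestral_history)

# ...but disagree as soon as the pattern appears in a medium the ancestral
# "training signal" never scored, e.g. digital copies.
novel_case = (1, 10)
print(igf_genomes_only(*novel_case))  # 1
print(igf_any_pattern(*novel_case))   # 11
```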
Now, there’s a reasonable counterargument to this point, which is that there’s no psychologically-small tweak to human psychology that dramatically increases that human’s IGF. (We’d expect evolution to have gathered that low-hanging fruit.)
I don’t know whether I disagree; it depends on what you mean here. If “psychologically small” means “small” in a metric of direct tweaks to high-level cognitive properties (like propensity to cheat given abstract knowledge of resources X and mating opportunities Y), then I think that isn’t true. Because of information inaccessibility, I think that evolution can’t optimize directly over high-level cognitive properties.
Optima often occur at extremes, and concepts tend to differ pretty widely at the extremes, etc. When the AI gets out of the training regime and starts really optimizing, then any mismatch between its ends and our values is likely to get exaggerated.
This kind of argument seems sketchy to me. Doesn’t it prove too much? Suppose there’s a copy of me which also values coffee to the tune of $40/month and reflectively endorses that value at that strength. Are my copy and I now pairwise misaligned in any future where one of us “gets out of the training regime and starts really optimizing”? (ETA: that is, significantly more pairwise misaligned than I would be with an exact copy of myself in such a situation. For more selfish people, I imagine this prompt would produce misalignment due to some desires like coffee/sex being first-person.)
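To spell out the “proves too much” intuition with a toy model (my own framing, with made-up numbers): if my copy's valuations never differ from mine by more than some small amount on any option, then however hard the copy optimizes, and however huge the option space, the option it picks can only cost me about twice that amount relative to my own optimum. A bounded value difference doesn't automatically get unboundedly exaggerated just because someone is “really optimizing”; whether the real worry survives depends on whether the relevant differences are in fact uniformly bounded.

```python
# Toy framing (my own, with made-up numbers): a bounded difference between two
# utility functions implies a bounded cost from the other agent optimizing hard,
# no matter how large the option space is.
import numpy as np

rng = np.random.default_rng(0)

n_options = 1_000_000                     # "really optimizing" over a huge space
u_me = rng.normal(size=n_options)         # my valuation of each option
eps = 40.0 / 1000.0                       # e.g. a $40/month coffee preference, rescaled
u_copy = u_me + rng.uniform(-eps, eps, size=n_options)  # my copy never differs by more than eps

best_for_copy = int(np.argmax(u_copy))    # the copy optimizes hard and picks this option
my_regret = u_me.max() - u_me[best_for_copy]

print(f"my regret from the copy's choice: {my_regret:.4f} (<= 2*eps = {2 * eps:.4f})")
assert my_regret <= 2 * eps + 1e-9
```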
And all this is to say nothing about how humans’ values are much more complex and fragile than IGF, and thus much trickier to transmit.
Complexity is probably relative to the learning process and inductive biases in question. While any given set of values will be difficult to transmit in full (which is perhaps your point), the fact that humans did end up with their values is evidence that human values are the kind of thing that can be transmitted/formed easily in at least one architecture.
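As a toy version of “complexity is relative to the inductive bias” (my own illustration, not from the post): the same target function has a one-coefficient description under a sinusoidal feature basis, but only a poor many-coefficient approximation under a low-degree polynomial basis.

```python
# Toy illustration (my own): the "complexity" of the same target depends on the
# hypothesis class / inductive bias used to represent it.
import numpy as np

x = np.linspace(-np.pi, np.pi, 400)
target = np.sin(5 * x)                 # one "unit of description" in a Fourier basis

# Bias 1: sinusoidal features -- the target is captured by a single coefficient.
fourier_fit = 1.0 * np.sin(5 * x)      # coefficient 1.0 on the sin(5x) feature
fourier_err = np.mean((target - fourier_fit) ** 2)

# Bias 2: low-degree polynomials -- more coefficients and still a poor fit.
coeffs = np.polyfit(x, target, deg=5)
poly_fit = np.polyval(coeffs, x)
poly_err = np.mean((target - poly_fit) ** 2)

print(f"1 Fourier coefficient, mse = {fourier_err:.3e}")
print(f"6 polynomial coefficients, mse = {poly_err:.3e}")  # much worse
```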
Regarding identifiability, there’s a maybe-slightly-useful question you could ask, something like: “if evolution had been designed by an actual human computer scientist, what do you think they wanted to achieve?”
… But I feel like that’s ultimately just begging the questions that “IGF maximisation” is supposed to help answer.