In reality, the problem faced by evolution and by SGD is much easier than this: producing systems that behave the right way in all scenarios they are likely to encounter. In virtue of their aligned behavior, these systems will be “aimed at the right things” in every sense that matters in practice.
I find this passage remarkable, given that so many people are choosing to have few or no children that fertility has fallen to 0.78 in Korea and 1.0 in China. Presumably you’re aware of these (or similar) facts and intended the meaning of this passage to be compatible with them, but I’m having trouble figuring out how...
By contrast, goal realism leads only to unfalsifiable speculation about an “inner actress” with utterly alien motivations.
In order for such speculation to be unfalsifiable, it seemingly has to be the case that we’re unable to ever develop good enough interpretability tools to definitively say whether the AI in question has such internal motivations. This could well turn out to be true, but I don’t understand how you’re able to predict this now. (Or maybe you mean something else by “unfalsifiable” but I can’t see what it could be. ETA: Maybe you mean “unfalsifiable with existing methods”?)
On the other hand, with your own proposed alignment method, we have to speculate about what scenarios an AI is likely to encounter. You could say that this is falsifiable (we just have to wait for the future to unfold), but is this actually an advantage?
The point of that section is that “goals” are not ontologically fundamental entities with precise contents; in fact, they could not possibly be, given a naturalistic worldview. So you don’t need to “target the inner search,” you just need to get the system to act the way you want in all the relevant scenarios.
The modern world is not a relevant scenario for evolution. “Evolution” did not need to, was not “intending to,” and could not have designed human brains so that they would do high inclusive genetic fitness stuff even when the environment changes wildly and culture becomes completely different from the ancestral environment.
So you don’t need to “target the inner search,” you just need to get the system to act the way you want in all the relevant scenarios.
Your original phrase was “all scenarios they are likely to encounter”, but now you’ve switched to “relevant scenarios”. Do you not acknowledge that these two phrases are semantically very different (or likely to be interpreted very differently by many readers), since the modern world is arguably a scenario that “they are likely to encounter” (given that they actually did encounter it) but you say “the modern world is not a relevant scenario for evolution”?
Going forward, do you prefer to talk about “all scenarios they are likely to encounter”, or “relevant scenarios”, or both? If you mean to keep using “relevant scenarios” (alone or alongside the other phrase), please clarify what you mean by “relevant”. (And please answer with respect to both evolution and AI alignment, in case the answer is different in the two cases. I’ll probably have more substantive things to say once we’ve cleared up the linguistic issues.)
No, I don’t think they are semantically very different. This seems like nitpicking. Obviously “they are likely to encounter” has to have some sort of time horizon attached to it; otherwise it would include times well past the heat death of the universe, or something.
It was not at all clear to me that you intended “they are likely to encounter” to have some sort of time horizon attached to it (as opposed to some other kind of restriction, or your meaning something quite different from the literal phrasing, or your argument/idea itself being wrong), and it’s still not clear to me what sort of time horizon you have in mind.
The AI system builders’ time horizon seems to be a reasonable starting point.