My point in that post is that coherence arguments alone are not enough; you need to combine them with some other assumption (for example, that there exists some “resource” over which the agent has no terminal preferences).
> Coherence arguments sometimes are enough, depending on what the agent is coherent over.

That’s an assumption :P (And it’s also not one that’s obviously true, at least according to me.)
> What is the extra assumption? If you’re making a coherence argument, that already specifies the domain of coherence, no? And so I’m not making any more assumptions than the original coherence argument did (whatever that argument was). I agree that the original coherence argument can fail, though.
I think we’re just debating semantics of the word “assumption”.
Consider the argument:
> A superintelligent AI will be VNM-rational, and therefore it will pursue convergent instrumental subgoals.
I think we both agree this is not a valid argument, or is at least missing some details about what the AI is VNM-rational over before it becomes a valid argument. That’s all I’m trying to say.
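To make the “what the AI is VNM-rational over” point concrete, here is a minimal toy sketch (my own illustrative construction in a made-up three-step deterministic setting; the action names and horizon are invented for the example): a utility function defined over whole trajectories can satisfy the VNM axioms while rationalizing behavior that shows no convergent instrumental subgoals at all.

```python
# Toy illustration only: rationalize an arbitrary, non-resource-seeking policy as
# expected-utility maximization by defining utility over whole trajectories.
import itertools

ACTIONS = ["twitch", "grab_resources"]  # hypothetical action set for the example
HORIZON = 3

def fixed_policy(t):
    """Arbitrary behavior with no resource acquisition: it just twitches."""
    return "twitch"

# A trajectory is the full sequence of actions taken over the horizon.
trajectories = list(itertools.product(ACTIONS, repeat=HORIZON))

def utility(trajectory):
    """Utility over trajectories: 1 if the trajectory is exactly what the fixed
    policy produces, 0 otherwise. Preferences over lotteries given by the
    expected value of a real-valued utility function like this satisfy the
    VNM axioms."""
    return 1.0 if all(a == fixed_policy(t) for t, a in enumerate(trajectory)) else 0.0

best = max(trajectories, key=utility)
print(best)           # ('twitch', 'twitch', 'twitch')
print(utility(best))  # 1.0 -- the twitching policy maximizes this utility,
                      # yet it pursues no convergent instrumental subgoals
```

The “?” below stands in for whatever premise gets you from “superintelligent AI” to “VNM-rational over state-based outcomes specifically”, which is what would rule out constructions like this one.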
Unimportant aside on terminology: I think in colloquial English it is reasonable to say that this is “missing an assumption”. I assume that you want to think of this as math. My best guess at how to turn the argument above into math would be something that looks like:
? ⟹ VNM-rational over state-based outcomes
VNM-rational over state-based outcomes ⟹ convergent instrumental subgoals
This still seems like a “missing assumption”, since the thing filling the ? seems like an “assumption”.
Maybe you’re like “Well, if you start with the setup of an agent that satisfies the VNM axioms over state-based outcomes, then you really do just need VNM to conclude ‘convergent instrumental subgoals’, so there are no extra assumptions needed”. I just don’t start with such a setup; I’m always looking for arguments with the conclusion “in the real world, we have a non-trivial chance of building an agent that causes an existential catastrophe”. (Maybe readers don’t have the same inclination? That would surprise me, but is possible.)