My point in that post is that coherence arguments alone are not enough; you need to combine them with some other assumption (for example, that there exists some “resource” over which the agent has no terminal preferences).
> Coherence arguments sometimes are enough, depending on what the agent is coherent over.

That’s an assumption :P (And it’s also not one that’s obviously true, at least according to me.)
> What is the extra assumption? If you’re making a coherence argument, that already specifies the domain of coherence, no? And so I’m not making any more assumptions than the original coherence argument did (whatever that argument was). I agree that the original coherence argument can fail, though.
I think we’re just debating semantics of the word “assumption”.
Consider the argument:
> A superintelligent AI will be VNM-rational, and therefore it will pursue convergent instrumental subgoals.
I think we both agree this is not a valid argument, or is at least missing some details about what the AI is VNM-rational over before it becomes a valid argument. That’s all I’m trying to say.
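To make the “what the AI is VNM-rational over” point concrete, here is a minimal toy sketch (my own illustrative construction in a made-up three-step deterministic setting; the action names and horizon are invented for the example): a utility function defined over whole trajectories can satisfy the VNM axioms while rationalizing behavior that shows no convergent instrumental subgoals at all.

```python
# Toy illustration only: rationalize an arbitrary, non-resource-seeking policy as
# expected-utility maximization by defining utility over whole trajectories.
import itertools

ACTIONS = ["twitch", "grab_resources"]  # hypothetical action set for the example
HORIZON = 3

def fixed_policy(t):
    """Arbitrary behavior with no resource acquisition: it just twitches."""
    return "twitch"

# A trajectory is the full sequence of actions taken over the horizon.
trajectories = list(itertools.product(ACTIONS, repeat=HORIZON))

def utility(trajectory):
    """Utility over trajectories: 1 if the trajectory is exactly what the fixed
    policy produces, 0 otherwise. Preferences over lotteries given by the
    expected value of a real-valued utility function like this satisfy the
    VNM axioms."""
    return 1.0 if all(a == fixed_policy(t) for t, a in enumerate(trajectory)) else 0.0

best = max(trajectories, key=utility)
print(best)           # ('twitch', 'twitch', 'twitch')
print(utility(best))  # 1.0 -- the twitching policy maximizes this utility,
                      # yet it pursues no convergent instrumental subgoals
```

The “?” below stands in for whatever premise gets you from “superintelligent AI” to “VNM-rational over state-based outcomes specifically”, which is what would rule out constructions like this one.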
Unimportant aside on terminology: I think in colloquial English it is reasonable to say that this is “missing an assumption”. I assume that you want to think of this as math. My best guess at how to turn the argument above into math would be something that looks like:
? ⟹ VNM-rational over state-based outcomes
VNM-rational over state-based outcomes ⟹ convergent instrumental subgoals
This still seems like a “missing assumption”, since the thing filling the ? seems like an “assumption”.
Maybe you’re like “Well, if you start with the setup of an agent that satisfies the VNM axioms over state-based outcomes, then you really do just need VNM to conclude ‘convergent instrumental subgoals’, so there are no extra assumptions needed”. I just don’t start with such a setup; I’m always looking for arguments with the conclusion “in the real world, we have a non-trivial chance of building an agent that causes an existential catastrophe”. (Maybe readers don’t have the same inclination? That would surprise me, but is possible.)