My problem with this post is that you seem to be applying a standard that basically asks,
“Are there formal arguments demonstrating that coherence ⟹ goal-directedness?”
Whereas the question that I would ask is,
“Does coherence ⟹ goal-directedness?”
This paragraph, for example:
There’s a final issue with the whole setup of an agent traversing states: in the real world, and in examples like non-transitive travel, we never actually end up in quite the same state we started in. Perhaps we’ve gotten sunburned along the journey. Perhaps we spent a few minutes editing our next blog post. At the very least, we’re now slightly older, and we have new memories, and the sun’s position has changed a little. And so, just like with definition 2, no series of choices can ever demonstrate incoherent revealed preferences in the sense of definition 1, since every choice actually made is between a different set of possible states. (At the very least, they differ in the agent’s memories of which path it took to get there.⁴ And note that outcomes which are identical except for slight differences in memories should sometimes be treated in very different ways, since having even a few bits of additional information from exploration can be incredibly advantageous.)
Seems quite important when discussing the first question, but almost entirely irrelevant to the second question. It may be difficult to formalize the MDP such that it captures this, but obviously we don’t expect a superintelligent AI to be predictably stupid in the way Eliezer lays out, and this is true whether we can formalize it or not.
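To pin down what “predictably stupid” means here, a minimal money-pump sketch (my own toy example, not from the post; the names `cyclic_preference`, `run_money_pump`, the fee and the items are all made up for illustration): an agent with cyclic pairwise preferences keeps paying to trade and ends up holding exactly what it started with, minus the fees, however finely we individuate the underlying states.

```python
# Toy money-pump illustration: cyclic pairwise preferences A > B > C > A.
cyclic_preference = {("A", "B"), ("B", "C"), ("C", "A")}  # (preferred, over)

def accepts_trade(current: str, offered: str) -> bool:
    """The agent accepts any offered item it strictly prefers to its current one."""
    return (offered, current) in cyclic_preference

def run_money_pump(start: str, fee: float, rounds: int) -> float:
    """Repeatedly offer the item the agent prefers to whatever it currently holds."""
    prefers = {worse: better for better, worse in cyclic_preference}
    holding, wealth = start, 0.0
    for _ in range(rounds):
        offer = prefers[holding]
        if accepts_trade(holding, offer):
            holding, wealth = offer, wealth - fee
    return wealth  # strictly negative: the agent paid to go in a circle

print(run_money_pump(start="A", fee=1.0, rounds=9))  # -9.0, and it ends holding "A" again
```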
Speaking just about the latter question, I mainly see the ‘states don’t capture everything we care about’ argument as relevant:
When we do so, we find that it has several shortcomings—in particular, it rules out some preferences which seem to be reasonable and natural ones. For example, suppose you want to write a book which is so timeless that at least one person reads it every year for the next thousand years. [...]
But even with this one: it’s an argument that [the smallest possible class of which we can say that it will capture everything the agent cares about] will be larger than the set of states. That does not mean that it will be so large that the implication of the VNM theorem stops being scary. We know that it won’t be as large as [state-trajectories], because that would allow AIs that are visibly stupid, so it’ll be some intermediate class. Is that large enough for the implication of the VNM theorem to be substantially less meaningful? My honest best guess is still ‘probably not’.
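As a rough way to picture that intermediate class (my own notation, not from the post or this comment): write S for the states, 𝒯 for the state-trajectories, and D for the domain the agent’s preferences are actually defined over.

```latex
% Illustrative notation only (my own sketch, not from the post or this comment):
% S = states, \mathcal{T} = state-trajectories, D = the domain the agent's
% preferences range over, u = the utility function the VNM theorem delivers.
\[
  u : D \to \mathbb{R}, \qquad |S| \;<\; |D| \;<\; |\mathcal{T}|.
\]
% The open question above is how close |D| gets to |\mathcal{T}|: the closer it is,
% the less the conclusion "maximizes some utility over D" constrains behaviour.
```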
What is missing here is an argument that the VNM theorem does have important implications in settings where its assumptions are not true. Nobody has made this argument. I agree it’s suggestive, but that’s very far from demonstrating that AGIs will necessarily be ruthlessly maximising some simple utility function.
“obviously we don’t expect a superintelligent AI to be predictably stupid in the way Eliezer lays out”
Eliezer argued that superintelligences will have certain types of goals, because of the VNM theorem. If they have different types of goals, then behaviour which violates the VNM axioms is no longer “predictably stupid”. For example, if I have a deontological goal, then maybe violating the VNM axioms is the best strategy.
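To make that concrete, a toy sketch of my own (the `Outcome` class, the payoffs, and the “never lie” rule are all made up for illustration): a lexicographic, deontology-flavoured preference that puts honesty above any amount of money violates the VNM continuity axiom, so it has no expected-utility representation; but since it is still complete and transitive, refusing these gambles is not obviously “predictably stupid”.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Outcome:
    lied: bool
    money: float

def lex_prefers(a: Outcome, b: Outcome) -> bool:
    """Lexicographic preference: honesty first, money second."""
    if a.lied != b.lied:
        return not a.lied          # any honest outcome beats any dishonest one
    return a.money > b.money       # among equally honest outcomes, more money is better

def accepts_lottery(p_lie: float, payoff: float, sure: Outcome) -> bool:
    """Reject any lottery with positive probability of lying, however small p_lie is."""
    if p_lie > 0:
        return False               # this is the continuity violation
    return lex_prefers(Outcome(lied=False, money=payoff), sure)

# Even a 1-in-a-billion chance of lying for $1e9 is turned down
# in favour of a sure, honest $1.
print(accepts_lottery(p_lie=1e-9, payoff=1e9, sure=Outcome(lied=False, money=1.0)))  # False
```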