A year later, I continue to agree with this post; I still think its primary argument is sound and important. I’m somewhat sad that I still think it is important; I thought this was an obvious-once-pointed-out point, but I do not think the community actually believes it yet.
I particularly agree with this sentence of Daniel's review:

"I think the post is important, because it constrains the types of valid arguments that can be given for 'freaking out about goal-directedness', for lack of a better term."
“Constraining the types of valid arguments” is exactly the right way to describe the post. Many responses to the post have been of the form “this is missing the point of EU maximization arguments”, and yes, the post is deliberately missing that point. The post is not saying that arguments for AI risk are wrong, just that they are based on intuitions and not provable theorems. While I do think that we are likely to build goal-directed agents, I do not think the VNM theorem and similar arguments support that claim: they simply describe how a goal-directed agent should think.
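For reference, here is a minimal sketch of what the VNM representation theorem itself guarantees (standard statement; the notation is not taken from the post):

```latex
% Von Neumann–Morgenstern representation theorem (sketch).
% If a preference relation $\succeq$ over lotteries satisfies completeness,
% transitivity, continuity, and independence, then there exists a utility
% function $u$, unique up to positive affine transformation, such that
\[
  p \succeq q \iff \mathbb{E}_{x \sim p}[u(x)] \ge \mathbb{E}_{x \sim q}[u(x)].
\]
% The theorem only characterizes how such preferences can be represented;
% it does not say that any agent we build will satisfy the axioms, which is
% the distinction the review is drawing.
```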
However, talks like AI Alignment: Why It’s Hard, and Where to Start and posts like Coherent decisions imply consistent utilities seem to claim that “VNM and similar theorems” implies “goal-directed agents”. While there has been some disagreement over whether this claim is actually present, it doesn’t really matter—readers come away with that impression. I see this post as correcting that claim; it would have been extremely useful for me to read this post a little over two years ago, and anecdotally I have heard that others have found it useful as well.
I am somewhat worried that readers who read this post in isolation will get the wrong impression, since it really was meant as part of the sequence. For example, I think Brangus' comment is proposing an interpretation of "goal-directedness" that I proposed and argued against in the previous post (see also my response, which mostly quotes the previous post). Similarly, I sometimes hear the counterargument that there will be economic pressures towards goal-directed AI, even though this position is compatible with the post and addressed in the next post. I'm not sure how to solve this, though, without just appending both the previous and next posts to this post. (Part of the problem is that different people have different responses to the post, so it's hard to address all of them without adding a ton of words.) ETA: Perhaps adding the thoughts in this comment?
+1, I would have written my own review, but I think I basically just agree with everything in this one (and to the extent I wanted to further elaborate on the post, I’ve already done so here).