the first links to a post which argues against VNM on the basis that it assumes probabilities and preferences are already in the model
I assume this is my comment + post; I’m not entirely sure what you mean here. Perhaps you mean that I’m not modeling the world as having “external” probabilities that the agent has to handle; I agree that is true, but that is because in the use case I’m imagining (looking at the behavior of an AI system and determining what it is optimizing) you don’t get these “external” probabilities.
I expect that if these two commenters read the full essay, and think carefully about how the theorems Yudkowsky is discussing differ from VNM, then their objections will look very different.
I assure you I read this full post (well, the Arbital version of it) and thought carefully about it before making my post; my objections remain. I discussed VNM specifically because that’s the best-understood coherence theorem and the one that I see misused in AI alignment most often. (That being said, I don’t know the formal statements of other coherence theorems, though I predict with ~98% confidence that any specific theorem you point me to would not change my objection.)
Yes, if you add in some additional detail about resources, assume that you do not have preferences over how those resources are used, and assume that there are preferences over other things that can be affected using resources, then coherence theorems tell you something about how such agents act. This doesn’t seem all that relevant to the specific, narrow setting which I was considering.
I agree that coherence arguments (including VNM) can be useful, for example by:
Helping people make better decisions (e.g. becoming more comfortable with taking risk; see the sketch after this list)
Reasoning about what AI systems would do, given stronger assumptions than the ones I used (e.g. if you assume there are “resources” that the AI system has no preferences over).
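To make the first bullet concrete, here’s a minimal sketch (the numbers and utility functions are illustrative assumptions of mine, not taken from any of the posts): for stakes that are small relative to your wealth, even a clearly risk-averse utility function endorses a modestly positive-EV coin flip, which is the kind of calculation that can make someone more comfortable taking risk.

```python
import math

def expected_utility(wealth, bet, utility):
    """Expected utility of a 50/50 bet: win bet[0] or lose bet[1], starting from wealth."""
    win, loss = bet
    return 0.5 * utility(wealth + win) + 0.5 * utility(wealth - loss)

linear = lambda w: w                 # risk-neutral utility of money
log_utility = lambda w: math.log(w)  # a standard risk-averse utility of money

wealth = 10_000
bet = (110, 100)  # coin flip: win $110 or lose $100

for name, u in [("linear", linear), ("log", log_utility)]:
    print(name, expected_utility(wealth, bet, u) > u(wealth))  # worth taking?
```

Both lines print True: at this scale the curvature of log utility barely matters, so refusing bets like this over and over is hard to square with any coherent preference over money.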
Nonetheless, within AI alignment, prior to my post I heard the VNM argument being misused all the time (by rationalists / LWers, less so by others); this has gone down since then but still happens.
I think e.g. this talk is sneaking in the “resources” assumption, without arguing for it or acknowledging its existence, and this often misleads people (including me) into thinking that AI risk is implied by math based on very simple axioms that are hard to disagree with.
----
On the review: I don’t think this post should be in the Alignment section of the review, without a significant rewrite / addition clarifying why exactly coherence arguments are useful or important for AI alignment. As such I will vote against it.
I would however support it being part of the non-Alignment section of the review; as I’ve said before, I generally really like coherence arguments and they influence my own decision-making a lot (in fact, a big part of the reason I started working in AI alignment was thinking about the Arrhenius paradox, which has a very similar coherence flavor).
I was referring mainly to Richard’s post here. You do seem to understand the issue of assuming (rather than deriving) probabilities.
I discussed VNM specifically because that’s the best-understood coherence theorem and the one that I see misused in AI alignment most often.
This I certainly agree with.
I don’t know the formal statements of other coherence theorems, though I predict with ~98% confidence that any specific theorem you point me to would not change my objection.
Exactly which objection are you talking about here?
If it’s something like “coherence theorems do not say that tool AI is not a thing”, that seems true. Even today, humans have plenty of useful tools with some amount of information processing in them that are probably not usefully model-able as expected utility maximizers.
But then you also make claims like “all behavior can be rationalized as EU maximization”, which is wildly misleading. Given a system, the coherence theorems map a notion of resources/efficiency/outcomes to a notion of EU maximization. Sure, we can model any system as an EU maximizer this way, but only if we use a trivial/uninteresting notion of resources/efficiency/outcomes. For instance, as you noted, it’s not very interesting when “outcomes” refers to “universe-histories”. (Also, the “preferences over universe-histories” argument doesn’t work as well when we specify the full counterfactual behavior of a system, which is something we can do quite well in practice.)
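To spell out the triviality, here is a minimal sketch of the construction (my own toy version, not something from either post): for any observed behavior whatsoever, you can write down a utility function over whole universe-histories that the behavior maximizes, so “is an EU maximizer over universe-histories” rules nothing out.

```python
def trivial_utility_over_histories(observed_history):
    """Assign utility 1 to the exact history the system actually produced and 0 to
    every other history. Under this function, whatever the system did was optimal,
    which is precisely why the construction has no content."""
    return lambda history: 1.0 if history == observed_history else 0.0

# Even blatantly "incoherent-looking" behavior gets rationalized this way.
history = ("turn left", "pay $5 to turn right", "pay $5 to turn left again")
u = trivial_utility_over_histories(history)
assert u(history) == 1.0  # the realized history is the argmax over all histories
```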
Combining these points: your argument largely seems to be “coherence arguments apply to any arbitrary system, therefore they don’t tell us interesting things about which systems are/aren’t <agenty/dangerous/etc>”. (That summary isn’t exactly meant to pass an ITT, but please complain if it’s way off the mark.) My argument is that coherence theorems do not apply nontrivially to any arbitrary system, so they could still potentially tell us interesting things about which systems are/aren’t <agenty/dangerous/etc>. There may be good arguments for why coherence theorems are the wrong way to think about goal-directedness, but “everything can be viewed as EU maximization” is not one of them.
Yes, if you add in some additional detail about resources, assume that you do not have preferences over how those resources are used, and assume that there are preferences over other things that can be affected using resources, then coherence theorems tell you something about how such agents act. This doesn’t seem all that relevant to the specific, narrow setting which I was considering.
Just how narrow a setting are you considering here? Limited resources are everywhere. Even an E. coli needs to efficiently use limited resources. Indeed, I expect coherence theorems to say nontrivial things about an E. coli swimming around in search of food (and this includes the possibility that the nontrivial things the theorem says could turn out to be empirically wrong, which in turn would tell us nontrivial things about E. coli and/or selection pressures, and possibly point to better coherence theorems).
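To gesture at the kind of nontrivial, falsifiable claim I have in mind, here is a toy sketch (my own illustration, with made-up data): treat energy as a resource the organism has no terminal preferences over, record which states it pays energy to move between, and check whether those trades contain a cycle. Coherence with respect to the resource predicts no such cycle, since a cycle means the organism can be walked around the loop and end up where it started with strictly less energy.

```python
def has_resource_pump(strict_prefs):
    """Given observed strict preferences as (better, worse) pairs -- "the system pays
    a little of the resource to move from worse to better" -- return True if the
    preferences contain a cycle. A cycle is a pump: the system can be led around the
    loop and end up in its starting state having burned resource for nothing, which
    no utility function that treats the resource as purely instrumental would endorse."""
    graph = {}
    for better, worse in strict_prefs:
        graph.setdefault(better, set()).add(worse)
        graph.setdefault(worse, set())
    WHITE, GRAY, BLACK = 0, 1, 2
    color = {node: WHITE for node in graph}

    def dfs(node):  # depth-first search for a back edge (i.e. a cycle)
        color[node] = GRAY
        for nxt in graph[node]:
            if color[nxt] == GRAY or (color[nxt] == WHITE and dfs(nxt)):
                return True
        color[node] = BLACK
        return False

    return any(dfs(node) for node in graph if color[node] == WHITE)

# Hypothetical observations: it pays energy to go A->B, B->C, and C->A.
observed = [("B", "A"), ("C", "B"), ("A", "C")]
print(has_resource_pump(observed))  # True -> not coherent w.r.t. energy
```

Whether real E. coli behavior contains pumps like this is exactly the kind of empirical question I mean.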
Exactly which objection are you talking about here?
If it’s something like “coherence theorems do not say that tool AI is not a thing”, that seems true.
Yes, I think that is basically the main thing I’m claiming.
But then you also make claims like “all behavior can be rationalized as EU maximization”, which is wildly misleading.
I tried to be clear that my argument was “you need more assumptions beyond just coherence arguments on universe-histories; if you have literally no other assumptions then all behavior can be rationalized as EU maximization”. I think the phrase “all behavior can be rationalized as EU maximization” or something like it was basically necessary to get across the argument that I was making. I agree that taken in isolation it is misleading; I don’t really see what I could have done differently to prevent there from being something that in isolation was misleading, while still being able to point out the-thing-that-I-believe-is-fallacious. Nuance is hard.
(Also, it should be noted that you are not in the intended audience for that post; I expect that to you the point feels obvious enough so as not to be worth stating, and so overall it feels like I’m just being misleading. If everyone were similar to you I would not have bothered to write that post.)
Also, the “preferences over universe-histories” argument doesn’t work as well when we specify the full counterfactual behavior of a system, which is something we can do quite well in practice.
I agree that if you have counterfactual behavior EU maximization is not vacuous. I don’t think that this meaningfully changes the upshot (which is “coherence arguments, by themselves without any other assumptions on the structure of the world or the space of utility functions, do not imply AI risk”). It might meaningfully change the title of the post (perhaps they do imply goal-directed behavior in some sense), though in that case I’d change the title to “Coherence arguments do not imply AI risk” and I think it’s effectively the same post.
Mostly though, I’m wondering how exactly you use counterfactual behavior in an argument for AI risk. Like, the argument I was arguing against is extremely abstract, and just claims that the AI is “intelligent” / “coherent”. How do you use that to get counterfactual behavior for the AI system?
I agree that for any given AI system, we could probably gain a bunch of knowledge about its counterfactual behavior, and then reason about how coherent it is and how goal-directed it is. But this is a fundamentally different thing than the thing I was talking about (which is just: can we abstractly argue for AI risk without talking about details of the system beyond “it is intelligent”?)
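For concreteness, here is roughly what such a check could look like in the simplest possible case (a toy revealed-preference test of my own, not anything from the original post): once you know the agent’s choice from every menu, “consistent with maximizing some fixed utility function over outcomes” becomes a property that many policies fail, so it genuinely carves up the space of systems.

```python
def rationalizable_by_a_utility_function(menus_and_choices):
    """Given the agent's counterfactual behavior -- its choice from every menu, not just
    the one trajectory we happened to observe -- check whether the choices could come
    from maximizing a single fixed utility function over outcomes. This holds iff the
    revealed strict preferences ("chosen over") are acyclic."""
    prefs, items = set(), set()
    for menu, choice in menus_and_choices:
        items |= set(menu)
        for other in menu:
            if other != choice:
                prefs.add((choice, other))  # choice revealed-preferred to other
    # Peel off an undominated item each round; if at some point every remaining item
    # is beaten by another remaining item, the preferences contain a cycle.
    remaining = set(items)
    while remaining:
        undominated = [x for x in remaining if not any((y, x) in prefs for y in remaining)]
        if not undominated:
            return False
        remaining -= set(undominated)
    return True

# A policy that picks A over B, B over C, and C over A cannot be rationalized; with
# only one observed trajectory this would have been invisible.
policy = [({"A", "B"}, "A"), ({"B", "C"}, "B"), ({"A", "C"}, "C")]
print(rationalizable_by_a_utility_function(policy))  # False
```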
My argument is that coherence theorems do not apply nontrivially to any arbitrary system, so they could still potentially tell us interesting things about which systems are/aren’t <agenty/dangerous/etc>.
I agree with this.
There may be good arguments for why coherence theorems are the wrong way to think about goal-directedness, but “everything can be viewed as EU maximization” is not one of them.
I actually also agree with this, and was not trying to argue that coherence arguments are irrelevant to “goal-directedness” or “being a good agent”—I’ve already mentioned that I personally do things differently thanks to my knowledge of coherence arguments.
Just how narrow a setting are you considering here? Limited resources are everywhere. Even an E. coli needs to efficiently use limited resources. Indeed, I expect coherence theorems to say nontrivial things about an E. coli swimming around in search of food (and this includes the possibility that the nontrivial things the theorem says could turn out to be empirically wrong, which in turn would tell us nontrivial things about E. coli and/or selection pressures, and possibly point to better coherence theorems).
I agree that if you take any particular system and try to make predictions, the necessary assumptions (such as “what counts as a limited resource”) will often be easy and obvious and the coherence theorems do have content in such situations. It’s the abstract argument that feels flawed to me.
I somewhat expect your response will be “why would anyone be applying coherence arguments in such a ridiculously abstract way rather than studying a concrete system”, to which I would say that you are not in the intended audience.
----
Fwiw thinking this through has made me feel better about including it in the Alignment book than I did before, though I’m still overall opposed. (I do still think it is a good fit for other books.)
I somewhat expect your response will be “why would anyone be applying coherence arguments in such a ridiculously abstract way rather than studying a concrete system”, to which I would say that you are not in the intended audience.
Ok, this is a fair answer. I think you and I, at least, are basically aligned here.
I do think a lot of people took away from your post something like “all behavior can be rationalized as EU maximization”, and in particular I think a lot of people walked away with the impression that usefully applying coherence arguments to systems in our particular universe is much more rare/difficult than it actually is. But I can’t fault you much for some of your readers not paying sufficiently close attention, especially when my review at the top of this thread is largely me complaining about how people missed nuances in this post.
On the review: I don’t think this post should be in the Alignment section of the review, without a significant rewrite / addition clarifying why exactly coherence arguments are useful or important for AI alignment.
Assuming one accepts the arguments that coherence arguments aren’t important for alignment (as I tentatively do), I don’t see why that means this shouldn’t be included in the Alignment section.
The motivation for this post was its relevance to alignment. People think about it in the context of alignment. If subsequent arguments indicate that it’s misguided, I don’t see why that means it shouldn’t be considered (from a historical perspective) to have been in the alignment stream of work (along with the arguments against it).
(Though, I suppose if there’s another category that seems like a more exact match, that seems like a fine reason to put it in that section rather than the Alignment section.)
Does that make sense? Is your concern that people will see this in the Alignment section, and not see the arguments against the connection, and continue to be misled?
I actually think it shouldn’t be in the alignment section, though for different reasons than Rohin. There are lots of things that can be applied to AI but are a lot more general, and I think it’s usually better to separate the “here’s the general idea” presentation from the “here’s how it applies to AI” presentation. That way, people working on other interesting things can come along and notice the idea and try to apply it in their own area, rather than getting scared off by the label.
For instance, I think there’s probably gains to be had from applying coherence theorems to biological systems. I would love it if some rationalist biologist came along, read Yudkowsky’s post, and said “wait a minute, cells need to make efficient use of energy/limited molecules/etc, can I apply that?”. That sort of thing becomes less likely if this sort of post is hiding in “the alignment section”.
Zooming out further… today, alignment is the only technical research area with a lot of discussion on LW, and I think it would be a near-Pareto improvement if more such fields were drawn in. Taking things which are alignment-relevant-but-not-just-alignment and lumping them all under the alignment heading makes that less likely.
It seems weird to include a post in the book if we believe that it is misguided, just because people historically believed it. If I were making this book, I would not include such posts; I’d want an “LW Review” to focus on things that are true and useful, rather than historically interesting.
That being said, I haven’t thought much about the goals of the book, and if we want to include posts for the sake of history, then sure, include the post. That was just not my impression about the goal.
Is your concern that people will see this in the Alignment section, and not see the arguments against the connection, and continue to be misled?
I would have this concern, yes, but I’m happy to defer (in the sense of “not pushing”, rather than the sense of “adopting their beliefs as my own”) to the opinions of the people who have thought way more than me about the purpose of this review and the book, and have caused it to happen. If they are interested in including historically important essays that we now think are misguided, I wouldn’t object. I predict that they would prefer not to include such essays but of course I could be wrong about that.