Exactly which objection are you talking about here?
If it’s something like “coherence theorems do not say that tool AI is not a thing”, that seems true.
Yes, I think that is basically the main thing I’m claiming.
But then you also make claims like “all behavior can be rationalized as EU maximization”, which is wildly misleading.
I tried to be clear that my argument was “you need more assumptions beyond just coherence arguments on universe-histories; if you have literally no other assumptions, then all behavior can be rationalized as EU maximization”. I think the phrase “all behavior can be rationalized as EU maximization”, or something like it, was basically necessary to get across the argument I was making. I agree that, taken in isolation, it is misleading; I don’t really see how I could have avoided saying something that is misleading in isolation while still pointing out the-thing-that-I-believe-is-fallacious. Nuance is hard.
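(For concreteness, here is one way to write down that point, as a quick sketch under the simplifying assumption that the system implements a deterministic policy: given any policy $\pi$, define a utility function over universe-histories by
\[
u_\pi(h) =
\begin{cases}
1 & \text{if } h \text{ is the history that } \pi \text{ in fact produces,} \\
0 & \text{otherwise.}
\end{cases}
\]
Then $\pi$ attains expected utility 1, the maximum possible, so it maximizes expected utility with respect to $u_\pi$. That is the sense in which, with no further assumptions, any behavior whatsoever can be rationalized as EU maximization.)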
(Also, it should be noted that you are not in the intended audience for that post; I expect that to you the point feels obvious enough not to be worth stating, and so overall it feels like I’m just being misleading. If everyone were similar to you, I would not have bothered to write that post.)
Also, the “preferences over universe-histories” argument doesn’t work as well when we specify the full counterfactual behavior of a system, which is something we can do quite well in practice.
I agree that if you have counterfactual behavior, EU maximization is not vacuous. I don’t think this meaningfully changes the upshot (which is “coherence arguments, by themselves, without any other assumptions on the structure of the world or the space of utility functions, do not imply AI risk”). It might meaningfully change the title of the post (perhaps they do imply goal-directed behavior in some sense), though in that case I’d change the title to “Coherence arguments do not imply AI risk”, and I think it would be effectively the same post.
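(As a toy illustration of why pinning down counterfactual behavior breaks the vacuity, with the outcomes A, B, and C being purely hypothetical: a system that, when offered each pairwise choice, would strictly choose A over B, B over C, and C over A cannot be represented as maximizing any utility function over those outcomes, since that would require
\[
u(A) > u(B), \qquad u(B) > u(C), \qquad u(C) > u(A),
\]
which is unsatisfiable. A single observed history only reveals the choices the system actually faced; it is the counterfactual specification that supplies all three comparisons at once.)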
Mostly though, I’m wondering how exactly you use counterfactual behavior in an argument for AI risk. Like, the argument I was arguing against is extremely abstract, and just claims that the AI is “intelligent” / “coherent”. How do you use that to get counterfactual behavior for the AI system?
I agree that for any given AI system, we could probably gain a bunch of knowledge about its counterfactual behavior, and then reason about how coherent it is and how goal-directed it is. But this is a fundamentally different thing from the thing I was talking about (which is just: can we abstractly argue for AI risk without talking about details of the system beyond “it is intelligent”?)
My argument is that coherence theorems do not apply nontrivially to any arbitrary system, so they could still potentially tell us interesting things about which systems are/aren’t <agenty/dangerous/etc>.
I agree with this.
There may be good arguments for why coherence theorems are the wrong way to think about goal-directedness, but “everything can be viewed as EU maximization” is not one of them.
I actually also agree with this, and was not trying to argue that coherence arguments are irrelevant to “goal-directedness” or “being a good agent”—I’ve already mentioned that I personally do things differently thanks to my knowledge of coherence arguments.
Just how narrow a setting are you considering here? Limited resources are everywhere. Even an E. coli needs to efficiently use limited resources. Indeed, I expect coherence theorems to say nontrivial things about an E. coli swimming around in search of food (and this includes the possibility that the nontrivial things the theorems say could turn out to be empirically wrong, which in turn would tell us nontrivial things about E. coli and/or selection pressures, and possibly point to better coherence theorems).
I agree that if you take any particular system and try to make predictions, the necessary assumptions (such as “what counts as a limited resource”) will often be easy and obvious and the coherence theorems do have content in such situations. It’s the abstract argument that feels flawed to me.
I somewhat expect your response will be “why would anyone be applying coherence arguments in such a ridiculously abstract way rather than studying a concrete system”, to which I would say that you are not in the intended audience.
----
Fwiw, thinking this through has made me feel better than I did before about including it in the Alignment book, though I’m still overall opposed. (I do still think it is a good fit for other books.)
I somewhat expect your response will be “why would anyone be applying coherence arguments in such a ridiculously abstract way rather than studying a concrete system”, to which I would say that you are not in the intended audience.
Ok, this is a fair answer. I think you and I, at least, are basically aligned here.
I do think a lot of people took away from your post something like “all behavior can be rationalized as EU maximization”, and in particular I think a lot of people walked away with the impression that usefully applying coherence arguments to systems in our particular universe is much more rare/difficult than it actually is. But I can’t fault you much for some of your readers not paying sufficiently close attention, especially when my review at the top of this thread is largely me complaining about how people missed nuances in this post.
(Once again, great use of that link)