Crossposting this comment from the EA Forum:

Nuno says:

I appreciate the whole post. But I personally really enjoyed the appendix. In particular, I found it informative that Yudkowsky can speak/write with that level of authoritativeness, confidence, and disdain for others who disagree, and still be wrong (if this post is right).
I respond:
(if this post is right)
The post does actually seem wrong though.
I expect someone to write a comment with the details at some point (I am pretty busy right now, so can only give a quick meta-level take), but mostly, I feel like in order to argue that something is wrong with these arguments, you have to argue more compellingly against completeness and against the possible alternative ways of establishing Dutch book arguments.
Also, the title of “there are no coherence arguments” is just straightforwardly wrong. The theorems cited are of course real theorems, they are relevant to agents acting with a certain kind of coherence, and I don’t really understand the semantic argument that is happening where it’s trying to say that the cited theorems aren’t talking about “coherence”, when like, they clearly are.
You can argue that the theorems are wrong, or that the explicit assumptions of the theorems don’t hold, which many people have done, but like, there are still coherence theorems, and IMO completeness seems quite reasonable to me and the argument here seems very weak (and I would urge the author to create an actual concrete situation that doesn’t seem very dumb in which a highly intelligent, powerful, and economically useful system has non-complete preferences).
The whole section at the end feels very confused to me. The author asserts that there is “an error” where people assert that “there are coherence theorems”, but man, that just seems like such a weird thing to argue for. Of course there are theorems that are relevant to the question of agent coherence, all of these seem really quite relevant. They might not prove the things in practice, as is true of many theorems, and you are free to argue about that, but that doesn’t really change whether they are theorems.
Like, I feel like with the same type of argument that is made in the post I could write a post saying “there are no voting impossibility theorems” and then go ahead and argue that Arrow’s Impossibility Theorem’s assumptions are not universally satisfied, and then accuse everyone who ever talked about voting impossibility theorems of making “an error” since “those things are not real theorems”. And I think everyone working on voting-adjacent impossibility theorems would be pretty justifiedly annoyed by this.
I’m following previous authors in defining ‘coherence theorems’ as
theorems which state that, unless an agent can be represented as maximizing expected utility, that agent is liable to pursue strategies that are dominated by some other available strategy.
On that definition, there are no coherence theorems. VNM is not a coherence theorem, nor is Savage’s Theorem, nor is Bolker-Jeffrey, nor are Dutch Book Arguments, nor is Cox’s Theorem, nor is the Complete Class Theorem.
there are theorems that are relevant to the question of agent coherence
I’d have no problem with authors making that claim.
I would urge the author to create an actual concrete situation that doesn’t seem very dumb in which a highly intelligent, powerful, and economically useful system has non-complete preferences

Working on it.
theorems which state that, unless an agent can be represented as maximizing expected utility, that agent is liable to pursue strategies that are dominated by some other available strategy.
While I agree that such theorems would count as coherence theorems, I wouldn’t consider this to cover most things I think of as coherence theorems, and as such it is simply a bad definition.
I think of coherence theorems loosely as things that say if an agent follows such and such principles, then we can prove it will have a certain property. The usefulness comes from both directions: to the extent the principles seem like good things to have, we’re justified in assuming a certain property, and to the extent that the property seems too strong or whatever, then one of these principles will have to break.
I think of coherence theorems loosely as things that say if an agent follows such and such principles, then we can prove it will have a certain property.
If you use this definition, then VNM (etc.) counts as a coherence theorem. But Premise 1 of the coherence argument (as I’ve rendered it) remains false, and so you can’t use the coherence argument to get the conclusion that sufficiently-advanced artificial agents will be representable as maximizing expected utility.
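For concreteness, here is the standard statement of the VNM theorem as I recall it (axiom names and exact formulations vary a little between presentations); the thing to notice is that Completeness sits on the assumption side of the theorem rather than being delivered by it:

```latex
% A standard statement of the VNM theorem (from memory; exact formulations vary).
Let $\mathcal{L}$ be the set of lotteries over a finite outcome set, and let
$\succsim$ be a preference relation on $\mathcal{L}$ satisfying:
\begin{itemize}
  \item Completeness: for all $p, q \in \mathcal{L}$, $p \succsim q$ or $q \succsim p$;
  \item Transitivity: $p \succsim q$ and $q \succsim r$ imply $p \succsim r$;
  \item Continuity: if $p \succsim q \succsim r$, then $q \sim \alpha p + (1-\alpha) r$
        for some $\alpha \in [0,1]$;
  \item Independence: $p \succsim q$ iff
        $\alpha p + (1-\alpha) r \succsim \alpha q + (1-\alpha) r$
        for every $r$ and every $\alpha \in (0,1]$.
\end{itemize}
Then there is a utility function $u$ on outcomes, unique up to positive affine
transformation, such that
\[
  p \succsim q \iff \mathbb{E}_{p}[u] \ge \mathbb{E}_{q}[u].
\]
```

In particular, Completeness is an input to the theorem, not an output of it, which is the premise at issue above.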
I don’t think the majority of the papers that you cite made the argument that coherence arguments prove that any sufficiently-advanced AI will be representable as maximizing expected utility. Indeed I am very confident almost everyone you cite does not believe this, since it is a very strong claim. Many of the quotes you give even explicitly say this:
then you will make strictly worse choices by your own lights than if you followed some alternate EU-maximizing strategy (at least in some situations, though they may not arise)
The emphasis here is important.
I don’t think really any of the other quotes you cite make the strong claim you are arguing against. Indeed it is trivially easy to think of an extremely powerful AI that is VNM rational in all situations except for one tiny thing that does not matter or will never come up. Technically its preferences can now not be represented by a utility function, but that’s not very relevant to the core arguments at hand, and I feel like in your arguments you are trying to tear down a strawman of some extreme position that I don’t think anyone holds.
Eliezer has also explicitly written about it being possible to design superintelligences that reflectively coherently believe in logical falsehoods. He thinks this is possible, just very difficult. That alone would also violate VNM rationality.
You misunderstand me (and I apologize for that. I now think I should have made this clear in the post). I’m arguing against the following weak claim:
For any agent who cannot be represented as maximizing expected utility, there is at least some situation in which that agent will pursue a dominated strategy.
And my argument is:
There are no theorems which state or imply that claim. VNM doesn’t, Savage doesn’t, Bolker-Jeffrey doesn’t, Dutch Books don’t, Cox doesn’t, Complete Class doesn’t.
Money-pump arguments for the claim are not particularly convincing (for the reasons that I give in the post).
‘The relevant situations may not arise’ is a different objection. It’s not the one that I’m making.
(and I would urge the author to create an actual concrete situation that doesn’t seem very dumb in which a highly intelligent, powerful, and economically useful system has non-complete preferences)
Please see this old comment and this one.

These are both great! I now notice that I had strong-upvoted them both at the time. Indeed, I think this kind of concreteness does actually help the discussion quite a bit.

I also quite liked John’s post on this topic: https://www.lesswrong.com/posts/3xF66BNSC5caZuKyC/why-subagents
Copying my response from the EA forum:

Glad that I added the caveat.

Also, the title of “there are no coherence arguments” is just straightforwardly wrong. The theorems cited are of course real theorems, they are relevant to agents acting with a certain kind of coherence, and I don’t really understand the semantic argument that is happening where it’s trying to say that the cited theorems aren’t talking about “coherence”, when like, they clearly are.
Well, part of the semantic nuance is that we don’t care as much about the coherence theorems that do exist if they will fail to apply to current and future machines.
IMO completeness seems quite reasonable to me and the argument here seems very weak (and I would urge the author to create an actual concrete situation that doesn’t seem very dumb in which a highly intelligent, powerful, and economically useful system has non-complete preferences).
Here are some scenarios:
Our highly intelligent system notices that to have complete preferences over all trades would be too computationally expensive, and thus is willing to accept some, even a large, degree of incompleteness.
The highly intelligent system learns to mimic the values of humans, who turn out to have non-complete preferences, which the agent then mimics.
You train a powerful system to do some stuff, but also to detect when it is out of distribution and in that case do nothing. Assuming you can do that, its preferences are incomplete, since when offered trade-offs out of distribution it always takes the default option. (A toy sketch of this kind of default-keeping agent with incomplete preferences follows below.)
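To make the third scenario a bit more concrete, here is a minimal toy sketch (the code and names are mine, purely illustrative, and not from the post): the agent’s preference relation is a strict partial order with an incomparable pair, and its trading policy is “accept an offer only if it is strictly preferred to what I currently hold; otherwise keep the default.” Its preferences are incomplete, yet brute-forcing short sequences of offers never leaves it holding something it strictly disprefers to its starting endowment.

```python
from itertools import product

# Toy outcomes: "A_plus" is strictly better than "A"; "B" is incomparable to both.
OUTCOMES = ["A", "A_plus", "B"]
STRICTLY_BETTER = {("A", "A_plus")}  # (worse, better) pairs; deliberately incomplete


def prefers(x, y):
    """True iff the agent strictly prefers x to y."""
    return (y, x) in STRICTLY_BETTER


class DefaultKeepingAgent:
    """Trades only when the offer is strictly preferred to the current holding."""

    def __init__(self, endowment):
        self.holding = endowment

    def consider(self, offer):
        if prefers(offer, self.holding):
            self.holding = offer  # trade up
        # incomparable or worse: keep the default, i.e. do nothing


def exploitable(endowment, max_len=4):
    """Brute-force all offer sequences up to max_len and look for a money pump:
    a sequence after which the agent holds something it disprefers to its start."""
    for length in range(1, max_len + 1):
        for seq in product(OUTCOMES, repeat=length):
            agent = DefaultKeepingAgent(endowment)
            for offer in seq:
                agent.consider(offer)
            if prefers(endowment, agent.holding):
                return True
    return False


if __name__ == "__main__":
    for start in OUTCOMES:
        print(f"start = {start}: exploitable? {exploitable(start)}")  # False for every start
```

This is of course a toy, but it illustrates the distinction the post leans on: “not representable as an expected-utility maximizer” and “liable to pursue a dominated strategy” can come apart.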
The whole section at the end feels very confused to me. The author asserts that there is “an error” where people assert that “there are coherence theorems”, but man, that just seems like such a weird thing to argue for. Of course there are theorems that are relevant to the question of agent coherence, all of these seem really quite relevant. They might not prove the things in practice, as is true of many theorems.
Mmh, then it would be good to differentiate between:
There are coherence theorems that talk about some agents with some properties
There are coherence theorems that prove that the AI systems that will soon exist will be optimizing utility functions
You could also say a third thing, which would be: there are coherence theorems that strongly hint that the AI systems that will soon exist will be optimizing utility functions. They don’t prove it, but they make it highly probable because of such and such. In which case having more detail on the such and such would deflate most of the arguments in this post, for me.
For instance:
“‘Coherence arguments’ mean that if you don’t maximize ‘expected utility’ (EU)—that is, if you don’t make every choice in accordance with what gets the highest average score, given consistent preferability scores that you assign to all outcomes—then you will make strictly worse choices by your own lights than if you followed some alternate EU-maximizing strategy (at least in some situations, though they may not arise). For instance, you’ll be vulnerable to ‘money-pumping’—being predictably parted from your money for nothing.
This is just false, because it is not taking into account the cost of doing expected value maximization, since giving consistent preferability scores is just very expensive and hard to do reliably. Like, when I poll people for their preferability scores, they give inconsistent estimates. Instead, they could be doing some expected utility maximization, but the evaluation steps are so expensive that I now basically only bother with more hardcore approximations of expected value for large projects and organizations, not for individuals. And even then, I’m still taking shortcuts and monkey-patches, and not doing pure expected value maximization.
“This post gets somewhat technical and mathematical, but the point can be summarised as:
You are vulnerable to money pumps only to the extent to which you deviate from the von Neumann-Morgenstern axioms of expected utility.
In other words, using alternate decision theories is bad for your wealth.”
The “in other words” doesn’t follow, since EV maximization can be more expensive than the shortcuts.
Then there are other parts that give the strong impression that this expected value maximization will be binding in practice:
“Rephrasing again: we have a wide variety of mathematical theorems all spotlighting, from different angles, the fact that a plan lacking in clumsiness, is possessing of coherence.”
“The overall message here is that there is a set of qualitative behaviors and as long you do not engage in these qualitatively destructive behaviors, you will be behaving as if you have a utility function.”
“The view that utility maximizers are inevitable is supported by a number of coherence theories developed early on in game theory which show that any agent without a consistent utility function is exploitable in some sense.”
Here are some words I wrote that don’t quite sit right but which I thought I’d still share: Like, part of the MIRI beat as I understand it is to hold that there is some shining guiding light, some deep nature of intelligence that models will instantiate and make them highly dangerous. But it’s not clear to me whether you will in fact get models that instantiate that shining light. Like, you could imagine an alternative view of intelligence where it’s just useful monkey patches all the way down, and as we train more powerful models, they get more of the monkey patches, but without the fundamentals. The view in between would be that there are some monkey patches, and there are some deep generalizations, but then I want to know whether the coherence systems will bind to those kinds of agents.
No need to respond/deeply engage, but I’d appreciate if you let me know if the above comments were too nitpicky.
Well, part of the semantic nuance is that we don’t care as much about the coherence theorems that do exist if they will fail to apply to current and future machines.
The correct response to learning that some theorems do not apply as much to reality as you thought, surely mustn’t be to change language so as to deny those theorems’ existence. Insofar as this is what’s going on, these are pretty bad norms of language in my opinion.
I am not defending the language of the OP’s title, I am defending the content of the post.

See this comment: <https://www.lesswrong.com/posts/yCuzmCsE86BTu9PfA/there-are-no-coherence-theorems?commentId=v2mgDWqirqibHTmKb>

This is just false, because it is not taking into account the cost of doing expected value maximization, since giving consistent preferability scores is just very expensive and hard to do reliably.
I do really want to put emphasis on the parenthetical remark “(at least in some situations, though they may not arise)”. Katja is totally aware that the coherence arguments require a bunch of preconditions that are not guaranteed to be the case for all situations, or even any situation ever, and her post is about how there is still a relevant argument here.
Also, the title of “there are no coherence arguments” is just straightforwardly wrong. The theorems cited are of course real theorems, they are relevant to agents acting with a certain kind of coherence, and I don’t really understand the semantic argument that is happening where it’s trying to say that the cited theorems aren’t talking about “coherence”, when like, they clearly are.
This seems wrong to me. The post’s argument is that the cited theorems aren’t talking about “coherence”, and it does indeed seem clear that these theorems (at least most of them, possibly all, though I could see disagreement about one or two) are not, in fact, talking about “coherence”.
Ngl kinda confused how these points imply the post seems wrong, the bulk of this seems to be (1) a semantic quibble + (2) a disagreement on who has the burden of proof when it comes to arguing about the plausibility of coherence + (3) maybe just misunderstanding the point that’s being made?
(1) I agree the title is a bit needlessly provocative and in one sense of course VNM/Savage etc count as coherence theorems. But the point is that there is another sense that people use “coherence theorem/argument” in this field which corresponds to something like “If you’re not behaving like an EV-maximiser you’re shooting yourself in the foot by your own lights”, which is what brings in all the scary normativity and is what the OP is saying doesn’t follow from any existing theorem unless you make a bunch of other assumptions.
(2) The only real substantive objection to the content here seems to be “IMO completeness seems quite reasonable to me”. Why? Having complete preferences seems like a pretty narrow target within the space of all partial orders you could have as your preference relation, so what’s the reason why we should expect minds to steer towards this? Do humans have complete preferences? (The brute-force counting sketch after this list gives a rough sense of how narrow a target completeness is.)
(3) In some other comments you’re saying that this post is straw-manning some extreme position because people who use coherence arguments already accept you could have e.g.
> an extremely powerful AI that is VNM rational in all situations except for one tiny thing that does not matter or will never come up
This seems to be entirely missing the point/confused—OP isn’t saying that agents can realistically get away with not being VNM-rational because their inconsistencies/incompletenesses aren’t efficiently exploitable; they’re saying that you can have an agent that isn’t VNM-rational and isn’t exploitable even in principle—i.e., your example is an agent that could in theory be money-pumped by another sufficiently powerful agent that was able to steer the world to where its corner-case weirdness came out—the point being made about incompleteness here is that you can have a non-VNM-rational agent that’s un-Dutch-Bookable not just as a matter of empirical reality but in principle. The former still gets you claims like “A sufficiently smart agent will appear VNM-rational to you, they can’t have any obvious public-facing failings”; the latter undermines this.
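Here is the small counting sketch referred to in point (2): a brute-force count (mine, purely illustrative, and ignoring indifference, i.e. treating preferences as strict relations) of how many strict partial orders on a four-element outcome set are complete.

```python
from itertools import product

def count_orders(n=4):
    """Count strict partial orders and strict total (complete) orders on n labelled
    elements by brute force over all irreflexive directed relations."""
    pairs = [(i, j) for i in range(n) for j in range(n) if i != j]
    partial = total = 0
    for bits in product([False, True], repeat=len(pairs)):
        rel = {p for p, keep in zip(pairs, bits) if keep}
        # asymmetry: never both (i, j) and (j, i)
        if any((j, i) in rel for (i, j) in rel):
            continue
        # transitivity: (i, j) and (j, k) force (i, k)
        if any((i, k) not in rel
               for (i, j) in rel for (j2, k) in rel if j == j2):
            continue
        partial += 1
        # completeness: every distinct pair is ranked one way or the other
        if all((i, j) in rel or (j, i) in rel for (i, j) in pairs):
            total += 1
    return partial, total

if __name__ == "__main__":
    partial, total = count_orders(4)
    print(f"{total} of the {partial} strict partial orders on 4 elements are complete")
```

If I have the brute force right, that comes out to 24 complete orders out of 219 partial orders on just four outcomes, and the fraction shrinks quickly as the number of outcomes grows.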
Copying my second response from the EA forum:

Like, I feel like with the same type of argument that is made in the post I could write a post saying “there are no voting impossibility theorems” and then go ahead and argue that Arrow’s Impossibility Theorem’s assumptions are not universally satisfied, and then accuse everyone who ever talked about voting impossibility theorems of making “an error” since “those things are not real theorems”. And I think everyone working on voting-adjacent impossibility theorems would be pretty justifiedly annoyed by this.
I think that there is some sense in which the character in your example would be right, since:
Arrow’s theorem doesn’t bind approval voting.
Generalizations of Arrow’s theorem don’t bind probabilistic methods, e.g., ones in which each candidate is chosen with a probability corresponding to the number of votes they get.
Like, if you had someone saying there was “a deep core of electoral process” which means that, as elections scale to important decisions, you will necessarily get “highly defective electoral processes”, as illustrated in the classic example of the “dangers of the first-past-the-post system”. Well in that case it would be reasonable to wonder whether the assumptions of the theorem bind, or whether there is some system like approval voting which is much less shitty than the theorem provers were expecting, because the assumptions don’t hold.
The analogy is imperfect, though, since approval voting is a known decent system, whereas for AI systems we don’t have an example friendly AI.
Sorry, this might have not been obvious, but I indeed think the voting impossibility theorems have holes in them because of the lotteries case and that’s specifically why I chose that example.
I think that intellectual point matters, but I also think writing a post with the title “There are no voting impossibility theorems”, defining “voting impossibility theorems” as “theorems that imply that all voting systems must make these known tradeoffs”, and then citing everyone who ever talked about “voting impossibility theorems” as having made “an error” would just be pretty unproductive. I would make a post like the ones that Scott Garrabrant made being like “I think voting impossibility theorems don’t account for these cases”, and that seems great, and I have been glad about contributions of this type.
Like, if you had someone saying there was “a deep core of electoral process” which means that, as elections scale to important decisions, you will necessarily get “highly defective electoral processes”, as illustrated in the classic example of the “dangers of the first-past-the-post system”. Well in that case it would be reasonable to wonder whether the assumptions of the theorem bind, or whether there is some system like approval voting which is much less shitty than the theorem provers were expecting, because the assumptions don’t hold.
Unfortunately, most democratic countries do use first-past-the-post.
The two things that are inevitable are Condorcet cycles and strategic voting (though Condorcet cycles are less of a problem as you scale up the population, and I have a sneaking suspicion that Condorcet cycles go away if we allow a real-valued, infinite number of people).
I think most democratic countries use proportional representation, not FPTP. But talking about “most” is an FPTP error. Enough countries use proportional representation that you can study the effect of voting systems. And the results are shocking to me. The theoretical predictions are completely wrong. Duverger’s law is false in every FPTP country except America. On the flip side, while PR does lead to more parties, they still form a one-dimensional spectrum. For example, a Green Party is usually a far-left party with slightly different preferences, instead of a single-issue party that is willing to form coalitions with the right.
If politics were two-dimensional, why wouldn’t you expect Condorcet cycles? Why would population size get rid of them? If you have two candidates, a tie between them is on a razor’s edge. The larger the population of voters, the less likely the tie. But if you have three candidates and three roughly equally common preferences, the cyclic shifts of A > B > C, then this is a robust tie. You only get a Condorcet winner when one of the factions becomes as big as the other two combined. Of course I have assumed away the other three preferences, but this is robust to them being small, not merely nonexistent.
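This is easy to check directly. Here is a small sketch (faction sizes below are arbitrary, just large and roughly equal) confirming that in the three-faction cyclic profile each candidate loses one pairwise contest, so there is no Condorcet winner:

```python
from itertools import combinations

# Three factions with cyclic rankings and roughly equal sizes.
FACTIONS = {
    ("A", "B", "C"): 3_400_000,
    ("B", "C", "A"): 3_300_000,
    ("C", "A", "B"): 3_300_000,
}
CANDIDATES = ["A", "B", "C"]


def pairwise_winner(x, y):
    """Majority winner of the head-to-head contest between x and y (None on a tie)."""
    x_votes = sum(n for ranking, n in FACTIONS.items()
                  if ranking.index(x) < ranking.index(y))
    y_votes = sum(FACTIONS.values()) - x_votes
    if x_votes == y_votes:
        return None
    return x if x_votes > y_votes else y


for x, y in combinations(CANDIDATES, 2):
    print(f"{x} vs {y}: {pairwise_winner(x, y)} wins")

# A Condorcet winner would have to win every pairwise contest.
condorcet = [c for c in CANDIDATES
             if all(pairwise_winner(c, other) == c
                    for other in CANDIDATES if other != c)]
print("Condorcet winner:", condorcet or "none")
```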
I don’t know what happens in the following model: there are three issues A, B, C. Everyone, both voter and candidate, is for all of them, but in a zero-sum way, represented as a vector (a, b, c) with a+b+c = 11 and a, b, c >= 0. Start with the voters as above, at (10,1,0), (0,10,1), (1,0,10). Then the candidates (11,0,0), (0,11,0), (0,0,11) form a Condorcet cycle. By symmetry there is no Condorcet winner over all possible candidates. Randomly shift the proportion of voters. Is there a candidate that beats the three given candidates? One that beats all possible candidates? I doubt it. Add noise to make the individual voters unique. Now, I don’t know.
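The model above leaves one thing unspecified: how a voter ranks two candidates. Here is a Monte Carlo sketch under the assumption that each voter type prefers the candidate whose vector has the larger dot product with its own (an assumption of mine, though it does reproduce the stated cycle for the starting configuration). It randomly reweights the three factions and tests every integer point of the simplex as a challenger to the three given candidates; it is a way to poke at the question, not an answer to it.

```python
import random

TOTAL = 11
VOTER_TYPES = [(10, 1, 0), (0, 10, 1), (1, 0, 10)]       # the three voter factions
FIXED_CANDIDATES = [(11, 0, 0), (0, 11, 0), (0, 0, 11)]  # the cycling candidates

# Every integer candidate on the simplex a + b + c = 11, a, b, c >= 0.
GRID = [(a, b, TOTAL - a - b)
        for a in range(TOTAL + 1) for b in range(TOTAL + 1 - a)]


def beats(x, y, weights):
    """True iff x gets a strict weighted majority over y, with each voter type
    ranking candidates by dot product with its own issue vector (ties abstain)."""
    x_votes = y_votes = 0.0
    for voter, w in zip(VOTER_TYPES, weights):
        ux = sum(v * c for v, c in zip(voter, x))
        uy = sum(v * c for v, c in zip(voter, y))
        if ux > uy:
            x_votes += w
        elif uy > ux:
            y_votes += w
    return x_votes > y_votes


def challengers_beating_all_fixed(weights):
    return [c for c in GRID
            if all(beats(c, f, weights) for f in FIXED_CANDIDATES)]


if __name__ == "__main__":
    random.seed(0)
    for _ in range(5):
        weights = [random.random() for _ in VOTER_TYPES]  # random faction proportions
        n = len(challengers_beating_all_fixed(weights))
        print(f"weights {[round(w, 2) for w in weights]}: "
              f"{n} grid candidates beat all three fixed candidates")
```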
You don’t have strategic voting with probabilistic results. And the degree of strategic voting can also be mitigated.

Hm, I remember Wikipedia talked about Hylland’s theorem, which generalizes the Gibbard–Satterthwaite theorem to the probabilistic case, though Wikipedia might be wrong on that.