Eliezer Yudkowsky comments on There are no coherence theorems

Eliezer Yudkowsky 28 Feb 2023 8:26 UTC
10 points
3
And this avoids the Complete Class Theorem conclusion of dominated strategies, how? Spell it out with a concrete example, maybe? Again, we care about domination, not representability at all.
- EJT 28 Feb 2023 21:34 UTC
  18 points
  23
  Parent
  And this avoids the Complete Class Theorem conclusion of dominated strategies, how?
  The Complete Class Theorem assumes that the agent’s preferences are complete. If the agent’s preferences are incomplete, the theorem doesn’t apply. So, you have to try to get Completeness some other way.
  You might try to get Completeness via some money-pump argument, but these arguments aren’t particularly convincing. Agents can make themselves immune to all possible money-pumps for Completeness by acting in accordance with the following policy: ‘if I previously turned down some option X, I will not choose any option that I strictly disprefer to X.’
  Again, we care about domination, not representability at all.
  Can you expand on this a little more? Agents cannot be (or appear to be) expected utility maximizers unless they are representable as expected utility maximizers, so if we care about whether agents will be (or will appear to be) expected utility maximizers, we have to care about whether they will be representable as expected utility maximizers.
  - Eliezer Yudkowsky 1 Mar 2023 7:21 UTC
    12 points
    8
    Parent
    In the limit, you take a rock, and say, “See, the complete class theorem doesn’t apply to it, because it doesn’t have any preferences ordered about anything!” What about your argument is any different from this—where is there a powerful, future-steering thing that isn’t viewable as Bayesian and also isn’t dominated? Spell it out more concretely: It has preferences ABC, two things aren’t ordered, it chooses X and then Y, etc. I can give concrete examples for my views; what exactly is a case in point of anything you’re claiming about the Complete Class Theorem’s supposed nonapplicability and hence nonexistence of any coherence theorems?
    - EJT 1 Mar 2023 18:49 UTC
      30 points
      23
      Parent
      In the limit
      You’re pushing towards the wrong limit. A rock can be represented as indifferent between all options and hence as having complete preferences.
      As I explain in the post, an agent’s preferences are incomplete if and only if they have a preferential gap between some pair of options, and an agent has a preferential gap between two options A and B if and only if they lack any strict preference between A and B and this lack of strict preference is insensitive to some sweetening or souring (such that, e.g., they strictly prefer A to A- and yet have no strict preferences either way between A and B, and between A- and B).
      Spell it out more concretely
      Sure. Imagine an agent as powerful and future-steering as you like. Among its options are A, A-, and B: the agent strictly prefers A to A-, and has a preferential gap between A and B, and between A- and B. Its preferences are incomplete, so the Complete Class Theorem doesn’t apply.
      [Suppose that you tried to use the proof of the Complete Class Theorem to prove that this agent would pursue a dominated strategy. Here’s why that won’t work:
      Without Completeness, we can’t get a real-valued utility function.
      Without a real-valued utility function, we can’t represent the agent’s policy with a vector of real numbers.
      Without a vector of real numbers representing the agent’s policy, we can’t get an equation representing a hyperplane that separates the set of available policies from the set of policies that strictly dominate the agent’s policy.
      Without a hyperplane equation, we can’t get a probability distribution relative to which the agent’s policy maximizes expected utility.]
      I anticipate that this answer won’t satisfy you and that you’ll ask for more concreteness in the example, but I don’t yet know what you want me to be more concrete about.
      - Eliezer Yudkowsky 7 Mar 2023 0:43 UTC
        1 point
        1
        Parent
        I want you to give me an example of something the agent actually does, under a couple of different sense inputs, given what you say are its preferences, and then I want you to gesture at that and say, “Lo, see how it is incoherent yet not dominated!”
        eapi 8 Mar 2023 0:09 UTC
        3 points
        3
        Parent
        Say more about what counts as incoherent yet not dominated? I assume “incoherent” is not being used here as an alias for “non-EU-maximizing” because then this whole discussion is circular.
        Eliezer Yudkowsky 8 Mar 2023 7:23 UTC
        6 points
        1
        Parent
        Suppose I describe your attempt to refute the existence of any coherence theorems: You point to a rock, and say that although it’s not coherent, it also can’t be dominated, because it has no preferences. Is there any sense in which you think you’ve disproved the existence of coherence theorems, which doesn’t consist of pointing to rocks, and various things that are intermediate between agents and rocks in the sense that they lack preferences about various things where you then refuse to say that they’re being dominated?
        eapi 9 Mar 2023 22:54 UTC
        3 points
        4
        Parent
        This is pretty unsatisfying as an expansion of “incoherent yet not dominated” given that it just uses the phrase “not coherent” instead.
        I find money-pump arguments to be the most compelling ones since they’re essentially tiny selection theorems for agents in adversarial environments, and we’ve got an example in the post of (the skeleton of) a proof that a lack-of-total-preferences doesn’t immediately lead to you being pumped. Perhaps there’s a more sophisticated argument that Actually No, You Still Get Pumped but I don’t think I’ve seen one in the comments here yet.
        If there are things which cannot-be-money-pumped, and yet which are not utility-maximizers, and problems like corrigibility are almost certainly unsolvable for utility-maximizers, perhaps it’s somewhat worth looking at ~~coherent~~ non-pumpable non-EU agents?
        Eliezer Yudkowsky 13 Mar 2023 2:46 UTC
        5 points
        0
        Parent
        Things are dominated when they forego free money and not just when money gets pumped out of them.
        keith_wynroe 18 Mar 2023 10:39 UTC
        6 points
        4
        Parent
        How is the toy example agent sketched in the post dominated?
        eapi 13 Mar 2023 3:12 UTC
        6 points
        5
        Parent
        ...wait, you were just asking for an example of an agent being “incoherent but not dominated” in those two senses of being money-pumped? And this is an exercise meant to hint that such “incoherent” agents are always dominatable?
        I continue to not see the problem, because the obvious examples don’t work. If I have $(1 a p p l e, $ 0)$ as incomparable to $(1 b a n a n a, $ 0)$ that doesn’t mean I turn down the trade of $- 1 a p p l e, + 1 b a n a n a, + $ 10000$ (which I assume is what you’re hinting at re. foregoing free money).
        If one then says “ah but if I offer $9999 and you turn that down, then we have identified your secret equivalent utili-” no, this is just a bid/ask spread, and I’m pretty sure plenty of ink has been spilled justifying EUM agents using uncertainty to price inaction like this.
        What’s an example of a non-EUM agent turning down free money which doesn’t just reduce to comparing against an EUM with reckless preferences/a low price of uncertainty?
        keith_wynroe 28 Mar 2023 11:14 UTC
        5 points
        1
        Parent
        Want to bump this because it seems important? How do you see the agent in the post as being dominated?
        keith_wynroe 10 Mar 2023 1:05 UTC
        2 points
        1
        Parent
        This seems totally different to the point OP is making which is that you can in theory have things that definitely are agents, definitely do have preferences, and are incoherent (hence not EV-maximisers) whilst not “predictably shooting themselves in the foot” as you claim must follow from this
        I agree the framing of “there are no coherence theorems” is a bit needlessly strong/overly provocative in a sense, but I’m unclear what your actual objection is here—are you claiming these hypothetical agents are in fact still vulnerable to money-pumping? That they are in fact not possible?
        eapi 9 Mar 2023 23:25 UTC
        2 points
        1
        Parent
        The rock doesn’t seem like a useful example here. The rock is “incoherent and not dominated” if you view it as having no preferences and hence never acting out of indifference, it’s “coherent and not dominated” if you view it as having a constant utility function and hence never acting out of indifference, OK, I guess the rock is just a fancy Rorschach test.
        IIUC a prototypical Slightly Complicated utility-maximizing agent is one with, say, $u (a p p l e s, b a n a n a s) = min (a p p l e s, b a n a n a s)$ , and a prototypical Slightly Complicated not-obviously-pumpable non-utility-maximizing agent is one with, say, the partial order $(a_{1}, b_{1}) ≼ (a_{2}, b_{2}) = a_{1} ≼ a_{2} \land b_{1} ≼ b_{2}$ plus the path-dependent rule that EJT talks about in the post (Ah yes, non-pumpable non-EU agents might have higher complexity! Is that relevant to the point you’re making?).
        What’s the competitive advantage of the EU agent? If I put them both in a sandbox universe and crank up their intelligence, how does the EU agent eat the non-EU agent? How confident are you that that is what must occur?
        Eve Grey 9 Mar 2023 13:14 UTC
        1 point
        0
        Parent
        Hey, I’m really sorry if I sound stupid, because I’m very new to all this, but I have a few questions (also, I don’t know which one of all of you is right, I genuinely have no idea).
        Aren’t rocks inherently coherent, or rather, their parts are inherently coherent, for they align with the laws of the universe, whereas the “rock” is just some composite abstract form we came up with, as observers?
        Can’t we think of the universe in itself as an “agent” not in the sense of it being “god”, but in the sense of it having preferences and acting on them?
        Examples would be hot things liking to be apart and dispersion leading to coldness, or put more abstractly—one of the “preferences” of the universe is entropy. I’m sorry if I’m missing something super obvious, I failed out of university, haha!
        If we let the “universe” be an agent in itself, so essentially it’s a composite of all simples there are (even the ones we’re not aware of), then all smaller composites by definition will adhere to the “preferences” of the “universe”, because from our current understanding of science, it seems like the “preferences” (laws) of the “universe” do not change when you cut the universe in half, unless you reach quantum scales, but even then, it is my unfounded suspicion that our previous models are simply laughably wrong, instead of the universe losing homogeneity at some arbitrary scale.
        Of course, the “law” of the “universe” is very simple and uncomplex—it is akin to the most powerful “intelligence” or “agent” there is, but with the most “primitive” and “basic” “preferences”. Also apologies for using so many words in quotations, I do so, because I am unsure if I understand their intended meaning.
        It seems to me that you could say that we’re all ultimately “dominated” by the “universe” itself, but in a way that’s not really escapeable, but in opposite, the “universe” is also “dominated” by more complex “agents”, as individuals can make sandwiches, while it’d take the “universe” much more time to create such complex and abstract composites from its pure “preferences”.
        In a way, to me at least, it seems that both the “hyper-intelligent”, “powerful” “agent” needs the “complex”, “non-homogeneous”, “stupid” “agent”, because without that relationship, if there ever randomly came to exist a “non-homogeneous” “agent” with enough “intelligence” to “dominate” the “universe”, then we’d essentially experience… uh, give me a second, because this is a very complicated concept I read about long ago...
        We’d experience the drop in the current energy levels all around the “universe”, because if the “universe” wasn’t the most “powerful” “agent” so far, then we’ve been existing in a “false vacuum”—essentially, the “universe” would be “dominated” by a “better” “agent” that adheres closer to the “true” “preferences” of the “universe”.
        And the “preference” of the “true” “universe” seems to be to reach that “true vacuum” state, as it’s more in line with entropy, but it needs smaller and dumber agents that are essentially unknowingly “preferring” to “destroy” the universe as they know it, because it doesn’t seem to be possible to reach that state with only micro-perturbations, or it’d take such a long time, it’s more entropically sound to create bigger agents, that while really stupid, have far more “power” than the simple “universe”, because even though the simple agents do not grasp the nature of “fire”, “cold”, “entropy” or even “time”, they can easily make “sandwiches”, “chairs”, “rockets”, “civilizations” and “technology”.
        I’d really appreciate it if someone tried to explain my confusions on the subject in private messages, as the thread here is getting very hard to read (at least for me, I’m very stupid!).
        I really appreciate it if you read through my entire nonsensical garble, I hope someone’s charitable enough to enlighten me which assumptions I made are completely nonsensical.
        I am not trying to be funny, snarky, ironic, sarcastic, I genuinely do not understand, I just found this website—sorry if I come off that way.
        Have a great day!
        the gears to ascension 10 Mar 2023 20:12 UTC
        3 points
        0
        Parent
        The question is how to identify particular bubbles of seekingness in the universe. How can you tell which part of the universe will respond to changes in other parts’ shape by reshaping them, and how? How do you know when a cell wants something, in the sense that if the process of getting the thing is interfered with, it will generate physical motions that end up compensating for the interference. How do you know if it wants the thing, if it responds differently to different sizes of interference? Can we identify conflict between two bubbles of seekingness? etc.
        
        The key question is how to identify when a physical has a preference for one thing over another. The hope is that, if we find a sufficiently coherent causal mechanism description that specifies what physical systems qualify as
        
        For what it’s worth, I think you’re on a really good track here, and I’m very excited about views that have the one you’re starting with. I’d invite browsing my account and links, as this is something I talk about often, from various perspectives, though mostly I defer to others for getting the math right.
        
        Speaking of getting the math right: read Discovering Agents (or browse related papers), it’s a really great paper. it’s not an easy first paper to read, but I’m a big believer in out-of-order learning and jumping way ahead of your current level to get a sense of what’s out there. Also check out the related paper Interpreting systems as solving POMDPs (or browse ) related papers.
        
        If you’re also new to scholarship in general, I’d also suggest checking out some stuff on how to do scholarship efficiently as well. a friend and I trimmed an old paper I like on how to read papers efficiently, and posted it to LW the other day. You can also find more related stuff from the tags on that post. (I reference that article myself occasionally and find myself surprised by how dense it is as a checklist of visits if I’m trying to properly understand a paper.)
        Eve Grey 11 Mar 2023 5:17 UTC
        1 point
        0
        Parent
        I’ll read the papers once I get on the computer—don’t worry, I may have not finished uni, but I always loved reading papers over a cup of tea.
        
        I’m kind of writing about this subject right now, so maybe there you can find something that interests you.
        
        How do I know what parts of the universe will respond to what changes? To me, at least, this seems like a mostly false question, for you to have true knowledge of that, you’d need to become the Universe itself. If you don’t care about true knowledge just good % chances, then you do it with heuristic. First you come up with composites that are somewhat self similar, but nothing is exactly alike in the Universe, except the Universe itself. Then you create a heuristic for predicting those composites and you use it, as long as the composite is similar enough to the original composite that the heuristic was based on. Of course, heuristics work differently in different environments, but often there are only a few environments even relevant for each composite, for if you take a fish out of water, it will die—now you may want a heuristic for an alive fish in the air, but I see it as much more useful to recompile the fish into catch at that point.
        
        This of course applies on any level of composition, from specific specimens of fish, to ones from a specific family, to a single species, then to all fish, then to all living organisms, with as many steps in between these listed as you want. How do we discriminate between which composite level we ought to work with? Pure intuition and experiment, once you do it with logic, it all becomes useless, because logic will attempt to compression everything, even those things which have more utility being uncompressed.
        
        I’ll get to the rest of your comment on PC, my fingers hurt. Typing on this new big phone is so hard lol.
  - Max H 1 Mar 2023 0:18 UTC
    2 points
    0
    Parent
    Agents can make themselves immune to all possible money-pumps for Completeness by acting in accordance with the following policy: ‘if I previously turned down some option X, I will not choose any option that I strictly disprefer to X.’
    
    Plus some other assumptions (capable of backwards induction, knowing trades in advance), right?
    I’m curious whether these assumptions are actually stronger than, or related to, completeness.
    so if we care about whether agents will be (or will appear to be) expected utility maximizers, we have to care about whether they will be representable as expected utility maximizers.
    Both sets (representable and not) are non-empty. The question remains about which set the interesting agents are in. I think that CCT + VNM, money pump arguments, etc. strongly hint, but do not prove, that the EU maximizers are the interesting ones.
    Also, I personally don’t find the question itself particularly interesting, because it seems like one can move between these sets in a relatively shallow way (I’d be interested in seeing counterexamples, though). Perhaps that’s what Yudkowsky means by not caring about representability?
    What links here?
    Max H's comment on A decade of lurking, a month of posting by Max H (28 Apr 2023 14:31 UTC; 1 point)
    - EJT 1 Mar 2023 19:24 UTC
      2 points
      1
      Parent
      Plus some other assumptions (capable of backwards induction, knowing trades in advance), right?
      Yep, that’s right!
      I’m curious whether these assumptions are actually stronger than, or related to, completeness.
      Since the Completeness assumption is about preferences while the backward-induction and knowing-trades-in-advance assumptions are not, they don’t seem very closely related to me. The assumption that the agent’s strict preferences are transitive is more closely related, but it’s not stronger than Completeness in the sense of implying Completeness.
      Can you say a bit more about what you mean by ‘interesting agents’?
      From your other comment:
      That is, if you try to construct / find / evolve the most powerful agent that you can, without a very precise understanding of agents / cognition / alignment, you’ll probably get something very close to an EU maximizer.
      I think this could well be right. The main thought I want to argue against is more like:
      Even if you initially succeed in creating a powerful agent that doesn’t maximize expected utility, VNM/CCT/money-pump arguments make it likely that this powerful agent will later become an expected utility maximizer.
      - Max H 1 Mar 2023 21:14 UTC
        2 points
        1
        Parent
        I meant stronger in a loose sense: you argued that “completeness doesn’t come for free”, but it seems more like actually what you’ve shown is that not-pursuing-dominated-strategies is the thing that doesn’t come for free.
        You either need a bunch of assumptions about preferences, or you need one less of those assumptions, plus a few other assumptions about knowing trades, induction, and adherence to a specific policy.
        And even given all these other assumptions, the proposed agent with a preferential gap seems like it’s still only epsilon-different from an actual EU maximizer. To me this looks like a strong hint that these assumptions actually do point at a core of something simple which one might call “coherence”, which I expect to show up in (all minus epsilon) advanced agents, even if there are pathological points in advanced-agent-space which don’t have these properties (and even if expected utility theory as a whole isn’t quite correct).
        
        EJT 1 Mar 2023 23:19 UTC
        9 points
        9
        Parent
        You either need a bunch of assumptions about preferences, or you need one less of those assumptions, plus a few other assumptions about knowing trades, induction, and adherence to a specific policy.
        I see. I think this is right.
        the proposed agent with a preferential gap seems like it’s still only epsilon-different from an actual EU maximizer.
        I agree with this too, but note that the agent with a single preferential gap is just an example. Agents can have arbitrarily many preferential gaps and still avoid pursuing dominated strategies. And agents with many preferential gaps may behave quite differently to expected utility maximizers.
        What links here?
        Max H's comment on Why Not Subagents? by johnswentworth (23 Jun 2023 1:43 UTC; 8 points)
        Max H's comment on Trying to deconfuse some core AI x-risk problems by habryka (17 Oct 2023 20:27 UTC; 4 points)