These arguments don’t work.
You’ve mistaken acyclicity for transitivity. The money-pump establishes only acyclicity. Representability-as-an-expected-utility-maximizer requires transitivity.
As I note in the post, agents can make themselves immune to all possible money-pumps for completeness by acting in accordance with the following policy: ‘if I previously turned down some option X, I will not choose any option that I strictly disprefer to X.’ Acting in accordance with this policy need never require an agent to act against any of their preferences.
And this avoids the Complete Class Theorem conclusion of dominated strategies, how? Spell it out with a concrete example, maybe? Again, we care about domination, not representability at all.
The Complete Class Theorem assumes that the agent’s preferences are complete. If the agent’s preferences are incomplete, the theorem doesn’t apply. So, you have to try to get Completeness some other way.
You might try to get Completeness via some money-pump argument, but these arguments aren’t particularly convincing. Agents can make themselves immune to all possible money-pumps for Completeness by acting in accordance with the following policy: ‘if I previously turned down some option X, I will not choose any option that I strictly disprefer to X.’
Can you expand on this a little more? Agents cannot be (or appear to be) expected utility maximizers unless they are representable as expected utility maximizers, so if we care about whether agents will be (or will appear to be) expected utility maximizers, we have to care about whether they will be representable as expected utility maximizers.
In the limit, you take a rock, and say, “See, the complete class theorem doesn’t apply to it, because it doesn’t have any preferences ordered about anything!” What about your argument is any different from this—where is there a powerful, future-steering thing that isn’t viewable as Bayesian and also isn’t dominated? Spell it out more concretely: It has preferences ABC, two things aren’t ordered, it chooses X and then Y, etc. I can give concrete examples for my views; what exactly is a case in point of anything you’re claiming about the Complete Class Theorem’s supposed nonapplicability and hence nonexistence of any coherence theorems?
You’re pushing towards the wrong limit. A rock can be represented as indifferent between all options and hence as having complete preferences.
As I explain in the post, an agent’s preferences are incomplete if and only if they have a preferential gap between some pair of options, and an agent has a preferential gap between two options A and B if and only if they lack any strict preference between A and B and this lack of strict preference is insensitive to some sweetening or souring (such that, e.g., they strictly prefer A to A- and yet have no strict preferences either way between A and B, and between A- and B).
Sure. Imagine an agent as powerful and future-steering as you like. Among its options are A, A-, and B: the agent strictly prefers A to A-, and has a preferential gap between A and B, and between A- and B. Its preferences are incomplete, so the Complete Class Theorem doesn’t apply.
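For concreteness, here is a minimal executable sketch of that agent and that policy (toy code, not from the post; the option names match the example above, everything else is made up). It shows the standard money-pump attempt against incompleteness failing: after turning down A in favour of B, the policy blocks the later trade down to A-.

```python
# Minimal illustrative sketch (toy code, not from the post): an agent with
# A strictly preferred to A-, and a preferential gap between B and each of
# A and A-, following the policy "if I previously turned down some option X,
# I will not choose any option that I strictly disprefer to X".

STRICT = {("A", "A-")}  # (better, worse) pairs; every other pair is a gap

def strictly_prefers(x, y):
    return (x, y) in STRICT

def permissible(options, turned_down):
    """Options that are neither strictly dispreferred to another available
    option nor strictly dispreferred to something previously turned down."""
    return [
        x for x in options
        if not any(strictly_prefers(y, x) for y in options)
        and not any(strictly_prefers(t, x) for t in turned_down)
    ]

# Money-pump attempt for Completeness: the agent starts with A, is offered a
# swap to B, and is then offered a swap from B to A-.
turned_down = []
print(permissible(["A", "B"], turned_down))   # ['A', 'B']: a gap, either is fine
turned_down.append("A")                        # suppose the agent takes B
print(permissible(["B", "A-"], turned_down))  # ['B']: A- is blocked, the pump fails
```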
[Suppose that you tried to use the proof of the Complete Class Theorem to prove that this agent would pursue a dominated strategy. Here’s why that won’t work:
Without Completeness, we can’t get a real-valued utility function.
Without a real-valued utility function, we can’t represent the agent’s policy with a vector of real numbers.
Without a vector of real numbers representing the agent’s policy, we can’t get an equation representing a hyperplane that separates the set of available policies from the set of policies that strictly dominate the agent’s policy.
Without a hyperplane equation, we can’t get a probability distribution relative to which the agent’s policy maximizes expected utility.]
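As a rough illustration of the chain in the bracketed list, here is a sketch in LaTeX of the standard argument in the complete case; it is a paraphrase under simplifying assumptions (finitely many states, randomization allowed so the attainable set is convex), not a quotation of any particular statement of the theorem.

```latex
% Sketch of the hyperplane step, assuming Completeness (and the other axioms)
% so that a real-valued utility u exists. Simplifying assumptions: finitely
% many states s_1, ..., s_n; randomized policies, so the attainable set is convex.
\documentclass{article}
\usepackage{amsmath, amssymb}
\begin{document}
With Completeness we get a real-valued utility $u$, and each policy $\pi$ is
summarized by the vector
\[
  v(\pi) = \bigl(\,\mathbb{E}[u \mid \pi, s_1],\;\dots,\;\mathbb{E}[u \mid \pi, s_n]\,\bigr) \in \mathbb{R}^n .
\]
If $\pi$ is not dominated, $v(\pi)$ lies on the Pareto frontier of the convex
set $V = \{\, v(\pi') : \pi' \text{ available} \,\}$, so a supporting hyperplane
exists: there is some $p \geq 0$, $p \neq 0$, with
\[
  p \cdot v(\pi) \;\geq\; p \cdot v(\pi') \quad \text{for every available } \pi' .
\]
Normalizing $p$ to sum to $1$ turns it into a probability distribution relative
to which $\pi$ maximizes expected utility. Without Completeness there is no
real-valued $u$, so $v(\pi)$ is undefined and the separation step never starts.
\end{document}
```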
I anticipate that this answer won’t satisfy you and that you’ll ask for more concreteness in the example, but I don’t yet know what you want me to be more concrete about.
I want you to give me an example of something the agent actually does, under a couple of different sense inputs, given what you say are its preferences, and then I want you to gesture at that and say, “Lo, see how it is incoherent yet not dominated!”
Say more about what counts as incoherent yet not dominated? I assume “incoherent” is not being used here as an alias for “non-EU-maximizing” because then this whole discussion is circular.
Suppose I describe your attempt to refute the existence of any coherence theorems: You point to a rock, and say that although it’s not coherent, it also can’t be dominated, because it has no preferences. Is there any sense in which you think you’ve disproved the existence of coherence theorems, which doesn’t consist of pointing to rocks, and various things that are intermediate between agents and rocks in the sense that they lack preferences about various things where you then refuse to say that they’re being dominated?
This is pretty unsatisfying as an expansion of “incoherent yet not dominated” given that it just uses the phrase “not coherent” instead.
I find money-pump arguments to be the most compelling ones since they’re essentially tiny selection theorems for agents in adversarial environments, and we’ve got an example in the post of (the skeleton of) a proof that a lack-of-total-preferences doesn’t immediately lead to you being pumped. Perhaps there’s a more sophisticated argument that Actually No, You Still Get Pumped but I don’t think I’ve seen one in the comments here yet.
If there are things which cannot-be-money-pumped, and yet which are not utility-maximizers, and problems like corrigibility are almost certainly unsolvable for utility-maximizers, perhaps it’s somewhat worth looking at coherent non-pumpable non-EU agents?
Things are dominated when they forego free money and not just when money gets pumped out of them.
How is the toy example agent sketched in the post dominated?
...wait, you were just asking for an example of an agent being “incoherent but not dominated” in those two senses of being money-pumped? And this is an exercise meant to hint that such “incoherent” agents are always dominatable?
I continue to not see the problem, because the obvious examples don’t work. If I have (1 apple,$0) as incomparable to (1 banana,$0) that doesn’t mean I turn down the trade of −1 apple,+1 banana,+$10000 (which I assume is what you’re hinting at re. foregoing free money).
If one then says “ah but if I offer $9999 and you turn that down, then we have identified your secret equivalent utili-” no, this is just a bid/ask spread, and I’m pretty sure plenty of ink has been spilled justifying EUM agents using uncertainty to price inaction like this.
What’s an example of a non-EUM agent turning down free money which doesn’t just reduce to comparing against an EUM with reckless preferences/a low price of uncertainty?
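Here is a throwaway sketch of the bid/ask point above (my own code, with arbitrary numbers): an agent that prices inaction with a spread accepts the $10000 sweetener while refusing $9999, and that refusal reveals a spread rather than a secret equivalent utility.

```python
# Toy sketch (arbitrary numbers, not anyone's actual model): incomparability
# between the apple and the banana behaves like one side of a bid/ask spread.
ASK = 10_000   # smallest cash sweetener at which this agent gives up the apple

def accepts_swap(cash):
    """Accept the trade '-1 apple, +1 banana, +cash dollars'?"""
    return cash >= ASK

print(accepts_swap(10_000))  # True: the free-money trade is not turned down
print(accepts_swap(9_999))   # False: a refusal that reveals a spread,
                             # not a secret point-valued equivalent utility
```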
Want to bump this because it seems important? How do you see the agent in the post as being dominated?
This seems totally different to the point OP is making, which is that you can in theory have things that definitely are agents, definitely do have preferences, and are incoherent (hence not EV-maximisers), whilst not “predictably shooting themselves in the foot” as you claim must follow from this.
I agree the framing of “there are no coherence theorems” is a bit needlessly strong/overly provocative in a sense, but I’m unclear what your actual objection is here—are you claiming these hypothetical agents are in fact still vulnerable to money-pumping? That they are in fact not possible?
The rock doesn’t seem like a useful example here. The rock is “incoherent and not dominated” if you view it as having no preferences and hence never acting out of indifference, it’s “coherent and not dominated” if you view it as having a constant utility function and hence never acting out of indifference, OK, I guess the rock is just a fancy Rorschach test.
IIUC a prototypical Slightly Complicated utility-maximizing agent is one with, say, u(apples, bananas) = min(apples, bananas), and a prototypical Slightly Complicated not-obviously-pumpable non-utility-maximizing agent is one with, say, the partial order (a1, b1) ≼ (a2, b2) iff a1 ≤ a2 ∧ b1 ≤ b2, plus the path-dependent rule that EJT talks about in the post (Ah yes, non-pumpable non-EU agents might have higher complexity! Is that relevant to the point you’re making?).
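If it helps to see the two prototypes side by side, here is a toy sketch (illustrative code with made-up bundles): the EU agent with u(apples, bananas) = min(apples, bananas), and the partial-order agent for which one bundle strictly beats another only when it is at least as good in both goods.

```python
# Toy sketch of the two prototypes named above (illustrative only).

def eu_strictly_prefers(x, y):
    """Slightly Complicated EU agent with u(apples, bananas) = min(apples, bananas)."""
    return min(x) > min(y)

def partial_strictly_prefers(x, y):
    """Partial-order agent: x is strictly preferred to y iff it has at least as
    much of both goods and is not identical; otherwise the bundles may be
    gap-incomparable. (The path-dependent rule from the post would then be
    layered on top, as in the earlier sketch.)"""
    return x[0] >= y[0] and x[1] >= y[1] and x != y

lopsided = (3, 1)   # 3 apples, 1 banana
balanced = (2, 2)   # 2 apples, 2 bananas

print(eu_strictly_prefers(balanced, lopsided))       # True: min 2 beats min 1
print(partial_strictly_prefers(balanced, lopsided))  # False
print(partial_strictly_prefers(lopsided, balanced))  # False -> a preferential gap
```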
What’s the competitive advantage of the EU agent? If I put them both in a sandbox universe and crank up their intelligence, how does the EU agent eat the non-EU agent? How confident are you that that is what must occur?
Hey, I’m really sorry if I sound stupid, because I’m very new to all this, but I have a few questions (also, I don’t know which one of all of you is right, I genuinely have no idea).
Aren’t rocks inherently coherent, or rather, their parts are inherently coherent, for they align with the laws of the universe, whereas the “rock” is just some composite abstract form we came up with, as observers?
Can’t we think of the universe in itself as an “agent” not in the sense of it being “god”, but in the sense of it having preferences and acting on them?
Examples would be hot things liking to be apart and dispersion leading to coldness, or put more abstractly—one of the “preferences” of the universe is entropy. I’m sorry if I’m missing something super obvious, I failed out of university, haha!
If we let the “universe” be an agent in itself, so that it’s essentially a composite of all the simples there are (even the ones we’re not aware of), then all smaller composites will by definition adhere to the “preferences” of the “universe”. From our current understanding of science, it seems like the “preferences” (laws) of the “universe” do not change when you cut the universe in half, unless you reach quantum scales, and even then my unfounded suspicion is that our previous models are simply laughably wrong, rather than the universe losing homogeneity at some arbitrary scale.
Of course, the “law” of the “universe” is very simple and uncomplex: it is akin to the most powerful “intelligence” or “agent” there is, but with the most “primitive” and “basic” “preferences”. Also, apologies for using so many words in quotation marks; I do so because I am unsure whether I understand their intended meaning.
It seems to me that you could say we’re all ultimately “dominated” by the “universe” itself, though in a way that’s not really escapable; conversely, the “universe” is also “dominated” by more complex “agents”, since individuals can make sandwiches, while it’d take the “universe” much more time to create such complex and abstract composites from its pure “preferences”.
In a way, to me at least, it seems that the “hyper-intelligent”, “powerful” “agent” needs the “complex”, “non-homogeneous”, “stupid” “agent”, because without that relationship, if there ever randomly came to exist a “non-homogeneous” “agent” with enough “intelligence” to “dominate” the “universe”, then we’d essentially experience… uh, give me a second, because this is a very complicated concept I read about long ago...
We’d experience a drop in the current energy levels all around the “universe”, because if the “universe” wasn’t the most “powerful” “agent” so far, then we’ve been existing in a “false vacuum”: essentially, the “universe” would be “dominated” by a “better” “agent” that adheres more closely to the “true” “preferences” of the “universe”.
And the “preference” of the “true” “universe” seems to be to reach that “true vacuum” state, as it’s more in line with entropy, but it needs smaller and dumber agents that are essentially unknowingly “preferring” to “destroy” the universe as they know it. It doesn’t seem possible to reach that state with only micro-perturbations, or it’d take such a long time that it’s more entropically sound to create bigger agents which, while really stupid, have far more “power” than the simple “universe”: even though the simple agents do not grasp the nature of “fire”, “cold”, “entropy” or even “time”, they can easily make “sandwiches”, “chairs”, “rockets”, “civilizations” and “technology”.
I’d really appreciate it if someone tried to explain my confusions on the subject in private messages, as the thread here is getting very hard to read (at least for me, I’m very stupid!).
I really appreciate it if you read through my entire nonsensical garble, I hope someone’s charitable enough to enlighten me which assumptions I made are completely nonsensical.
I am not trying to be funny, snarky, ironic, sarcastic, I genuinely do not understand, I just found this website—sorry if I come off that way.
Have a great day!
The question is how to identify particular bubbles of seekingness in the universe. How can you tell which part of the universe will respond to changes in other parts’ shape by reshaping them, and how? How do you know when a cell wants something, in the sense that if the process of getting the thing is interfered with, it will generate physical motions that end up compensating for the interference? How do you know if it wants the thing, if it responds differently to different sizes of interference? Can we identify conflict between two bubbles of seekingness? etc.
The key question is how to identify when a physical system has a preference for one thing over another. The hope is that, if we find a sufficiently coherent causal mechanism description that specifies what physical systems qualify as
For what it’s worth, I think you’re on a really good track here, and I’m very excited about views like the one you’re starting with. I’d invite browsing my account and links, as this is something I talk about often, from various perspectives, though mostly I defer to others for getting the math right.
Speaking of getting the math right: read Discovering Agents (or browse related papers), it’s a really great paper. It’s not an easy first paper to read, but I’m a big believer in out-of-order learning and jumping way ahead of your current level to get a sense of what’s out there. Also check out the related paper Interpreting systems as solving POMDPs (or browse related papers).
If you’re new to scholarship in general, I’d also suggest checking out some stuff on how to do scholarship efficiently. A friend and I trimmed an old paper I like on how to read papers efficiently, and posted it to LW the other day. You can also find more related stuff from the tags on that post. (I reference that article myself occasionally and find myself surprised by how dense it is as a checklist if I’m trying to properly understand a paper.)
I’ll read the papers once I get on the computer—don’t worry, I may not have finished uni, but I always loved reading papers over a cup of tea.
I’m kind of writing about this subject right now, so maybe there you can find something that interests you.
How do I know what parts of the universe will respond to what changes? To me, at least, this seems like a mostly false question: for you to have true knowledge of that, you’d need to become the Universe itself. If you don’t care about true knowledge, just good % chances, then you do it with heuristics.
First you come up with composites that are somewhat self-similar, though nothing in the Universe is exactly alike, except the Universe itself. Then you create a heuristic for predicting those composites and you use it, as long as the composite is similar enough to the original composite that the heuristic was based on. Of course, heuristics work differently in different environments, but often there are only a few environments even relevant for each composite, for if you take a fish out of water, it will die—now you may want a heuristic for an alive fish in the air, but I see it as much more useful to recompile the fish into catch at that point.
This of course applies at any level of composition, from specific specimens of fish, to ones from a specific family, to a single species, then to all fish, then to all living organisms, with as many steps in between these as you want. How do we discriminate between which composite level we ought to work with? Pure intuition and experiment; once you do it with logic, it all becomes useless, because logic will attempt to compress everything, even those things which have more utility being uncompressed.
I’ll get to the rest of your comment on PC, my fingers hurt. Typing on this new big phone is so hard lol.
Plus some other assumptions (capable of backwards induction, knowing trades in advance), right?
I’m curious whether these assumptions are actually stronger than, or related to, completeness.
Both sets (representable and not) are non-empty. The question remains about which set the interesting agents are in. I think that CCT + VNM, money pump arguments, etc. strongly hint, but do not prove, that the EU maximizers are the interesting ones.
Also, I personally don’t find the question itself particularly interesting, because it seems like one can move between these sets in a relatively shallow way (I’d be interested in seeing counterexamples, though). Perhaps that’s what Yudkowsky means by not caring about representability?
Yep, that’s right!
Since the Completeness assumption is about preferences while the backward-induction and knowing-trades-in-advance assumptions are not, they don’t seem very closely related to me. The assumption that the agent’s strict preferences are transitive is more closely related, but it’s not stronger than Completeness in the sense of implying Completeness.
Can you say a bit more about what you mean by ‘interesting agents’?
From your other comment:
“That is, if you try to construct / find / evolve the most powerful agent that you can, without a very precise understanding of agents / cognition / alignment, you’ll probably get something very close to an EU maximizer.”
I think this could well be right. The main thought I want to argue against is more like:
Even if you initially succeed in creating a powerful agent that doesn’t maximize expected utility, VNM/CCT/money-pump arguments make it likely that this powerful agent will later become an expected utility maximizer.
I meant stronger in a loose sense: you argued that “completeness doesn’t come for free”, but it seems more like actually what you’ve shown is that not-pursuing-dominated-strategies is the thing that doesn’t come for free.
You either need a bunch of assumptions about preferences, or you need one less of those assumptions, plus a few other assumptions about knowing trades, induction, and adherence to a specific policy.
And even given all these other assumptions, the proposed agent with a preferential gap seems like it’s still only epsilon-different from an actual EU maximizer. To me this looks like a strong hint that these assumptions actually do point at a core of something simple which one might call “coherence”, which I expect to show up in (all minus epsilon) advanced agents, even if there are pathological points in advanced-agent-space which don’t have these properties (and even if expected utility theory as a whole isn’t quite correct).
I see. I think this is right.
I agree with this too, but note that the agent with a single preferential gap is just an example. Agents can have arbitrarily many preferential gaps and still avoid pursuing dominated strategies. And agents with many preferential gaps may behave quite differently to expected utility maximizers.
You only need non-transitivity for a money pump. Suppose you prefer A to B and B to C, and are indifferent between A and C (acyclic but non-transitive preferences). You start with C; you pay me 1 dollar to switch to B, then you pay 1 dollar to switch to A, then I pay you 1 dollar to switch back to C (which you do, because A ~ C implies C + $1 > A), and I have 1 free dollar. Note that your proposed policy doesn’t work here, because you do not strictly disprefer C + $1 to anything you turned down.
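Some trivial bookkeeping (toy code) for the pump just described, to make the dollar flow explicit:

```python
# Tracing the pump above: A preferred to B, B preferred to C, A indifferent to C.
wallet, holding = 0, "C"
wallet, holding = wallet - 1, "B"   # pays $1 to trade C for B (B > C)
wallet, holding = wallet - 1, "A"   # pays $1 to trade B for A (A > B)
wallet, holding = wallet + 1, "C"   # accepts C plus $1 for A (A ~ C, so C + $1 > A)
print(holding, wallet)              # C -1 : back at C, one dollar down
```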
Nice point but this money-pump only rules out one kind of transitivity-violation (the agent strictly prefers A to B, strictly prefers B to C, and is indifferent between A and C). It doesn’t rule out this other kind of transitivity-violation: the agent strictly prefers A to B, strictly prefers B to C, and has a preferential gap between A and C.
Wait, I can construct a money pump for that situation. First let the agent choose between A and C. If there’s a preferential gap, the agent should sometimes choose C. Then let the agent pay a penny to upgrade from C to B. Then let the agent pay a penny to upgrade from B to A. The agent is now where it could have been to begin with by choosing A in the first place, but 2 cents poorer.
Even if we ditch the completeness axiom, it sure seems like money pump arguments require us to assume a partial order.
What am I missing?
So this won’t work if the agent knows in advance what trades they’ll be offered and is capable of reasoning by backward induction. In that case, the agent will reason that they’d choose A-2p over B-1p if they reached that node, and would choose B-1p over C if they reached that node. So (they will reason), the choice between A and C is actually a choice between A and A-2p, and so they will reliably choose A.
And plausibly we should make assumptions like ‘the agent knows in advance what trades they will be offered’ and ‘the agent is capable of backward induction’ if we’re arguing about whether agents are rationally required to conform their preferences to the VNM axioms.
(If the agent doesn’t know in advance what trades they will be offered or is incapable of backward induction, then their pursuit of a dominated strategy need not indicate any defect in their preferences. Their pursuit of a dominated strategy can instead be blamed on their lack of knowledge and/or reasoning ability.)
That said, I’ve recently become less convinced that ‘knowing trades in advance’ is a reasonable assumption in the context of predicting the behaviour of advanced artificial agents. And your money-pump seems to work if we assume that the agent doesn’t know what trades they will be offered in advance. So maybe we do in fact have reason to expect that advanced artificial agents will have transitive preferences. (I say ‘maybe’ because there are some other relevant considerations pushing the other way, discussed in a paper-in-progress by Adam Bales.)
I don’t know, this still seems kind of sketchy to me. Say we change the experiment so that it costs the agent a penny to choose A in the initial choice: it will still take that choice, since A-1p is still preferable to A-2p. Compare this to a game where the agent can freely choose between A and C, and there’s no cost in pennies to either choice. Since there’s a preferential gap between A and C, the agent will sometimes pick A and sometimes pick C. In the first game, on the other hand the agent always picks A. Yet in the first game, not only is picking A more costly, but we’ve only added options for the agent if it picks C. In other words, an agent that has A>B, B>C, and A~C sure looks like it’s paying to take options away from itself, since adding options makes it less likely to pick C, even when it costs a penny to avoid it.
Nice! This is a cool case. The behaviour does indeed seem weird. I’m inclined to call it irrational. But the agent isn’t pursuing a dominated strategy: in neither game does the agent settle on an option that they strictly disprefer to some other available option.
This discussion is interesting and I’m happy to keep having it, but perhaps it’s worth saying (if not for your sake then for other readers) that this is a side-thread. The main point of the post is that there are no money-pumps for Completeness. I think that there are probably no money-pumps for Transitivity either, but it’s the claim about Completeness that I really want to defend.
Cool. For me personally, I think that paying to avoid being given more options looks enough like being dominated that I’d want to keep the axiom of transitivity around, even if it’s not technically a money pump.
So in the case where we have transitivity but no completeness, it seems kind of like there might be a weaker coherence theorem, where the agent’s behaviour can be described by rolling a die to pick a utility function before beginning a game, and then subsequently playing according to that utility function. Under this interpretation, if A > B then that means that A is preferred to B under all utility functions the agent could pick, while a preferential gap between A and B means that sometimes A will be ranked higher and sometimes B will be ranked higher, depending on which utility function the die roll happens to land on.
Does this match your intuition? Is there an obvious counterexample to this “coherence conjecture”?
Your coherence conjecture sounds good! It sounds like it roughly matches this theorem:
[Screenshot of the theorem, from the linked paper.]
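For readers who want the conjecture in executable form, here is a toy sketch (illustrative code with made-up utility numbers): preferences are generated by a set of utility functions, strict preference requires unanimity, a preferential gap is disagreement, and behaviour comes from rolling a die over the set once and then maximizing the chosen function.

```python
import random

# Toy sketch of the conjecture above: a set of utility functions generates the
# preferences; strict preference = unanimous agreement, preferential gap =
# disagreement; behaviour = pick one function at random, then maximize it.
UTILITY_SET = [
    {"A": 3, "B": 2, "C": 1},   # one admissible ranking
    {"A": 3, "B": 1, "C": 2},   # another: disagrees about B vs C
]

def strictly_prefers(x, y):
    return all(u[x] > u[y] for u in UTILITY_SET)

def preferential_gap(x, y):
    return not strictly_prefers(x, y) and not strictly_prefers(y, x)

print(strictly_prefers("A", "B"))   # True: every function ranks A above B
print(preferential_gap("B", "C"))   # True: the functions disagree

chosen_u = random.choice(UTILITY_SET)           # the die roll before the game
print(max(["A", "B", "C"], key=chosen_u.get))   # "A" either way
```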
This is cool. I don’t think violations of continuity are in general exploitable either, but I’d guess you should also be able to replace continuity with something weaker from Russell and Isaacs, 2020, just enough to rule out St. Petersburg-like lotteries: specifically, any one of Countable Independence (which can also replace independence), the Extended Outcome Principle (which can also replace independence), or Limitedness, and then replace the real-valued utility functions with utility functions representable by “lexicographically ordered ordinal sequences of bounded real utilities”.
This also looks like a generalization of stochastic dominance.
“paying to avoid being given more options looks enough like being dominated that I’d want to keep the axiom of transitivity around”
Maybe off-topic, but paying to avoid being given more options is a common strategy in negotiation.
It’s not a money pump, because a money pump implies an infinite cycle of profit. If your losses are bounded, you are fine.
Do I understand correctly that preferential gaps have size? Like, I do not prefer A to B, I do not prefer A to B+1, but some large N exists such that I prefer B+N to A?
That can be true (and will often be true when it comes to, e.g., a human agent with a preferential gap between a Fabergé egg and a long-lost wedding album), but it’s not a necessary feature of preferential gaps.
Kind of tangential, but I’d be interested in your take on how strong an argument money-pumping etc. actually is against full-on cyclical preferences. One way to think about why getting money-pumped is bad is that you have an additional preference not to pay money to go nowhere. But it feels like all this tells us is that “something has to go”, and if an agent is rationally permitted to modify its own preferences to avoid these situations then it seems a priori acceptable for it to instead just say something like “well actually I weight my cyclical preferences more highly so I’ll modify the preference against arbitrarily paying”.
In other words, it feels like the money-pumping arguments presume this other preference that in a sense takes “precedence” over the cyclical ones, and I’m not sure how to think about that still.
(I’m not EJT, but for what it’s worth:)
I find the money-pumping arguments compelling not as normative arguments about what preferences are “allowed”, but as engineering/security/survival arguments about what properties of preferences are necessary for them to be stable against an adversarial environment (which is distinct from what properties are sufficient for them to be stable, and possibly distinct from questions of self-modification).
Yeah, I agree that even if they fall short of normative constraints, there’s some empirical content around what happens in adversarial environments. I think I have doubts that this stuff translates to thinking about AGIs too much though, in the sense that there’s an obvious story of how an adversarial environment selected for (partial) coherence in us, but I don’t see the same kinds of selection pressures being a force on AGIs. Unless you assume that they’ll want to modify themselves in anticipation of adversarial environments, which kinda begs the question.
Hmm, I was going to reply with something like “money-pumps don’t just say something about adversarial environments, they also say something about avoiding leaking resources” (e.g. if you have circular preferences between proximity to apples, bananas, and carrots, then if you encounter all three of them in a single room you might get trapped walking between them forever) but that’s also begging your original question—we can always just update to enjoy leaking resources, transmuting a “leak” into an “expenditure”.
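To picture the “trapped walking between them forever” worry, a throwaway sketch (with an arbitrary step cap so it halts):

```python
# Circular preferences over proximity: apples -> bananas -> carrots -> apples.
PREFERS_MOVING_TO = {"apples": "bananas", "bananas": "carrots", "carrots": "apples"}

position, steps = "apples", 0
while steps < 9:                            # cap the walk; uncapped, it never ends
    position = PREFERS_MOVING_TO[position]  # each step looks like a local improvement
    steps += 1
print(position, steps)                      # apples 9 -- right back where it started
```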
Another frame here is that if you make/encounter an agent, and that agent self-modifies into/starts off as something which is happy to leak pretty fundamental resources like time and energy and material-under-control, then you’re not as worried about it? It’s certainly not competing as strongly for the same resources as you whenever it’s “under the influence” of its circular preferences.