Ingredients of Timeless Decision Theory
Followup to: Newcomb’s Problem and Regret of Rationality, Towards a New Decision Theory
Wei Dai asked:
“Why didn’t you mention earlier that your timeless decision theory mainly had to do with logical uncertainty? It would have saved people a lot of time trying to guess what you were talking about.”
...
All right, fine, here’s a fast summary of the most important ingredients that go into my “timeless decision theory”. This isn’t so much an explanation of TDT, as a list of starting ideas that you could use to recreate TDT given sufficient background knowledge. It seems to me that this sort of thing really takes a mini-book, but perhaps I shall be proven wrong.
The one-sentence version is: Choose as though controlling the logical output of the abstract computation you implement, including the output of all other instantiations and simulations of that computation.
The three-sentence version is: Factor your uncertainty over (impossible) possible worlds into a causal graph that includes nodes corresponding to the unknown outputs of known computations; condition on the known initial conditions of your decision computation to screen off factors influencing the decision-setup; compute the counterfactuals in your expected utility formula by surgery on the node representing the logical output of that computation.
To obtain the background knowledge if you don’t already have it, the two main things you’d need to study are the classical debates over Newcomblike problems, and the Judea Pearl synthesis of causality. Canonical sources would be “Paradoxes of Rationality and Cooperation” for Newcomblike problems and “Causality” for causality.
For those of you who don’t condescend to buy physical books, Marion Ledwig’s thesis on Newcomb’s Problem is a good summary of the existing attempts at decision theories, evidential decision theory and causal decision theory. You need to know that causal decision theories two-box on Newcomb’s Problem (which loses) and that evidential decision theories refrain from smoking on the smoking lesion problem (which is even crazier). You need to know that the expected utility formula is actually over a counterfactual on our actions, rather than an ordinary probability update on our actions.
I’m not sure what you’d use for online reading on causality. Mainly you need to know:
That a causal graph factorizes a correlated probability distribution into a deterministic mechanism of chained functions plus a set of uncorrelated unknowns as background factors.
Standard ideas about “screening off” variables (D-separation).
The standard way of computing counterfactuals (through surgery on causal graphs); a toy sketch of this follows below.
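Since these three items carry most of the machinery, here is a minimal toy sketch, entirely my own construction (the rain/sprinkler example is the standard textbook one, not anything from this post), of a factored model with independent background factors, and of counterfactual surgery as replacing one node's mechanism while leaving the rest of the graph intact:

```python
# Toy structural model: independent background factors feed deterministic
# mechanisms; "surgery" replaces the sprinkler's mechanism with a constant.
import random

def sample(do_sprinkler=None):
    u_rain, u_sprinkler = random.random(), random.random()   # uncorrelated unknowns
    rain = u_rain < 0.3
    sprinkler = (u_sprinkler < 0.1) if rain else (u_sprinkler < 0.8)
    if do_sprinkler is not None:          # counterfactual surgery on this node
        sprinkler = do_sprinkler
    wet = rain or sprinkler               # descendants recomputed downstream
    return rain, sprinkler, wet

random.seed(0)
runs = [sample(do_sprinkler=True) for _ in range(10_000)]
print(sum(r for r, s, w in runs) / len(runs))  # ~0.3: surgery does not update the parent
print(all(w for r, s, w in runs))              # True: every surgically-altered world is wet
```

The point of the surgery shows up in the output: intervening on the sprinkler leaves the rain at its prior, whereas conditioning on observing the sprinkler on would have shifted it.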
It will be helpful to have the standard Less Wrong background of defining rationality in terms of processes that systematically discover truths or achieve preferred outcomes, rather than processes that sound reasonable; understanding that you are embedded within physics; understanding that your philosophical intuitions are how some particular cognitive algorithm feels from inside; and so on.
The first lemma is that a factorized probability distribution which includes logical uncertainty—uncertainty about the unknown output of known computations—appears to need cause-like nodes corresponding to this uncertainty.
Suppose I have a calculator on Mars and a calculator on Venus. Both calculators are set to compute 123 * 456. Since you know their exact initial conditions—perhaps even their exact initial physical state—a standard reading of the causal graph would insist that any uncertainties we have about the output of the two calculators, should be uncorrelated. (By standard D-separation; if you have observed all the ancestors of two nodes, but have not observed any common descendants, the two nodes should be independent.) However, if I tell you that the calculator at Mars flashes “56,088” on its LED display screen, you will conclude that the Venus calculator’s display is also flashing “56,088”. (And you will conclude this before any ray of light could communicate between the two events, too.)
If I was giving a long exposition I would go on about how if you have two envelopes originating on Earth and one goes to Mars and one goes to Venus, your conclusion about the one on Venus from observing the one on Mars does not of course indicate a faster-than-light physical event, but standard ideas about D-separation indicate that completely observing the initial state of the calculators ought to screen off any remaining uncertainty we have about their causal descendants so that the descendant nodes are uncorrelated, and the fact that they’re still correlated indicates that there is a common unobserved factor, and this is our logical uncertainty about the result of the abstract computation. I would also talk for a bit about how if there’s a small random factor in the transistors, and we saw three calculators, and two showed 56,088 and one showed 56,086, we would probably treat these as likelihood messages going up from nodes descending from the “Platonic” node standing for the ideal result of the computation—in short, it looks like our uncertainty about the unknown logical results of known computations, really does behave like a standard causal node from which the physical results descend as child nodes.
But this is a short exposition, so you can fill in that sort of thing yourself, if you like.
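Still, for concreteness, here is a toy numerical sketch of the calculator picture (the prior and glitch probabilities are made up, not from the post): treat the ideal result of 123 * 456 as a parent node, and each physical display as a noisy child of it.

```python
# Logical uncertainty as a parent node: a prior over the 'Platonic' result,
# with a crude glitch model for each physical display.
prior = {56088: 0.90, 56086: 0.05, 56090: 0.05}
GLITCH = 0.01   # chance a display shows some particular wrong value

def display_likelihood(shown, true_result):
    return 1 - GLITCH if shown == true_result else GLITCH

# Observe the Mars display flashing 56,088: update the Platonic node...
posterior = {r: p * display_likelihood(56088, r) for r, p in prior.items()}
total = sum(posterior.values())
posterior = {r: p / total for r, p in posterior.items()}

# ...and the update flows down to the other child, the Venus display,
# with no physical signal passing between the two planets.
p_venus_shows_56088 = sum(p * display_likelihood(56088, r) for r, p in posterior.items())
print(posterior)              # sharply peaked on 56088
print(p_venus_shows_56088)    # ~0.99
```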
Having realized that our causal graphs contain nodes corresponding to logical uncertainties / the ideal result of Platonic computations, we next construe the counterfactuals of our expected utility formula to be counterfactuals over the logical result of the abstract computation corresponding to the expected utility calculation, rather than counterfactuals over any particular physical node.
You treat your choice as determining the result of the logical computation, and hence all instantiations of that computation, and all instantiations of other computations dependent on that logical computation.
Formally you’d use a Godelian diagonal to write:
Argmax[A in Actions] in Sum[O in Outcomes](Utility(O)*P(this computation yields A []-> O|rest of universe))
(where P( X=x []-> Y | Z ) means computing the counterfactual on the factored causal graph P, that surgically setting node X to x, leads to Y, given Z)
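Here is a minimal sketch of what that surgery looks like on Newcomb's Problem, written as code. This is my own toy rendering, not Eliezer's formalism: the assumed graph has the Platonic output of your decision computation as a parent of both your physical action and Omega's prediction, and the only difference from the CDT calculation is which node gets severed from its parents before expected utility is computed.

```python
ACTIONS = ['one-box', 'two-box']

def outcome_given_logical_output(output):
    # Surgery on the Platonic node: both of its children -- my physical action
    # and Omega's prediction (its simulation of this computation) -- follow it.
    action = output
    prediction = output
    box_b = 1_000_000 if prediction == 'one-box' else 0
    box_a = 1_000
    return box_b if action == 'one-box' else box_a + box_b

def tdt_choice():
    # argmax over actions of the utility of the surgically altered world
    return max(ACTIONS, key=outcome_given_logical_output)

def cdt_choice(p_box_b_full=0.5):
    # CDT-style surgery on the physical act alone: box contents keep their
    # base rate, independent of what I choose.
    def expected_utility(action):
        expected_b = p_box_b_full * 1_000_000
        return expected_b if action == 'one-box' else 1_000 + expected_b
    return max(ACTIONS, key=expected_utility)

print(tdt_choice())   # 'one-box'  (1,000,000 beats 1,000)
print(cdt_choice())   # 'two-box'  (adding box A always looks +1,000 under this surgery)
```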
Setting this up correctly (in accordance with standard constraints on causal graphs, like noncircularity) will solve (yield reflectively consistent, epistemically intuitive, systematically winning answers to) 95% of the Newcomblike problems in the literature I’ve seen, including Newcomb’s Problem and other problems causing CDT to lose, the Smoking Lesion and other problems causing EDT to fail, Parfit’s Hitchhiker which causes both CDT and EDT to lose, etc.
Note that this does not solve the remaining open problems in TDT (though Nesov and Dai may have solved one such problem with their updateless decision theory). Also, although this theory goes into much more detail about how to compute its counterfactuals than classical CDT, there are still some visible incompletenesses when it comes to generating causal graphs that include the uncertain results of computations, computations dependent on other computations, computations uncertainly correlated to other computations, computations that reason abstractly about other computations without simulating them exactly, and so on. On the other hand, CDT just has the entire counterfactual distribution rain down on the theory as manna from heaven (e.g. James Joyce, Foundations of Causal Decision Theory), so TDT is at least an improvement; and standard classical logic and standard causal graphs offer quite a lot of pre-existing structure here. (In general, understanding the causal structure of reality is an AI-complete problem, and so in philosophical dilemmas the causal structure of the problem is implicitly given in the story description.)
Among the many other things I am skipping over:
Some actual examples of where CDT loses and TDT wins, EDT loses and TDT wins, both lose and TDT wins, what I mean by “setting up the causal graph correctly” and some potential pitfalls to avoid, etc.
A rather huge amount of reasoning which defines reflective consistency on a problem class; explains why reflective consistency is a rather strong desideratum for self-modifying AI; why the need to make “precommitments” is an expensive retreat to second-best and shows lack of reflective consistency; explains why it is desirable to win and get lots of money rather than just be “reasonable” (that is conform to pre-existing intuitions generated by a pre-existing algorithm); which notes that, considering the many pleas from people who want, but can’t find any good intermediate stage between CDT and EDT, it’s a fascinating little fact that if you were rewriting your own source code, you’d rewrite it to one-box on Newcomb’s Problem and smoke on the smoking lesion problem...
...and so, having given many considerations of desirability in a decision theory, shows that the behavior of TDT corresponds to reflective consistency on a problem class in which your payoff is determined by the type of decision you make, but not sensitive to the exact algorithm you use apart from that—that TDT is the compact way of computing this desirable behavior we have previously defined in terms of reflectively consistent systematic winning.
Showing that classical CDT, given self-modification ability, modifies into a crippled and inelegant form of TDT.
Using TDT to fix the non-naturalistic behavior of Pearl’s version of classical causality in which we’re supposed to pretend that our actions are divorced from the rest of the universe—the counterfactual surgery, written out Pearl’s way, will actually give poor predictions for some problems (like someone who two-boxes on Newcomb’s Problem and believes that box B has a base-rate probability of containing a million dollars, because the counterfactual surgery says that box B’s contents have to be independent of the action). TDT not only gives the correct prediction, but explains why the counterfactual surgery can have the form it does—if you condition on the initial state of the computation, this should screen off all the information you could get about outside things that affect your decision; then your actual output can be further determined only by the Godel-diagonal formula written out above, permitting the formula to contain a counterfactual surgery that assumes its own output, so that the formula does not need to infinitely recurse on calling itself.
An account of some brief ad-hoc experiments I performed on IRC to show that a majority of respondents exhibited a decision pattern best explained by TDT rather than EDT or CDT.
A rather huge amount of exposition of what TDT decision theory actually corresponds to in terms of philosophical intuitions, especially those about “free will”. For example, this is the theory I was using as hidden background when I wrote in “Causality and Moral Responsibility” that factors like education and upbringing can be thought of as determining which person makes a decision—that you rather than someone else makes a decision—but that the decision made by that particular person is up to you. This corresponds to conditioning on the known initial state of the computation, and performing the counterfactual surgery over its output. I’ve actually done a lot of this exposition on OB/LW without explicitly mentioning TDT, like Timeless Control and Thou Art Physics for reconciling determinism with choice (actually effective choice requires determinism, but this confuses humans for reasons given in Possibility and Could-ness). But if you read the other parts of the solution to “free will”, and then furthermore explicitly formulate TDT, then this is what utterly, finally, completely, and without even a tiny trace of confusion or dissatisfaction or a sense of lingering questions, kills off entirely the question of “free will”.
Some concluding chiding of those philosophers who blithely decided that the “rational” course of action systematically loses; that rationalists defect on the Prisoner’s Dilemma and hence we need a separate concept of “social rationality”; that the “reasonable” thing to do is determined by consulting pre-existing intuitions of reasonableness, rather than first looking at which agents walk away with huge heaps of money and then working out how to do it systematically; people who take their intuitions about free will at face value; assuming that counterfactuals are fixed givens raining down from the sky rather than non-observable constructs which we can construe in whatever way generates a winning decision theory; et cetera. And celebrating of the fact that rationalists can cooperate with each other, vote in elections, and do many other nice things that philosophers have claimed they can’t. And suggesting that perhaps next time one should extend “rationality” a bit more credit before sighing and nodding wisely about its limitations.
In conclusion, rational agents are not incapable of cooperation, rational agents are not constantly fighting their own source code, rational agents do not go around helplessly wishing they were less rational, and finally, rational agents win.
Those of you who’ve read the quantum mechanics sequence can extrapolate from past experience that I’m not bluffing. But it’s not clear to me that writing this book would be my best possible expenditure of the required time.
Today I finally came up with a simple example where TDT clearly loses and CDT clearly wins, and as a bonus, proves that TDT isn’t reflectively consistent.
Omega comes to you and says
Say the payoffs of the PD are
        C      D
C      5/5    0/6
D      6/0    1/1

(row player's payoff / column player's payoff)
Suppose you submit an AI running CDT. Then, Omega’s AIs will reason as follows: “I have 1/2 chance of playing against a TDT, and 1/2 chance of playing against a CDT. If I play C, then my opponent will play C if it’s a TDT, and D if it’s a CDT, therefore my expected payoff is 5/2+0/2=2.5. If I play D, then my opponent will play D, so my payoff is 1. Therefore I should play C.” Your AI then gets a payoff of 6, since it will play D.
Suppose you submit an AI running TDT instead. Then everyone will play C, so your AI will get a payoff of 5.
So you submit a CDT, whether you are running CDT or TDT. That’s because explicitly giving the source code of your submitted AI to the other AIs makes the consequences of your decision the same under CDT and under TDT.
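A minimal sketch of the arithmetic in the two paragraphs above (the original statement of Omega's game is elided, so this only encodes what those paragraphs assume: each of Omega's TDT AIs thinks it faces a TDT or the submitted AI with probability 1/2, and treats a TDT opponent's move as mirroring its own):

```python
PAYOFF = {('C', 'C'): 5, ('C', 'D'): 0, ('D', 'C'): 6, ('D', 'D'): 1}  # row player's payoff

def omega_ai_expected_payoff(my_move):
    # A TDT opponent's move mirrors mine (logical correlation);
    # the submitted CDT opponent defects regardless.
    return 0.5 * PAYOFF[(my_move, my_move)] + 0.5 * PAYOFF[(my_move, 'D')]

print(omega_ai_expected_payoff('C'))   # 2.5, so Omega's TDT AIs play C
print(omega_ai_expected_payoff('D'))   # 1.0

print(PAYOFF[('D', 'C')])   # 6: what a submitted CDT gets against a cooperating TDT
print(PAYOFF[('C', 'C')])   # 5: what a submitted TDT gets when everyone cooperates
```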
Suppose you have to play this game yourself instead of delegating it, you can self-modify, and the payoffs are large enough. Then you’d modify yourself from running TDT to running some other DT that plays D in this game! (Notice that I specified that Omega’s AIs can’t self-modify, so your decision to self-modify won’t have the logical consequence that they also self-modify.)
It seems that I’ve given a counter-example to the claim that
Or does my example fall outside of the specified problem class?
If I wanted to defend the original thesis, I would say yes, because TDT doesn’t cooperate or defect depending directly on your decision, but cooperates or defects depending on how your decision depends on its decision (which was one of the open problems I listed—the original TDT is for cases where Omega offers you straightforward dilemmas in which its behavior is just a direct transform of your behavior). So where one algorithm has one payoff matrix for defection or cooperation, the other algorithm gets a different payoff matrix for defection or cooperation, which breaks the “problem class” under which the original TDT is automatically reflectively consistent.
Nonetheless it’s certainly an interesting dilemma.
Your comment here is actually pre-empting a comment that I’d planned to make after providing some of the background for the content of TDT. I’d thought about your dilemmas, and then did manage to translate into my terms a notion about how it might be possible to unilaterally defect in the Prisoner’s Dilemma and predictably get away with it, provided you did so for unusual reasons. But the conditions on “unusual reasons” are much more difficult than your posts seem to imply. We can’t all act on unusual reasons and end up doing the same thing, after all. How is it that these two TDT AIs got here, if not by act of Omega, if the sensible thing to do is always to submit a CDT AI?
To introduce yet another complication: What if the TDTs that you’re playing against, decide to defect unconditionally if you submit a CDT player, in order to give you an incentive to submit a TDT player? Given that your reason for submitting a CDT player involves your expectation about how the TDT players will respond, and that you can “get away with it”? It’s the TDT’s responses that make them “exploitable” by your decision to submit a CDT player—so what if they employ a different strategy instead? (This is another open problem—“who acts first” in timeless negotiations.)
There might be a certain sense in which being in a “small subgroup internally correlated but not correlated with larger groups” could possibly act as a sort of resource for getting away with defection in the true PD, because if you’re in a large group then defecting shifts the probability of an opponent likewise defecting by a lot, but if you’re in a small subgroup then it shifts the probability of the opponent defecting by a little, so there’s a lower penalty for defection, so in marginal cases a small subgroup might play defection while a large subgroup plays cooperate. (But again, the conditions on this are difficult. If all small subgroups reason this way, then all small subgroups form a large correlated group!)
Anyway—you can’t end up in a small subgroup if you start out in a large one, because if you decide to deliberately condition on noise in order to decrease the size of your subgroup, that itself is a correlated sort of decision with a clear line of reasoning and motive, and others in your correlated group will try doing the same thing, with predictable results. So to the extent that lots of AI designers in distant parts of Reality are discussing this same issue with the same logic, we are already in a group of a certain minimum size.
But this does lead to an argument for CEV (values extrapolating / Friendly AI) algorithms that don’t automatically, inherently correlate us with larger groups than we already started out being in. If uncorrelation is a nonrenewable resource then FAI programmers should at least be careful not to wantonly burn it. You can’t deliberately add noise, but you might be able to preserve existing uncorrelation.
Also, other TDTs can potentially set their “minimum cooperator frequency threshold” at just the right level that if any group of noticeable size chooses to defect, all the TDTs start defecting—though this itself is a possibility I am highly unsure of, and once again it has to do with “who goes first” in timeless strategies, which is an open problem.
But these are issues in which my understanding is still shaky, and it very rapidly gets us into very dangerous territory like trying to throw the steering wheel out the window while playing chicken.
So far as evolved biological organisms go, I suspect that the ones who create successful Friendly AIs (instead of losing control and dying at the hands of paperclip maximizers), would hardly start out seeing only the view from CDT—most of them/us would be making the decision “Should I build TDT, knowing that the decisions of other biological civilizations are correlated to this one?” and not “Should I build TDT, having never thought of that?” In other words, we may already be part of a large correlated subgroup—though I sometimes suspect that most of the AIs out there are paperclip maximizers born of experimental accidents, and in that case, if there is no way of verifying source code, nor of telling the difference between SIs containing bio-values-preserving civs and SIs containing paperclip maximizers, then we might be able to exploit the relative smallness of the “successful biological designer” group...
...but a lot of this presently has the quality of “No fucking way would I try that in real life”, at least based on my current understanding. The closest I would get might be trying for a CEV algorithm that did not inherently add correlation to decision systems with which we were not already correlated.
You’re right, I failed to realize that with timeless agents, we can’t do backwards induction using the physical order of decisions. We need some notion of the logical order of decisions.
Here’s an idea. The logical order of decisions is related to simulation ability. Suppose A can simulate B, meaning it has trustworthy information about B’s source code and has sufficient computing power to fully simulate B or sufficient intelligence to analyze B using reliable shortcuts, but B can’t simulate A. Then the logical order of decisions is B followed by A, because when B makes his decision, he can treat A’s decision as conditional on his. But when A makes her decision, she has to take B’s decision as a given.
Does that make sense?
Moving second is a disadvantage (at least it seems to always work out that way, counterexamples requested if you can find them) and A can always use less computing power. Rational agents should not regret having more computing power (because they can always use less) or more knowledge (because they can always implement the same strategy they would use with less knowledge) - this sort of thing is a sure sign of reflective inconsistency.
To see why moving logically second is a disadvantage, consider that it lets an opponent playing Chicken always toss their steering wheel out the window and get away with it.
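A toy rendering of that point, with made-up Chicken payoffs: if one player's move is logically fixed first and the other player best-responds to whatever that move is, the logically-first player dares and collects the larger payoff.

```python
# (B, A) payoffs for Chicken; numbers are mine, chosen for illustration.
CHICKEN = {('dare', 'dare'): (0, 0), ('dare', 'swerve'): (4, 2),
           ('swerve', 'dare'): (2, 4), ('swerve', 'swerve'): (3, 3)}

def a_best_response(b_move):
    # A simulates B, takes B's move as given, and maximizes its own payoff.
    return max(['dare', 'swerve'], key=lambda a: CHICKEN[(b_move, a)][1])

def b_first_mover_choice():
    # B, moving logically first, picks the move whose induced response favors B.
    return max(['dare', 'swerve'], key=lambda b: CHICKEN[(b, a_best_response(b))][0])

b = b_first_mover_choice()
print(b, a_best_response(b))   # 'dare', 'swerve': the wheel-tosser gets away with it
```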
That both players desire to move “logically first” argues strongly that neither one will; that the resolution here does not involve any particular fixed global logical order of decisions.
(I should comment in the future about the possibility that bio-values-derived civs, by virtue of having evolved to be crazy, can succeed in moving logically first using crazy reasoning, but that would be a whole ’nother story, and of course also falls into the “Way the fuck too dangerous to try in real life” category relative to my present knowledge.)
BTW, thanks for this compact way of putting it.
Being logically second only keeps being a disadvantage because examples keep being chosen to be of the kind that make it so.
One category of counterexample comes from warfare, where if you know what the enemy will do and he doesn’t know what you will do, you have the upper hand. (The logical versus temporal distinction is clear here: being temporally the first to reach an objective can be a big advantage.)
Another counterexample is in negotiation where a buyer and seller are both uncertain about fair market price; each may prefer the other to be first to suggest a price. (In practice this is often resolved by the party with more knowledge, or more at stake, or both—usually the seller—being first to suggest a price.)
You’re right. Rock-paper-scissors is another counter-example. In these cases, the relationship between the logical order of moves and simulation ability seems pretty obvious and intuitive.
Except that the analogy to rock-paper-scissors would be that I get to move logically first by deciding my conditional strategy “rock if you play scissors” etc., and simulating you simulating me without running into an apparently non-halting computation (that would otherwise have to be stopped by my performing counterfactual surgery on the part of you that simulates my own decision), then playing rock if I simulate you playing scissors.
At least I think that’s how the analogy would work.
I suspect that this kind of problem will run into computational complexity issues, not clever decision theory issues. Like with a certain variation on the St. Petersburg paradox (see the last two paragraphs), where you need to count to the greatest finite number to which you can count, and then stop.
Suppose I know that’s your strategy, and decide to play the move equal to (the first googolplex digits of pi mod 3), and I can actually compute that but you can’t. What are you going to do?
If you can predict what I do, then your conditional strategy works, which just shows that move order is related to simulation ability.
In this zero-sum game, yes, it’s possible that whoever has the most computing power wins, if neither can access unpredictable random or private variables. But what if both sides have exactly equal computing power? We could define a Timeless Paper-Scissors-Rock Tournament this way—standard language, no random function, each program gets access to the other’s source code and exactly 100 million ticks, if you halt without outputting a move then you lose 2 points.
This game is pretty easy to solve, I think. A simple equilibrium is for each side to do something like iterate x = SHA-512(x), with a random starting value, using an optimal implementation of SHA-512, until time is just about to run out, then output x mod 3. SHA-512 is easy to optimize (in the sense of writing the absolutely fastest implementation), and it seems very unlikely that there could be shortcuts to computing (SHA-512)^n until n gets so big (around 2^256 unless SHA-512 is badly designed) that the function starts to cycle.
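A sketch of the strategy just described (the time budget, safety margin, and move encoding here are my own placeholders; the tournament above is specified in ticks rather than wall-clock seconds):

```python
# Iterate SHA-512 from a random seed until the clock is nearly out,
# then output the running digest mod 3 as the move.
import hashlib, os, time

def timeless_rps_move(time_budget_seconds=1.0, safety_margin=0.05):
    x = os.urandom(64)                        # random starting value
    stop = time.monotonic() + time_budget_seconds - safety_margin
    while time.monotonic() < stop:
        x = hashlib.sha512(x).digest()        # x = SHA-512(x)
    return int.from_bytes(x, 'big') % 3       # 0 = rock, 1 = paper, 2 = scissors

print(timeless_rps_move())
```

The design rationale, per the comment above, is that an opponent with exactly the same computing power cannot finish simulating this loop before its own clock runs out, and no shortcut to the n-th iterate is expected to exist unless SHA-512 is badly designed.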
I think I’ve answered your specific question, but the answer doesn’t seem that interesting, and I’m not sure why you asked it.
Schneier et al here prove that being able to calculate H^n(x) quickly leads to a faster way of finding collisions in H. http://www.schneier.com/paper-low-entropy.html
Well, it’s probably not all that interesting from a purely theoretical perspective, but if the prize money was divided up among only the top fifth of players, you’d actually have to try to win, and that would be an interesting challenge for computer programmers.
But if you are TDT, you can’t always use less computing power, because that might be correlated with your opponents also deciding to use less computing power, or will be distrusted by your opponent because it can’t simulate you.
But if you simply don’t have that much computing power (and opponent knows this) then you seem to have the advantage of logically moving first.
Lack of computing power could be considered a form of “crazy reasoning”...
Why does TDT lead to the phenomenon of “stupid winners”? If there’s a way to explain this as a reasonable outcome, I’d feel a lot better. But is that like a two-boxer asking for an explanation of why, when the stupid (from their perspective) one-boxers keep winning, that’s a reasonable outcome?
Substitute “move logically first” for “use less computing power”? Using less computing power seems like a red herring to me. TDT on simple problems (with the causal / logical structure already given) uses skeletally small amounts of computing power. “Who moves first” is a “battle”(?) over the causal / logical structure, not over who can manage to run out of computing power first. If you’re visualizing this using lots of computing power for the core logic, rather than computing the 20th decimal place of some threshold or verifying large proofs, then we’ve got different visualizations.
The idea of “if you do this, the opponent does the same” might apply to trying to move logically first, but in my world this has nothing to do with computing power, so at this point I think it’d be pretty odd if the agents were competing to be stupider.
Besides, you don’t want to respond to most logical threats, because that gives your opponent an incentive to make logical threats; you only want to respond to logical offers that you want your opponent to have an incentive to make. This gets into the scary issues I was hinting at before, like determining in advance that if you see your opponent predetermine to destroy the universe in a mutual suicide unless you pay a ransom, you’ll call their bet and die with them, even if they’ve predetermined to ignore your decision, etcetera; but if they offer to trade you silver for gold at a Ricardian-advantageous rate, you’ll predetermine to cooperate, etc. The point, though, is that “If I do X, they’ll do Y” is not a blank check to decide that minds do X, because you could choose a different form of responsiveness.
But anyway, I don’t see in the first place that agents should be having these sorts of contests over how little computing power to use. That doesn’t seem to me like a compelling advantage to reach for.
If you’ve got that little computing power then perhaps you can’t simulate your opponent’s skeletally small TDT decision, i.e., you can’t use TDT at all. If you can’t close the loop of “I simulate you simulating me”—which isn’t infinite, and actually terminates rather quickly in the simple cases I know how to analyze at all, because we perform counterfactual surgery inside the loop—then you can’t use TDT at all.
No, I mean much crazier than that. Like “This doesn’t follow, but I’m going to believe it anyway!” That’s what it takes to get “unusual reasons”—the sort of madness that only strictly naturally selected biological minds would find compelling in advance of a timeless decision to be crazy. Like “I’M GOING TO THROW THE STEERING WINDOW OUT THE WHEEL AND I DON’T CARE WHAT THE OPPONENT PREDETERMINES” crazy.
It has not been established to my satisfaction that it does. It is a central philosophical intuition driving my decision theory that increased computing power, knowledge, or self-control, should not harm a rational agent.
...possibly employing mixed strategies, by analogy to the equilibrium of games where neither agent gets to go first and both must choose simultaneously? But I haven’t done anything with this idea, yet.
First of all, congratulations, Eliezer! That’s great work. When I read your 3-line description, I thought it would never be computable. I’m glad to see you can actually test it.
Eliezer_Yudkowsky wrote on 19 August 2009 03:05:15PM
Rock-paper-scissors?
Negotiating to buy a car?
I would like to begin by saying that I don’t believe my own statements are True, and I suggest you don’t either. I do request that you try thinking WITH them before attacking them. It’s really hard to think with an idea AFTER you’ve attacked it. I’ve been told my writing sounds preachy or even fanatical. I don’t say “In My Opinion” enough. Please imagine “IMO” in front of every one of my statements. Thanks!
Having more information (not incorrect “information”) on the opponent’s decisions is beneficial.
Let’s distinguish Secret Commit & Simultaneous Effect (SCSE) from Commit First & Simultaneous Effect (CFSE) and from Act & Effect First (AEF). That’s just a few categories from a coarse categorization of board war games.
The classic gunfight at high noon is AEF (to a first approximation, not counting watching his face & guessing when his reaction time will be lengthened). The fighter who draws first has a serious advantage, and the fighter who hits first has a tremendous advantage, but not certain victory. (Hollywood notwithstanding, people sometimes keep fighting after taking handgun hits, even a dozen of them.) I contend that all AEFs give advantage to the first actor. Chess is AEF.
My understanding of the Prisoner’s Dilemma is that it is SCSE as presented. On this thread, it seems to have mutated into a CFSE (otherwise, there just isn’t any “first”, in the ordinary, inside-the-Box-Universe, timeful sense). If Prisoner A has managed to get information on Prisoner B’s commitment before he commits, this has to be useful. Even if PA is a near-Omega, it can be a reality check on his Visualization of the Cosmic All. In realistic July 2009 circumstances, it identifies PB as one of the 40% of humans who choose ‘cooperate’ in one-shot PD. PA now has a choice whether to be an economist or a friend.
And now we get down to something fundamental. Some humans are better people than the economic definition of rationality, which “… assume[s] that each player cares only about minimizing his or her own time in jail”. “… cooperating is strictly dominated by defecting …” even with leaked information.
“I don’t care what happens to my partner in crime. I don’t and I won’t. You can’t make me care. On the advice of my economist… ” That gets both prisoners a 5-year sentence when they could have had 6 months.
That is NOT wisdom! That will make us extinct. (In My Opinion)
Now try on “an injury to one is an injury to all”. Or maybe “an injury to one is an (discounted) injury to ME”. We just might be able to see that the big nuclear arsenals are a BAD IDEA!
Taking that on, the payoff matrix offered by Wei Dai’s Omega (19 August 2009 07:08:23AM)
is now transformed into PA’s Internal Payoff Matrix (IPM)
In other words, his utility function has a term for the freedom of Prisoner B. (Economists be damned! Some of us do, sometimes.)
“I’ll set κ = 0.3,” says PA (well, he is a thief). Now PA’s IPM is:
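(The tables in this comment appear to have been lost in formatting; the edit note at the end of the comment describes the struggle. Here is a reconstruction on the reading the surrounding text suggests: PA's internal payoff for an outcome is his own external payoff plus κ times PB's. With the external matrix quoted earlier and κ = 0.3, PA's IPM, showing PA's internal values only, rows PA's move and columns PB's, comes out to:

        C        D
C      6.5      1.8
D      6.0      1.3

where 6.5 = 5 + 0.3×5, 1.8 = 0 + 0.3×6, 6.0 = 6 + 0.3×0, and 1.3 = 1 + 0.3×1.)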
Lo and behold! ‘cooperate’ now strictly dominates!
When over 6 billion people are affected, it doesn’t take much of a κ to swing my decisions around. If I’m not working to save humanity, I must have a very low κ for each distant person unknown to me.
People say, “Human life is precious!” Show it to me in results. Show it to me in how people budget their time and money. THAT is why Friendly AI is our only hope. We will ‘defect’ our way into thwarting any plan that requires a lot of people to change their beliefs or actions. That sub-microscopic κ for unknown strangers is evolved-in, it’s not going away. We need a program that can be carried out by a tiny number of people.
.
.
.
IMO.
Maybe I missed the point. Maybe the whole point of TDT is to derive some sort of reduced-selfishness decision norm without an ad-hoc utility function adjustment (is that what “rained down from heaven” means?). I can derive the κ needed in order to save humanity, if there were a way to propagate it through the population. I cannot derive The One True κ from absolute principles, nor have I shown a derivation of “we should save humanity”. I certainly fell short of ” … looking at which agents walk away with huge heaps of money and then working out how to do it systematically … ”. I would RATHER look at which agents get their species through their singularity alive. Then, and only then, can we look at something grander than survival. I don’t grok in fullness “reflective consistency”, but from extinction we won’t be doing a lot of reflecting on what went wrong.
IMO.
Now, back to one-shot PD and “going first”. For some values of κ and some external payoff matrices (not this one), the resulting IPM is not strictly dominated, and having knowledge of PB’s commitment actually determines whether ‘cooperate’ or ‘defect’ produces a better world in PA’s internal not-quite-so-selfish world-view. Is that a disadvantage? (That’s a serious, non-rhetorical question. I’m a neophyte and I may not see some things in the depths where Eliezer & Wei think.)
Now let’s look at that game of chicken. Was “throw out the steering wheel” in the definition of the thought experiment? If not, that player just changed the universe-under-consideration, which is a fairly impressive effect in an AEF, not a CFSE.
If re-engineering was included, then Driver A may complete his wheel-throwing (while in motion!) only to look up and see Driver B’s steering gear on a ballistic trajectory. Each will have a few moments to reflect on “always get away with it.”
If Driver A successfully defenestrates first, is Driver B at a disadvantage? Among humans, the game may be determined more by autonomic systems than by conscious computation, and B now knows that A won’t be flinching away. However, B now has information and choices. One that occurs to me is to stop the car and get out. “Your move, A.” A truly intelligent player (in which category I do not, alas, qualify) would think up better, or funnier, choices.
Hmmm… to even play Chicken you have to either be irrational or have a damned strange IPM. We should establish that before proceeding further.
I challenge anyone to show me a CFSE game that gives a disadvantage to the second player.
I’m not too proud to beg: I request your votes. I’ve got an article I’d like to post, and I need the karma.
Thanks for your time and attention.
RickJS
Saving Humanity from Homo Sapiens
08/28/2009 ~20:10 Edit: formatting … learning formatting … grumble … GDSOB tab-deleter … Fine. I’ll create the HTML for tables, but this is a LOT of work for 3 simple tables … COMMENT TOO LONG!?!? … one last try … now I can’t quit, I’m hooked! … NAILED that sucker! … ~22:40 : added one more example *YAWN*
It’s incomprehensible. Try debugging individual ideas first, written up more carefully.
This reminds me of logical Fatalism and the Argument from Bivalence
That’s a good point, but what if the process that gives birth to CDT doesn’t listen to the incentives you give it? For example, it could be evolution or random chance.
Here’s an example, similar to Wei’s example above. Imagine two parallel universes, both containing large populations of TDT agents. In both universes, a child is born, looking exactly like everyone else. The child in universe A is a TDT agent named Alice. The child in universe B is named Bob and has a random mutation that makes him use CDT. Both children go on to play many blind PDs with their neighbors. It looks like Bob’s life will be much happier than Alice’s, right?
What force will push against evolution and keep the number of Bobs small?
The problem is that “source code of your AI” is not a complete story, since your decisions as AI programmer also depended on the Omega AIs’ code, and so what you give as the source of your AI already picks out only one of the possible worlds, one that presupposes the behavior of the Omega AIs.
Yes, I think Eliezer made a similar point:
So if you run TDT, then there are at least two equilibria in this game, only one of which involves you submitting a CDT. Can you think of a way to select between these two equilibria?
If not, I can fix this by changing the game a bit. Omega will now create his TDT AIs after you design yours, and hard code the source code of your AI into it as givens. His AIs won’t even know about you, the real player.
They might simply infer you, the real player. You might as well tell the TDT AIs that they’re up against a hardcoded Defect move as the “other player”, but they won’t know if that player has been selected. In fact, that pretty much is what you’re telling them, if you show them a CDT player. The CDT player is a red herring—the decision to defect was made by you, in the moment of submitting a CDT player. There is no law against TDT players realizing this after Omega codes them.
I should note that in matters such as these, the phrase “hard code” should act as a warning sign that you’re trying to fix something that, at least in your own mind, doesn’t want to be fixed. (E.g. “hard code obedience into AIs, build it into the very circuitry!”) Where you are tempted to say “hard code” you may just need to accept whatever complex burden you were trying to get rid of by saying “fix it in place with codes of iron!”
By hard code, I meant code it into the TDT’s probability distribution. (Even TDT isn’t meta enough to say “My prior is wrong!”) But that does make the example less convincing, so let me try something else.
Have Omega’s AIs physically go first and you play for yourself. They get a copy of your source code, then make their moves in the 3-choose-2 PD game first. You learn their move, then make your choice. Now, if you follow CDT, you’ll reason that your decision has no causal effect on the TDT’s decisions, and therefore choose D. The TDTs, knowing this, will play C.
And I think I can still show that if you run TDT, you will decide to self-modify into CDT before starting this game. First, if Omega’s AIs know that you run TDT at the beginning, then they can use that “play D if you self-modify” strategy to deter you from self-modifying. But you can also use “I’ll self-modify anyway” to deter them from doing that. So who wins this game? (If someone moves first logically, then he wins, but what if everyone moves simultaneously in the logical sense, which seems to be the case in this game?)
Suppose it’s common knowledge that Omega mostly chooses CDT agents to participate in this game; then “play D if you self-modify” isn’t very “credible”. That’s because they only see your source code after you self-modify, so they’d have to play D if they predict that a TDT agent would self-modify, even if the actual player started with CDT. Given that, your “I’ll self-modify anyway” would be highly credible.
I’m not sure how to formalize this notion of “credibility” among TDTs, but it seems to make intuitive sense.
Well that should never happen. Anything that would make a TDT want to self-modify into CDT should make it just want to play D, no need for self-modification. It should give the same answer at different times, that’s what makes it a timeless decision theory. If you can break that without direct explicit dependence on the algorithm apart from its decisions, then I am in trouble! But it seems to me that I can substitute “play D” for “self-modify” in all cases above.
E.g., “play D if you play D to deter you from playing D” seems like the same idea, the self-modification doesn’t add anything.
Well… it partially seems to me that, in assuming certain decisions are made without logical consequences—because you move logically first, or because the TDT agents have fixed wrong priors, etc. - you are trying to reduce the game to a Prisoner’s Dilemma in which you have a certain chance of playing against a piece of cardboard with “D” written on it. Even a uniform population of TDTs may go on playing C in this case, of course, if the probability of facing cardboard is low enough. But by the same token, the fact that the cardboard sometimes “wins” does not make it smarter or more rational than the TDT agents.
Now, I want to be very careful about how I use this argument, because indeed a piece of cardboard with “only take box B” written on it, is smarter than CDT agents on Newcomb’s Problem. But who writes that piece of cardboard, rather than a different one?
An authorless piece of cardboard genuinely does go logically first, but at the expense of being a piece of cardboard, which makes it unable to adapt to more complex situations. A true CDT agent goes logically first, but at the expense of losing on Newcomb’s Problem. And your choice to put forth a piece of cardboard marked “D” relies on you expecting the TDT agents to make a certain response, which makes the claim that it’s really just a piece of cardboard and therefore gets to go logically first, somewhat questionable.
Roughly, what I’m trying to reply is that you’re reasoning about the response of the TDT agents to your choosing the CDT algorithm, which makes you TDT, but you’re also trying to force your choice of the CDT algorithm to go logically first, but this is begging the question.
I would, perhaps, go so far as to agree that in an extension of TDT to cases in which certain agents magically get to go logically first, then if those agents are part of a small group uncorrelated with yet observationally indistinguishable from a large group, the small group might make a correlated decision to defect “no matter what” the large group does, knowing that the large group will decide to cooperate anyway given the payoff matrix. But the key assumption here is the ability to go logically first.
It seems to me that the incompleteness of my present theory when it comes to logical ordering is the real key issue here.
The reason to self-modify is to make yourself indistinguishable from players who started as CDT agents, so that Omega’s AIs can’t condition their moves on the player’s type. Remember that Omega’s AIs get a copy of your source code.
But a CDT agent would self-modify into something not losing on Newcomb’s problem if it expects to face that. On the other hand, if TDT doesn’t self-modify into something that wins my game, isn’t that worse? (Is it better to be reflectively consistent, or winning, if you had to choose one?)
Yes, I agree that’s a big piece of the puzzle, but I’m guessing the solution to that won’t fully solve the “stupid winner” problem.
ETA: And for TDT agents that move simultaneously, there remains the problem of “bargaining”, to use Nesov’s term. Lots of unsolved problems… I wish you had started us working on this stuff earlier!
Being (or performing an action) indistinguishable from X doesn’t protect you from the inference that X probably resulted from such a plot. That you can decide to camouflage like this may even reduce X’s own credibility (and so a lot of platonic/possible agents doing that will make the configuration unattractive). Thus, the agents need to decide among themselves what to look like: first-mover configurations are a limited resource.
(This seems like a step towards solving bargaining.)
Yes, I see that your comment does seem like a step towards solving bargaining among TDT agents. But I’m still trying to argue that if we’re not TDT agents yet, maybe we don’t want to become them. My comment was made in that context.
Let’s pick up Eliezer’s suggestion and distinguish now-much-less-mysterious TDT from the different idea of “updateless decision theory”, UDT, that describes choice of a whole strategy (function from states of knowledge to actions) rather than choice of actions in each given state of knowledge, of which latter class TDT is an example. TDT isn’t a UDT, and UDT is a rather vacuous statement, as it only achieves reflective consistency pretty much by definition, but doesn’t tell much about the structure of preference and how to choose the strategy.
I don’t want to become a TDT agent, as in UDT sense, TDT agents aren’t reflectively consistent. They could self-modify towards more UDT-ish look, but this is the same argument as with CDT self-modifying into a TDT.
Dai’s version of this is a genuine, reflectively consistent updateless decision theory, though. It makes the correct decision locally, rather than needing to choose a strategy once and for all time from a privileged vantage point.
That’s why I referred to it as “Dai’s decision theory” at first, but both you and Dai seem to think your idea was the important one, so I compromised and referred to it as Nesov-Dai decision theory.
Well, as I see UDT, it also makes decisions locally, with understanding that this local computation is meant to find the best global solution given other such locally computed decisions. That is, each local computation can make a mistake, making the best global solution impossible, which may make it very important for the other local computations to predict (or at least notice) this mistake and find the local decisions that together with this mistake constitute the best remaining global solution, and so on. The structure of states of knowledge produced by the local computations for the adjacent local computations is meant to optimize the algorithm of local decision-making in those states, giving most of the answer explicitly, leaving the local computations to only move the goalpost a little bit.
The nontrivial form of the decision-making comes from the loop that makes local decisions maximize preference given the other local decisions, and those other local decisions do the same. Thus, the local decisions have to coordinate with each other, and they can do that only through the common algorithm and logical dependencies between different states of knowledge.
At which point the fact that these local decisions are part of the same agent seems to become irrelevant, so that a more general problem needs to be solved, one of cooperation of any kinds of agents, or even more generally processes that aren’t exactly “agents”.
One thing I don’t understand is that both you and Eliezer talk confidently about how agents would make use of logical dependencies/correlations. You guys don’t seem to think this is a really hard problem.
But we don’t even know how to assign a probability (or whether it even makes sense to do so) to a simple mathematical statement like P=NP. How do we calculate and/or represent the correlation between one agent and another agent (except in simple cases like where they’re identical or easily proven to be equivalent)? I’m impressed by how far you’ve managed to push the idea of updatelessness, but it’s hard for me to process what you say, when the basic concept of logical uncertainty is still really fuzzy.
I can argue pretty forcefully that (1) a causal graph in which uncertainty has been factored into uncorrelated sources, must have nodes or some kind of elements corresponding to logical uncertainty; (2) that in presenting Newcomblike problems, the dilemma-presenters are in fact talking of such uncertainties and correlations; (3) that human beings use logical uncertainty all the time in an intuitive sense, to what seems like good effect.
Of course none of that is actually having a good formal theory of logical uncertainty—I just drew a boundary rope around a few simple logical inferences and grafted them onto causal graphs. Two-way implications get represented by the same node, that sort of thing.
I would be drastically interested in a formal theory of logical uncertainty for non-logically-omniscient Bayesians.
Meanwhile—you’re carrying out logical reasoning about whole other civilizations starting from a vague prior over their origins, every time you reason that most superintelligences (if any) that you encounter in faraway galaxies, will have been built in such a way as to maximize a utility function rather than say choosing the first option in alphabetical order, on the likes of true PDs.
I want to try to understand the nature of logical correlations between agents a bit better.
Consider two agents who are both TDT-like but not perfectly correlated. They play a one-shot PD but in turn. First one player moves, then the other sees the move and makes its move.
In normal Bayesian reasoning, once the second player sees the first player’s move, all correlation between them disappears. (Does this happen in your TDT?) But in UDT, the second player doesn’t update, so the correlation is preserved. So far so good.
Now consider what happens if the second player has more computing power than the first, so that it can perfectly simulate the first player and compute its move. After it finishes that computation and knows the first player’s move, the logical correlation between them disappears, because no uncertainty implies no correlation. So, given there’s no logical correlation, it ought to play D. The first player would have expected that, and also played D.
Looking at my formulation of UDT, this may or may not happen, depending on what the “mathematical intuition subroutine” does when computing the logical consequences of a choice. If it tries to be maximally correct, then it would do a full simulation of the opponent when it can, which removes logical correlation, which causes the above outcome. Maybe the second player could get a better outcome if it doesn’t try to be maximally correct, but the way my theory is formulated, what strategy the “mathematical intuition subroutine” uses is not part of what’s being optimized.
So, I’m not sure what to do about this, except to add it to the pile of unsolved problems.
Coming to this a bit late :), but I’ve got a basic question (which I think is similar to Nesov’s, but I’m still confused after reading the ensuing exchange). Why would it be that,
If the second player has more computer power (so that the first player cannot simulate it), how can the first player predict what the second player will do? Can the first player reason that since the second player could simulate it, the second player will decide that they’re uncorrelated and play D no matter what?
That dependence on computing power seems very odd, though maybe I’m sneaking in expectations from my (very rough) understanding of UDT.
The first player’s move could depend on the second player’s, in which case the second player won’t get the answer in closed form; the answer must be a function of the second player’s move...
But if the second player has more computational power, it can just keep simulating the first player until the first player runs out of clock cycles and has to output something.
I don’t understand your reply: exact simulation is brute force that isn’t a good idea. You can prove general statements about the behavior of programs on runs of unlimited or infinite length in finite time. But anyway, why would the second player provoke mutual defection?
In my formulation, it doesn’t have a choice. Whether or not it does exact simulation of the first player is determined by its “mathematical intuition subroutine”, which I treated as a black box. If that module does an exact simulation, then mutual defection is the result. So this also ties in with my lack of understanding regarding logical uncertainty. If we don’t treat the thing that reasons about logical uncertainty as a black box, what should we do?
ETA: Sometimes exact simulation clearly is appropriate, for example in rock-paper-scissors.
Conceptually, I treat logical uncertainty as I do prior+utility, a representation of preference, in this more general case over mathematical structures. The problems of representing this preference compactly and extracting human preference don’t hinder these particular explorations.
I don't understand this yet. Can you explain in more detail what a general (noncompact) way of representing logical uncertainty would be?
If you are a CDT agent, you can’t (or simply won’t) become a normal TDT agent. If you are a human, who knows what that means.
After all, for anything you can hard code, the AI can build a new AI that lacks your hard coding and sacrifice its resources to that new AI.
Wei_Dai wrote on 19 August 2009 07:08:23 AM:
That seems to violate the secrecy assumptions of the Prisoner’s Dilemma problem! I thought each prisoner has to commit to his action before learning what the other one did. What am I missing?
Thanks!
This is very cool, and I haven’t digested it yet, but I wonder if it might be open to the criticism that you’re effectively postulating the favored answer to Newcomb’s Problem (and other such scenarios) by postulating that when you surgically alter one of the nodes, you correspondingly alter the nodes for the other instances of the computation. After all, the crux of the counterfactual-reasoning dilemma in Newcomb’s Problem (and similarly in the Prisoner’s Dilemma) is to justify the inference “If I choose both boxes, then (probably) so does the simulation (even if in fact I/it do not)” rather than “If I choose both boxes, then the simulation doesn’t necessarily match my choice (even though in fact it does)”. It could be objected that your formalism postulates the desired answer rather than giving a basis for deriving it—an objection that becomes more important when we move away from identical or functionally equivalent source code and start to consider approximate similarities. (See my criticism of Leslie (1991)’s proposal that you should make your choice as though you were also choosing on behalf of other agents of similar causal structure. If I’m not mistaken, your proposal seems to be a formalization of that idea.)
Here’s an alternative proposal.
Metacircular Decision Theory (MCDT)
For purposes of this discussion, let me just stipulate that subjective probabilities will be modeled as though they were quantum under MWI—that is, we’ll regard the entire distribution as part of the universe. That move will help with dual-simulation/counterfactual-mugging scenarios; but also, as I argued in Good and Real, we effectively make that move whenever we assign value to probabilistic outcomes even in non-esoteric situations (so we may as well avail ourselves of that move in the weird scenarios too, though eventually we need to justify the move).
Say we have an agent embodied in the universe. The agent knows some facts about the universe (including itself), has an inference system of some sort for expanding on those facts, and has a preference scheme that assigns a value to the set of facts, and is wired to select an action—specifically, the/an action that implies (using its inference system) the/a most-preferred set of facts.
But without further constraint, this process often leads to a contradiction. Suppose the agent’s repertoire of actions is A1, …An, and the value of action Ai is simply i. Say the agent starts by considering the action A7, and dutifully evaluates it as 7. Next, it contemplates the action A6, and reasons as follows: “Suppose I choose A6. I know I’m a utility-maximizing agent, and I already know there’s another choice that has value 7. Therefore, it follows from my (hypothetical) choice of A6 that A6 has a value of at least 7.” But that inference, while sound, contradicts the fact that A6’s value is 6.
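A minimal formalization of that contradiction, just restating the prose:

$$\mathrm{val}(A_i) = i, \qquad \text{choice} = \arg\max_i \mathrm{val}(A_i), \qquad \text{premise: } \text{choice} = A_6,$$
$$\text{choice} = A_6 \;\wedge\; \mathrm{val}(A_7) = 7 \;\Rightarrow\; \mathrm{val}(A_6) \ge 7, \quad \text{contradicting } \mathrm{val}(A_6) = 6.$$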
Unsurprisingly, a false premise leads to a contradiction. To avoid contradiction, we need to limit the set of facts that the agent is allowed to reason from when making inferences about a hypothetical action. But which facts do we omit? Different choices yield different preferred actions. If we omit the fact that val(A6)=6, then we can infer val(A6)>=7; if instead we omit the fact that the agent utility-maximizes, then we can infer val(A6)=6 without contradiction (or at least without the particular contradiction above).
So this is the usual full-blown problem of counterfactual inference: which things do we “hold fixed” when contemplating a counterfactual antecedent, and which do we “let vary” for consistency with that antecedent? Different choices here correspond to different decision theories. If the agent allows inferences (only) from all facts about physical law as applied to the future, and all facts about the past and present universe-state, except for facts about the agent’s internal decision-making state, then we get CDT. If we leave the criteria unspecified/ambiguous, we get EDT. If we allow the agent to reason from facts about the future as well as the past and present, we get FDT (Fatalist Decision Theory: choice is futile, which most people think follows from determinism).
MCDT’s proposed criterion is this: the agent makes a meta-choice about which facts to omit when making inferences about the hypothetical actions, and selects the set of facts which lead to the best outcome if the agent then evaluates the original candidate actions with respect to that choice of facts. The agent then iterates that meta-evaluation as needed (probably not very far) until a fixed point is reached, i.e. the same choice (as to which facts to omit) leaves the first-order choice unchanged. (It’s ok if that’s intractable or uncomputable; the agent can muddle through with some approximate algorithm.)
EDIT1: The algorithm also needs to check, when it evaluates a meta-level choice candidate, that the winning choice at the next level down is consistent with all known facts. If not, the meta-level candidate is eliminated from consideration. (Otherwise, the A6 choice could remain stable in the example above.)
EDIT2: Or rather, that consistency check can probably substitute for the additional meta-iterations.
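Here is a rough sketch of that loop in Python, as I read the proposal; the inference system, valuation, and consistency check are hypothetical black boxes standing in for machinery the proposal leaves unspecified:

```python
# Hypothetical MCDT skeleton. `infer`, `value`, and `consistent` stand in
# for the agent's inference system, preference scheme, and consistency
# check over known facts; none of these are specified by the proposal.

def first_order_choice(actions, kept_facts, infer, value):
    # Evaluate each candidate action using only the retained facts.
    return max(actions, key=lambda a: value(infer(kept_facts | {('I choose', a)})))

def mcdt_choose(actions, all_facts, candidate_omissions, infer, value, consistent):
    best = None
    for omit in candidate_omissions:          # meta-choice: which facts to drop
        kept = all_facts - omit
        choice = first_order_choice(actions, kept, infer, value)
        # EDIT1: discard meta-candidates whose winning first-order choice
        # contradicts the full set of known facts.
        if not consistent(all_facts | {('I choose', choice)}):
            continue
        outcome_value = value(infer(all_facts | {('I choose', choice)}))
        if best is None or outcome_value > best[0]:
            best = (outcome_value, choice)
    return best[1] if best else None
```

The original description iterates the meta-evaluation to a fixed point; per EDIT2, the consistency check may stand in for those extra iterations, which is why this sketch stops after one pass.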
So e.g. in Newcomb’s Problem or the Prisoner’s Dilemma, the agent can calculate that it does better if it retains the fact that its dispositional-state/source-code is functionally equivalent to the simulation’s/other’s (but omits facts about which particular choice is made by both) than if it makes the CDT choice and omits the fact about equivalence, but keeps the facts about the simulation’s/other’s choice (or keeps some probability distribution about the simulation’s/other’s choice).
In other words, metacircular consistency isn’t just a test that we’d like the decision theory to pass. Metacircular consistency is the theory; it is the algorithm.
Replied at http://lesswrong.com/lw/164/timeless_decision_theory_and_metacircular/
To clarify: the agent in MCDT is a particular physical instantiation, rather than being timeless/Platonic (well, except insofar as physics itself is Platonic).
Does this theory handle Drescher’s example of raising my hand because I want the universe a billion years ago to be such that I would raise my hand a billion years hence?
Yes. That’s a logical dependence.
ETA: To be exact, you have a fixed state a billion years ago, a computation which runs on that state to determine “Will you raise your hand a billion years hence?”, and you can know the initial state without knowing the output of the function, but then determine that the function outputs “Yes” iff your decision diagonal outputs “Raise hand”; so if your utility function U is maximized when this function outputs “Yes” on that data, then you can (will) exert logical control over the value of this fixed mathematical function in which a copy of you is embedded.
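A toy rendering of that point, where the "universe" and the state name are fabricated for the example; the only thing it illustrates is that a fixed deterministic function can have its output determined by a decision procedure embedded inside it:

```python
# Toy version: the "universe" is a fixed, deterministic function of an
# initial state, and it contains a copy of your decision procedure.

def my_decision_procedure():
    # Whatever this returns is a fixed logical fact -- and it is also
    # exactly what the embedded copy returns.
    return 'Raise hand'

def universe(initial_state):
    # A fixed mathematical function of the billion-years-ago state
    # (the argument is just a stand-in here). Part of evaluating it
    # is running the embedded copy of you.
    embedded_copy_output = my_decision_procedure()
    return 'Yes' if embedded_copy_output == 'Raise hand' else 'No'

# You can know the initial state without knowing universe(initial_state);
# but its value is 'Yes' iff your decision diagonal outputs 'Raise hand'.
print(universe(initial_state='fixed state a billion years ago'))  # -> 'Yes'
```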
That’s what life is all about, actually. You could just regard the universe as a big mathematical function containing a copy of you, over which you’re exerting logical control.
ETA2: You’d have to ask Gary Drescher whether he knows of anyone else who’s reductionist enough to realize that you can control the output of a fixed deterministic mathematical function if that function happens to be one in which you are embedded. As far as I know, it’s just Gary Drescher.
ETA3: “Logical control” and “Thou art math” is essentially the same idea as timeless control and thou art physics, it’s just even more fun.
Nice. A while ago I also noticed that you can control any mathematical structure if it knows about you and you know about it (i.e. there is logical dependence), which generalizes the notion of trade with other possible worlds, control of the past, etc. If that other mathematical structure is interpreted as an agent, it can be made to behave as you prefer, if in return you behave as it prefers. Thus, it’s possible for us to have and realize preferences over mathematical structures, in particular by trading with them in this manner.
At the same time there are all sorts of weird limitations on what it’s possible to affect this way: for example, you can control something faster than light (logical control), but only with info that is already in the logical dependence, which excludes the info that only one side has. For example, if you send away a perfect simulation of your mind on a spaceship, you can “control” what happens on the spaceship if neither of you receives observations from outside, as both computations will be identical. If some info from a year ago is sent to the spaceship, and both you and the simulation observe it (simultaneously), you remain synchronized, but now you have learned something new. This way, streams of observations can be sent in both directions, continuously updating both copies. These observations, being identical, are added to the logical dependence between you and the simulation, and so can be used in logical control. Thus, the whole state of knowledge is shared, and the conclusions of the whole algorithm of mind can be used for control.
On the other hand, if you know something above and beyond this shared knowledge (like recent observations), you can’t use this knowledge, or any conclusions reached from it, in logical control. You can’t update on non-shared knowledge and retain the ability to handle logical dependence. This seems related to non-updating in counterfactual mugging: you need to exercise control over the other possible world, and so you can’t update on the observation that is particular to your possible world and then use the whole algorithm that includes this update to control the other world. You can “update” if you can factor your state of knowledge into what’s dependent on what, and what can be used for control of what, though.
Eliezer, does the formalism of Pearl’s graphs allow one to capture this idea? So far, I’m not sure how much insight can be gained from studying it (and your TDT), so I’m leaving it until after I finish learning the basics of logic.
I think you could use a non-updated Pearl graph for your updateless decision theory, but the part where you (instead of updating) decide which computational processes are similar or dissimilar to you, would be a logical problem, I think, not the domain of causal graphs.
Not-updating is the same kind of simplified denotational behemoth as a GLUT. Much of the usefulness of probabilistic graphical models comes from the fact that they compress the probability distribution into smaller representations and allow manipulation and specification of these distributions in terms of the compact representations. If I just start copying a lot of the graphical models, it won’t capture the structure of the problem, so instead of being updateless, the decision theory must update what it can, or represent a lot of partially dependent states of knowledge in a single structure, allowing it to extract decisions unaffected by the knowledge that doesn’t belong to them.
I suspect that expectation maximization/probability won’t play an important role in this structure, as the structure of graphical models seems to capture the same objects as logical dependence must (where do you get the causal graphs from?), and so a structure that can work with logical (in)dependence may already contain the structure captured by probabilistic graphical models, subsuming the latter.
Just as a matter of terminology, I prefer to say that we can choose (or that we have a choice about) the output, rather than that we control it. To me, control has too strong a connotation of cause.
It’s tricky, of course, because the concepts of choice-about and causal-influence-over are so thoroughly conflated that most people will use the same word to refer to both without distinction. So my terminology suggestion is kind of like most materialists’ choice to relinquish the word soul to refer to something extraphysical, retaining consciousness to refer to the actual physical/computational process. (Causes, unlike souls, are real, but still distinct from what they’re often conflated with.)
Again, this is just terminology, nothing substantive.
EDIT: In the (usual) special case where a means-end link is causal, I agree with you that we control something that’s ultimately mathematical, even in my proposed sense of the term.
Hm. To me, “choose” sounds like invoking the idea of multiple possibilities, while “control” sounds more determinism-compatible. Of course that is a mere matter of terminology.
Though I’m not sure what you mean by “in the special case where a means-end link is causal”—my thesis was that if you are uncertain about the output of your decision computation, and you factor the universe the Pearlian way, then your logical decision will end up being, in the graph, the logical cause of box B containing a million dollars. You mean the special case where a means-end link is physical? But what is physics except math? Or are we assuming that the local causal relations in physics are more privileged as ontologically basic causes, whereas “logical causality” is just a convenient way of factoring uncertainty and a winning way to construe counterfactuals? (That last one may have some justice to it.)
I agree that “choose” connotes multiple alternatives, but they’re counterfactual antecedents, and when construed as such, are not inconsistent with determinism.
I don’t know about being ontologically basic, but (what I think of as) physical/causal laws have the important property that they compactly specify the entirety of space-time (together with a specification of the initial conditions).
Is there a formulation of this example that isn’t purely metaphysical, i.e. where you could actually detect the difference?
One of the benefits of publishing a complete explanation is that some of the (valid) criticisms of it will lead to a stronger, repaired theory.
I confess that I don’t follow your program yet, but the outline is much preferred to vague “I have a secret theory” teasing.
Yeah, I hear that claim a lot. It seems to apply to some other world than this one. At some point one must notice when an idealistic belief is failing to accumulate evidence in favor of itself.
We’ll see whether publishing this outline yields any criticisms or suggestions over and above what Nesov and Dai already managed to say based on merely “I have a timeless decision theory”. I’m not holding my breath. This outline actually is enough that someone versed in Newcomblike problems and causality ought to be able to make out what I’m talking about, and with a bit of intelligence work out on their own just how many classical dilemmas it solves. Nonetheless I fully expect this post to drop into the void and never be heard from again.
That’s not because of an evil conspiracy, of course. It’s just the default course of events in academia.
I feel like the ratio of words written to words read in compsci research is getting pretty awful. Conferences are happy to take whatever paper-like substance you can churn out. It’s probably worse in other fields.
I’d be surprised too if academia were to take a blog post seriously. Why not explain the ideas to someone who has the time and motivation to write them up into academic papers (and share co-authorship or whatever)? If you found the right person, that ought to be much faster than doing it yourself. (I mean take up much less of your own time.)
I’d still expect it to drop into the void. Maybe if I write a popular rationality book and it proves popular enough, that probable cost/benefit will change. Are you volunteering?
No, I’m not volunteering. I said earlier that I don’t have the skill/experience/patience/willpower for it. You could publicly ask for volunteers though. Perhaps there is a bunch of Ph.D. students around looking for something to write about.
Why is it that Adam Elga can write about the Sleeping Beauty Problem and get 89 citations? Decision theorists are clearly looking for something to do...
ETA: Maybe it’s because of his reputation/status? In that case I guess you need to convince someone high-status to co-author the papers.
Anyone who declines to talk about interesting material because it’s in a blog post, or for that matter, a poem scrawled in blood on toilet paper, is not taking Science seriously. Why should I expect them to have anything important to say if I go to the further trouble of publishing a paper?
I ought to post the decision theory to a thread on /b on 4chan, then try forwarding it around to philosophers who’ve written on Newcomblike problems. Only the ones who really care about their work would dare to comment on it, and the net quality of discussion would go up. Publishing in a peer-reviewed journal just invites in the riffraff.
Yes, this is somewhat tongue-in-cheek, but not so tongue-in-cheek that I’m not seriously considering trying it.
Ignoring non-papers claiming to have solved a problem is a good crackpot-avoiding heuristic. What isn’t even written up is even less likely to be worth reading than something with only a few citations that is written up.
If that were really what was going on, not status games, then getting a link to the blog post from a couple of known folk of good reputation—e.g. Nick Bostrom and Gary Drescher—would be enough to tell people that here was something worth a quick glance to find out more.
Now it’s worth noting that my whole cynicism here can be falsified if this post gets a couple of links from folk of good reputation, followed by genuinely somewhere-leading discussion which solves open problems or points out new genuine problems.
Heh, if you find a poem scrawled in blood on toilet paper, you probably have a higher priority than Science at the moment—like finding the psycho f---!
But anyway, you half-jest, but this is a problem I’ve run into myself. Stephan Kinsella has a widely-cited magnum opus opposing intellectual property rights. I have since presented a gaping hole in its logic, which he acknowledges isn’t handled well, but doesn’t feel the need to resolve this hole in something he’s built his reputation around, merely because I didn’t get it published in a journal.
Yes, peer review is a good crackpot filter, but it can also be a filter against having to admit your errors. [/threadjack]
“Anyone who declines to talk about interesting material because it’s in a blog post, or for that matter, a poem scrawled in blood on toilet paper, is not taking Science seriously. Why should I expect them to have anything important to say if I go to the further trouble of publishing a paper?”
What?
Vladimir is right: not paying attention to a blog entry with no published work is a great way to avoid crackpots. You have this all backwards; you speak as if you have all these credentials, so everyone should just take you seriously. In reality, what credentials do you have? You built up all this expectation for this grand theory, and this vague outline is the best you can do? Where is the math? Where is the theory?
I think anyone in academia would be inclined to ask the same question of you: why should they take some vague blog entry seriously when the writer controls the comments and can’t be bothered to submit his work for peer review? You talk about wanting to write a PhD thesis; this won’t help get you there. In fact, this vague outline should do nothing but cast doubt in everyone’s mind as to whether you have a theory or not.
I have been following this TDT issue for a while, and I for one would like to see some math and some worked-out problems. Otherwise I would be inclined to call your bluff.
Eliezer, have you ever published a paper in a peer-reviewed journal? The way you talk about it suggests a naive amateur. There is huge value in doing so, especially for you, since you don’t have a PhD or any successful companies or any of the other typical things that people who go the non-academic route tend to have.
Let’s face the music here: your one practical AI project that I am aware of, Flare, failed, and most of your writing has never been subjected to the rigor that all science should be subjected to. It seems to me that if you want to do what you claim, you need to start publishing.
“Levels of Organization in General Intelligence” appeared in the Springer volume Artificial General Intelligence. “Cognitive Biases Potentially Affecting Judgement of Global Risks” (PDF) and “Artificial Intelligence as a Positive and Negative Factor in Global Risk” (PDF) appeared in the Oxford University Press volume Global Catastrophic Risks. They’re not mathy papers, though.
I am sorry, I am going to take a shortcut here and respond to a couple of posts along with yours. So fine, I partially insert my foot in my mouth… but the issue, I think, is that the papers we need to be talking about are math papers, right? Anyone can publish non-technical ideas as long as they are well reasoned, but the art of science is technical mastery.
As for Eliezer’s comment concerning the irrelevance of Flare being pre-2003 EY work, I have to disagree. When you have no formal academic credentials and you are trying to make your mark in a technical field such as decision theory, anything technical that you have done or attempted counts.
You essentially are building your credentials via work that you have done. I am speaking from experience, since I didn’t complete college and went the business route. But I can also say that I did a lot of technical work, so I built my credentials in the field by doing novel technical things.
I am trying to help here, coming from a similar position and wanting a PhD, etc. Having various technical achievements as my prior work made all the difference in getting into a PhD program without a B.S. or M.S. It also makes all the difference in being taken seriously by the scientific community.
Which circles back to my original point, which is that a vague outline is not enough to show you really have a theory, much less a revolutionary one. Sadly, asking to be taken seriously is just not enough; you have to prove that you meet the bar of admission (and decision theory is going to be math).
If someone can show me some technical math work EY has done, that would be great, but as of now I have very little confidence that he has a real theory (if someone can, I will drop the issue). Yes, I am aware of the Bayesian theory paper, but let’s face it, that is fairly basic and far from showing that EY has the ability to revolutionize decision theory.
Where? What university?
The university would be Carnegie Mellon, in the Computer Science program (an esoteric area of CS).
As for the other parts, I did some work in computer hardware, specifically graphics hardware design, body armor design (bulletproof vests), etc. The body armor got to prototyping but was not marketable for a variety of reasons too dull to go into. I am currently starting a video game company.
Also, volume-editing isn’t as (pointlessly? signallingly?) difficult as journal peer-review.
This vague outline is the result of Eliezer yielding to our pleas to say something—anything—about his confident solution to Newcomb’s problem. Now that it’s been posted as a not-obviously-formalizable text, and people are discussing it informally, I share a lot of your disappointment. But let’s give the topic some days and see how it crystallizes.
What’s Flare? (...looks it up...) Oh dear Cthulhu, oh no.
(Edit: I originally listed several specific users as “refusing to formalize”. That was wrong.)
A legacy of pre-2003 Eliezer, of no particular importance one way or another.
What about what I wrote?
Which part do you find insufficiently formal? Of course I use “mathematical intuition” as a black box without explaining how it works, but that’s just like EDT using “prior” without explaining where it comes from, or CDT using “causal probability” as a black box. It’s an unsolved problem, not refusal to formalize.
Your decision theory is formal enough for me, but it seems to be different from Eliezer’s, which I was talking about. If they’re really the same, could you explain how?
In that case, I never said I understood Eliezer’s version well enough that I could formalize it if I wanted to, and I don’t think Nesov and Drescher claimed that either, so I don’t know why you mentioned our names in connection with “refuse to formalize”. Actually I explicitly said that I don’t understand Eliezer’s theory very well yet.
You’re right. I apologize. Amended the comment.
Well, it may be that some academics do take Science seriously, but they also care about status signaling. There’s nothing that says a person can’t simultaneously optimize for two different values, right? Why exclude those whose values aren’t exactly your values, instead of trying to cooperate with them?
Also:
Anyone who declines to talk about interesting material because it’s in a blog post, or for that matter, a poem scrawled in blood on toilet paper, is not taking Science seriously. Why should I expect them to have anything important to say if I go to the further trouble of publishing a paper?
Looks to me like there’s a pretty lively conversation so far!
I’m trying to understand the difference between this formulation and mine. Interestingly, Eliezer seems to have specified a “causal” timeless decision theory, whereas mine could be described as an “evidential” TDT. In my formulation, you’d compute the expected utility of a strategy (i.e., mapping of inputs to outputs) T by taking “S is logically equivalent to T” as a (provisional) axiom, then recomputing logical uncertainties and expected utility.
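If I'm reading that right, a schematic version would be something like the sketch below, with the mathematical-intuition module left as the same unexplained black box discussed earlier in this thread, and every name here a placeholder rather than part of the actual formulation:

```python
# Schematic "evidential" TDT/UDT sketch. S stands for this agent's own
# (quined) source code; `math_intuition` is the black-box module that,
# given a provisional axiom, returns a distribution over world outcomes.

def expected_utility(strategy, world_programs, math_intuition, utility):
    # Provisionally assume "S is logically equivalent to `strategy`",
    # recompute the logical uncertainties under that axiom, and average.
    outcome_dist = math_intuition(axiom=('S is logically equivalent to', strategy),
                                  worlds=world_programs)
    return sum(p * utility(outcome) for outcome, p in outcome_dist.items())

def choose_strategy(candidate_strategies, world_programs, math_intuition, utility):
    # Pick the input-output mapping with the highest expected utility.
    return max(candidate_strategies,
               key=lambda T: expected_utility(T, world_programs,
                                              math_intuition, utility))
```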
The “evidential” approach seems simpler. What advantage does the “causal” approach have? Sorry if this is obvious, but my knowledge of Pearl is very limited.
CTDT vs. ETDT. Hmm, that’s a tough one. First, CTDT allows “screening off” of causes, which makes a big difference.
I liked EY’s formulation above: “TDT doesn’t cooperate or defect depending directly on your decision, but cooperates or defects depending on how your decision depends on its decision.” It’s hard to collect evidence, I think, but reasoning about a causal graph gives you the ability to find out how latent decisions affect other outcomes.
So in this case, expected-utility-based reasoning leaves you in a position where you make some decisions because they seem correlated with good outcomes, while the causal reasoning lets you sometimes see either that the actions and consequences are disconnected or that the causation runs in the opposite direction to what you desire.
ETA: EY’s street crossing example is an example of causation running in the opposite direction.
= Drescher’s street crossing example, don’t know if Drescher got it from somewhere else.
Parfit’s Hitchhiker; in the future, after having observed that you’ve already been picked up and made it to safety, you’ll still compute the counterfactual “If the output of my computation were to refuse to pay, then I would not have been picked up.”
Since TDT screens off all info that goes into your decision-setup, using your updateless version of TDT might obliterate the difference between evidential and causal approaches entirely—no counterfactuals, no updates, just ruling out of self-copies that have received incompatible sense data. (Not sure yet if this works.)
This feels right to me. I can’t implement it, and I’m not sure I could explain what Eli said, but I understand Pearl well enough (at an intuitive level) to say that the kind of additions Eli is talking about feel like they would clarify things and reach the results he’s talking about.
Read Pearl. It’s not mathy, it’s mostly words about graph manipulation.
If you’re bothered by math, read Pearl anyway. He doesn’t use equations or make you transform symbols. If you can think about information flows or reason visually, Pearl’s calculus is for you. You’ll understand what it means for something to be a cause or a possible cause or not a possible cause of something else in a deeper way than you did before Pearl.
If you’re already comfortable with math, there’s nothing hard about the theory, it’s just using a different formalism than linear symbols to explain how events are connected causally.
Thanks Eli.
Second Chris’ advice on reading Pearl.
If it helps, I am happy to help with the technical content of the book, or with general technical questions about causal inference (either over email or here).
I’ve tried to read Pearl’s decision theory book, but it seemed dry and boring. Guess I’ll have to give it another go...
It’s available online too, but don’t pirate it.
That’s “Causality: models, reasoning, and inference By Judea Pearl”...? “Not mathy”? It’s jammed full of dense maths! It has integration symbols, summation symbols, logic, probability, theorems and lemmas coming out of its ears! Obviously, Pearl is showing off to impress his peers ;-)
Okay, you’re right, they’re in there, but Pearl uses those in the proofs, not the explanations, as I recall. I don’t think you have to understand the proofs to get the idea.
If you find math oppressive, let me know if you try Pearl and find it too daunting. If that happens, I’ll change the way I describe the book, I promise.
Probably a little, but it does help you find mistakes where they exist.
(Okay, that was showing off.)
Rolf Nelson wanted to know what everyday problems evidential decision theory produces. Newcomb’s Problem can be mapped onto the Prisoner’s Dilemma, but are there similarly common Smoking-Lesion-like problems?
Well, if you’re using TDT, then conditioning on the initial state of your physical computation screens off most such problems. But if you don’t break down your causal graph that finely, then there are all sorts of situations in which crazy people might be tempted to use EDT. I think Drescher in his book gives the case of someone who observes that people usually decide to cross the street only when it is safe to do so, who concludes that by deciding to cross the street they can make it safe.
Majoritarianism may frequently be the result of the application of evidential decision theory, ignoring all of the non-naturalistic vagueness in the formulations of CDT and EDT, might it not?
Some kinds of majoritarianism, certainly. The confusion is based on mistaking correlation of votes with commonality of interests. “If we can all agree to vote for proposition X, then it must be in our favor, right?”
This is better than nothing, thanks and upvote. Now let’s begin translating this stuff. AFAICT, a “decision theory” is supposed to have two parts:
1) A blah blah verbal algorithm for translating real-world problem descriptions into a certain kind of formal structure.
2) A mathematical algorithm that accepts that formal structure and outputs a decision.
I don’t fully understand what formal structure you’re proposing (a Pearl-style causal graph with additional “logical” arrows? why would this always be acyclic?), and can’t understand the algorithm until the structure is clear enough.
If the arrows are material implications, then A → B → C → A collapses via iff to a single node. Can you give an example of cyclic logical uncertainty?
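For concreteness, the collapse claim is just the propositional fact

$$(A \to B) \wedge (B \to C) \wedge (C \to A) \;\vdash\; (A \leftrightarrow B) \wedge (B \leftrightarrow C) \wedge (A \leftrightarrow C),$$

so the three nodes carry a single bit of logical uncertainty between them and can be merged.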
I was thinking of some case where the cycle contains both physical and logical arrows. Logical arrows can point backwards in time, so this doesn’t seem to be impossible in principle. Sorry, can’t give a specific example because I don’t fully understand what you mean by “logical uncertainty”.
My reading is that logical nodes can point to physical nodes, but not vice versa. (Also that it doesn’t make sense to say an arrow from a logical node “points backwards in time”. Logical nodes are timeless.)
Physical arrows shouldn’t point to logical nodes, though… right?
Can anyone suggest good background reading to help me understand the technical language/background knowledge here and, more generally, decision theory?
I gave one example earlier of TDT agents not playing cooperate in PD against each other. Here’s another, perhaps even more puzzling, example.
Consider 3 TDT agents, A, B, and C, playing a game of 3-choose-2 PD. These agents are identical, except that they have different beliefs about how they are logically related to each other. A and B both believe that A and B are 100% logically correlated (in other words, logically equivalent). A and C both believe that A and C are 0% logically correlated. B and C also believe that B and C are 0% logically correlated.
What’s the outcome of this game? Well, C should clearly play defect, since it’s sure that it’s not correlated with either of the other players. A and B both play cooperate, since that maximizes expected utility given that they are correlated with each other but not with C (the arithmetic is the same as in my earlier 3-choose-2 PD example). Given this outcome, their initial beliefs about their logical relationships don’t seem to be inconsistent.
How do they end up in this situation? Clearly they cannot all have common knowledge of each other’s source code, so where do they obtain their definite beliefs about each other instead?
Re: “definite beliefs”, the numbers don’t have to be 100% and 0%. They could be any p and q, where p is above the threshold for cooperation, and q is below.
As for where the numbers come from, I don’t know. Perhaps the players have different initial intuitions (from a mathematical intuition module provided by evolution or their programmers) about their logical correlations, which causes them to actually have different logical correlations (since they are actually computing different things when making decisions), which then makes those intuitions consistent upon reflection.
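To make "above the threshold for cooperation" concrete in the simplest two-player case: the payoffs R=3, P=1, S=0 and the correlation model (the opponent mirrors my move with probability p and defects otherwise) are assumed purely for illustration, and the 3-choose-2 arithmetic is analogous. Then

$$\mathrm{EU}(C) = pR + (1-p)S, \qquad \mathrm{EU}(D) = P,$$

so a correlated agent cooperates iff $p > \frac{P - S}{R - S}$, which is $p > 1/3$ for these payoffs; a q below that threshold gives defection.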
Why can’t A and B choose to be correlated with C by deliberately making their decision dependent on its decision? Insufficient knowledge of C’s code even to make their decision dependent on “what an agent does when it thinks it’s not correlated to you”? In other words, you know that C is going to follow a certain decision algorithm here—do the Dai-obvious thing and defect—but A and B don’t know enough about C to defect conditional on the “obvious” thing being to defect?
A and B don’t choose this, because given their beliefs (i.e., low correlation between A and C, and B and C), that doesn’t maximize their expected utilities. So the belief is like a self-fulfilling prophecy. Intuitively, you might think “Why don’t they get out of this trap by choosing to be correlated with C and simultaneously change their beliefs?” The problem is that they don’t think this will work, because they think C wouldn’t respond to this.
In other words, why would A and B defect conditional on C defecting, when they know “C is going to follow a certain decision algorithm here—do the Dai-obvious thing and defect”?
Anyway, that’s what I think happens under UDT1. It’s quite possible (almost certain, really) that UDT1 is wrong or incomplete. But if you have a better solution, can you try to formalize it, and not just make informal arguments? Or, if you think you have an intuitively satisfactory solution that you don’t know how to formalize yet, I’ll stop beating this dead horse and let you work it out.
I don’t have a general solution. I’m just carrying out the reasoning by hand. I don’t know how to solve the logical ordering problem.
Why would C choose to follow such an algorithm, if C perceives that not following such an algorithm might lead to mutual cooperation instead of mutual defection?
Essentially, I’m claiming that your belief about “logical uncorrelation” is hard to match up with your out-of-context intuitive reasoning about what all the parties are likely to do. It’s another matter if C is a piece of cardboard, a random number generator, or a biological organism operating on some weird deluded decision theory; but you’re reasoning as if C is calmly maximizing.
Suppose I put things to you this way: Groups of superrational agents will not occupy anything that is not at least a Pareto optimum, because they have strong motives to occupy Pareto optima and TDT lets them coordinate where such motives exist. Now the 3-choose-2 problem with two C players and one D player may be a Pareto optimum (if taken at face value without further trades being possible), but if you think of Pareto-ization as an underlying motivation—that everyone starts out in the mutual defection state, and then has a motive to figure out how to leave the mutual defection state by increasing their entanglement—then you might see why I’m a bit more skeptical about these “logical uncorrelations”. Then you just end up in the all-D state, the base state, and agents have strong incentives to figure out ways to leave it if they can.
In other words, you seem to be thinking in terms of a C-equilibrium already accomplished among one group of agents locally correlated with themselves only, and looking at the incentive of other agents to locally-D; whereas my own reasoning assumes the D-equilibrium already globally accomplished, but suspects that in this case rational agents have a strong incentive to reach up to the largest reachable C-equilibria, which they can accomplish by increasing (not decreasing) various forms of entanglement.
Relations between “previously uncorrelated” groups may be viewable as analogous to relations between causally uncorrelated individuals. To assume that one subgroup has decided on interior cooperation even though it makes them vulnerable to outside defection, without that subgroup having demanded anything in return, may be like presuming unilateral cooperation on the PD.
Ok, this looks reasonable to me. But how would they actually go about doing this? So far I can see two general methods:
1) convergence towards an “obvious” decision theory
2) deliberate conditioning of moves between players
My current view is that neither of these methods seem very powerful as mechanisms for enabling cooperation, compared to say the ability to prove source code, or to merge securely. To summarize my thoughts and the various examples I’ve given, here are the problems with each of the above methods for “increasing entanglement”:
Two agents with the same “obvious” decision theory may not be highly correlated, if they have different heuristics, intuitions, priors, utility functions, etc. Also, an agent may have a disincentive to unilaterally increase his correlation with a large group of already highly correlated agents.
Deliberate conditioning of moves is difficult when two sides have high uncertainty about each others’ source code. Which hypothetical agent(s) do you condition your move against? How would they know that you’ve done so, when they don’t know your source code either? It’s also difficult if two sides have different preferences about the correlation of their moves, that is, if one side wants them to be positively correlated, and another wants them to be uncorrelated or negatively correlated.
These sound like basically reasonable worries / lines of argument to me. I’m sure life will be a lot easier for… not necessarily everyone, but at least us primitive mortal analysts… if it’s easy for superintelligences to exhibit their source code to each other. Then we just have the problem of logical ordering in threats and games of Chicken. (Come to think of it, blackmail threats of mutual destruction unless paid off, would seem to become more probable, not less, as you became more able to exhibit and prove your source code to the other player.)
A possible primary remaining source of our differing guesses at this point, may have to do with the degree to which we think that decision processes are a priori (un)correlated. I take statements like “Obviously, everyone plays D at the end” to be evidence of very high a priori correlation—it’s no good talking about different heuristics, intuitions, priors, utility functions, etcetera, if you don’t actually conclude that maybe some players play C and others play D.
How would that happen?
I think Nesov’s position is that such threats don’t work against updateless agents, but I’m not sure about that yet. ETA: See previous discussion of this topic.
That doesn’t make sense… Suppose nobody smokes, and nobody gets cancer. Does that mean smoking and cancer are correlated? In order to have correlation, you need to have both (C,C) and (D,D) outcomes. If all you have are (D,D) outcomes, there is no correlation.
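In the usual formal sense, too:

$$\mathrm{Corr}(X, Y) = \frac{\mathrm{Cov}(X, Y)}{\sigma_X \, \sigma_Y}$$

is undefined (0/0) when one of the variables is constant, which is exactly the all-(D,D) case.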
I’m referring to rock-paper-scissors and this example. Or were you asking something else?
I’m not keeping up here—I only peek at this site occasionally, rather than following it—but this:
“The one-sentence version is: Choose as though controlling the logical output of the abstract computation you implement, including the output of all other instantiations and simulations of that computation.”
… seems rather similar to the dictum that you should choose as if you really might be any of your subjective duplicates, from across all possible worlds. (I suppose there is a difference, in that “subjective duplicate” refers only to the properties of yourself that you can perceive, whereas “the abstract computation you implement” refers to a property that is not explicitly available to you.)
And to me that dictum sounds standardly Bayesian—with the set of all entities in all possible worlds providing the prior, and the subjectively available data (about what sort of entity you are) providing the evidence on which you condition. So it’s intriguing to see the claim that starting out in this way leads to making the right choices in a number of situations where standard decision theory gets it wrong.
Omega may not contain a copy of you which is detailed enough to be a subjective duplicate. Omega may just be reasoning abstractly about you. So you legitimately know that you are not inside Omega—but you also expect that whatever you decide, Omega will have successfully predicted.
Upvoted; this is a good summary of the issue, and using the new label TDT is arguably more elegant than having to talk separately about the rationality of cultivating a disposition.
How significant are the open questions? We should not expect correct theory to work in the face of arbitrary acts of Omega. Suppose Omega says “Tomorrow I will examine your source code, and if you don’t subscribe to TDT I will give you $1 million, and if you do subscribe to TDT I will make you watch the Alien movie series—from the third one on”. In this scenario it would be rational to self modify to something other than TDT; a similar counter can be constructed for any theory whatsoever.
“I just flipped a fair coin. I decided, before I flipped the coin, that if it came up heads, I would ask you for $1000. And if it came up tails, I would give you $1,000,000 if and only if I predicted that you would give me $1000 if the coin had come up heads. The coin came up heads—can I have $1000?”
Does this correspond to a significant class of problem in the real world, in the same way Parfit’s Hitchhiker does?
Right, so the decision theories I try to construct are for classes of problems where I can identify a winning property of how the algorithm decides things or strategizes things or responds to things or whatever, a property which determines the payoff fully and screens off all other dependence on the algorithm. Then the algorithm can maximize that property of itself.
Causal decision theory then corresponds to the problem class where your physical action fully determines the result, and anything else, like logical dependence on your algorithm’s disposition, is not allowed. CDT agents will successfully maximize on that problem class.
Okay, so what problem class are you aiming for with TDT? It can’t be the full class of problems where the result depends on your disposition, because there will always be a counter. Do you have a slightly more restricted class in mind?
The TDT I actually worked out is for the class where your payoffs are fully determined by the actual output of your algorithm, but not by other outputs that your algorithm would have made under other conditions. As I described in the “open problems” post, once you allow this sort of strategy-based dependence, then I can depend on your dependence on my dependence on … and I don’t yet know how to stop the recursion. This is closely related to what Wei Dai and I are talking about in terms of the “logical order” of decisions.
If you want to use the current TDT for the Prisoner’s Dilemma, you have to start by proving (or probabilistically expecting) that your opponent’s decision is isomorphic to your own. Not by directly simulating the opponent’s attempt to determine if you cooperate only if they cooperate. Because, as written, the counterfactual surgery that stops the recursion is just over “What if I cooperate?” not “What if I cooperate only if they cooperate?” (Look at the diagonal sentence.)
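Schematically, and only as my reading of the above (the isomorphism-proving step is the hard, unspecified part, and all names here are placeholders):

```python
# Sketch of how the current TDT is described as handling the one-shot PD.
# `payoff` maps a joint (my_move, their_move) pair to *my* payoff;
# `provably_isomorphic` stands in for the unspecified proof machinery.

def tdt_pd_move(my_code, opponent_code, provably_isomorphic, payoff):
    if provably_isomorphic(my_code, opponent_code):
        # Counterfactual surgery is on the single node "output of this
        # computation": both instances take whatever value that node takes,
        # so the only live comparison is (C,C) versus (D,D).
        return 'C' if payoff[('C', 'C')] > payoff[('D', 'D')] else 'D'
    # No established logical dependence: treat the opponent's move as
    # not covarying with ours, and best-respond in the ordinary way.
    return 'D'
```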
Okay...
Omega comes along and says “I ran a simulation to see if you would one-box in Newcomb. The answer was yes, so I am now going to feed you to the Ravenous Bugblatter Beast of Traal. Have a nice day.”
Doesn’t this problem fit within your criteria?
If you reject it on the basis of “if you had told me the relevant facts up front, I would’ve made the right decision”, can’t you likewise reject the one where Omega flips a coin before telling you about the proposed bet?
If you have reason in advance to believe that either is likely to occur, you can make an advance decision about what to do.
Does either problem have some particular quality relevant for its classification here, that the other does not?
That’s more like a Counterfactual Mugging, which is the domain of Nesov-Dai updateless decision theory—you’re being rewarded or punished based on a decision you would have made in a different state of knowledge, which is not “you” as I’m defining this problem class. (Which again may sound quite restrictive at this point, but if you look at 95% of the published Newcomblike problems...)
What you need here is for the version of you that Omega simulates facing Newcomb’s Box, to know about the fact that another Omega is going to reward another version of itself (that it cares about) based on its current logical output. If the simulated decision system doesn’t know/believe this, then you really are screwed, but it’s more because now Omega really is an unfair bastard (i.e. doing something outside the problem class) because you’re being punished based on the output of a decision system that didn’t know about the dependency of that event on its output—sort of like Omega, entirely unbeknownst to you, watching you from a rooftop and sniping you if you eat a delicious sandwich.
If the version of you facing Newcomb’s Problem has a prior over Omega doing things like this, even if the other you’s observed reality seems incompatible with that possible world, then this is the sort of thing handled by updateless decision theory.
Right. But then if that is the (reasonable) criterion under which TDT operates, it seems to me that it does indeed handle the case of Omega’s after the fact coin flip bet, in the same way that it handles (some versions of) Newcomb’s problem. How do you figure that it doesn’t?
Because the decision diagonal I wrote out, handles the probable consequences of “this computation” doing something, given its current state of knowledge—its current, updated P—so if it already knows the coinflip (especially a logical coinflip like a binary digit of pi) came up heads, and this coinflip has nothing counterfactually to do with its decision, then it won’t care about what Omega would have done if the coin had come up tails and the currently executing decision diagonal says “don’t pay”.
Ah! So you’re defining “this” as an exact bitwise match, I see. Certainly that helps make the conclusions more rigorous. I will suggest that the way to handle the after-the-fact coin-flip bet is to make the natural extension to sufficiently similar computations.
Note that even selfish agents must do this in order to care about themselves five minutes in the future.
To further motivate the extension, consider the variant of Newcomb where just before making your choice, you are given a piece of paper with a large number written on it; the number has been chosen to be prime or composite depending on whether the money is in the opaque box.
That’s not the problem. The problem is that you’ve already updated your probability distribution, so you just don’t care about the cases where the binary digit came up 0 instead of 1 - not because your utility function isn’t over them, but because they have negligible probability.
(First read that variant in Martin Gardner.) The epistemically intuitive answer is “Once I choose to take one box, I will be able to infer that this number has always been prime”. If I wanted to walk through TDT doing this, I’d draw a causal graph with Omega’s choice descending from my decision diagonal, and sending a prior-message in turn to the parameters of a child node that runs a primality test over numbers and picked this number because it passed (failed), so that—knowing / having decided your logical choice—seeing this number becomes evidence that its primality test came up positive.
In terms of logical control, you don’t control whether the primality test comes up positive on this fixed number, but you do control whether this number got onto the box-label by passing a primality test or a compositeness test.
(I don’t remember where I first read that variant, but Martin Gardner sounds likely.) Yes, I agree with your analysis of it—but that doesn’t contradict the assertion that you can solve these problems by extending your utility function across parallel versions of you who received slightly different sensory data. I will conjecture that this turns out to be the only elegant solution.
Sorry, that doesn’t make any sense. It’s a probability distribution that’s the issue, not a utility function. UDT tosses out the probability distribution entirely. TDT still uses it and therefore fails on Counterfactual Mugging.
It’s precisely the assertion that all such problems have to be solved at the probability distribution level that I’m disputing. I’ll go so far as to make a testable prediction: it will be eventually acknowledged that the notion of a purely selfish agent is a good approximation that nonetheless cannot handle such extreme cases. If you can come up with a theory that handles them all without touching the utility function, I will be interested in seeing it!
None of the decision theories in question assume a purely selfish agent.
No, but most of the example problems do.
It might be nontrivial to do this in a way that doesn’t automatically lead to wireheading (using all available power to simulate many extremely fulfilled versions of itself). Or is that problem even more endemic than this?
This is a statement about my global strategy, the strategy I consider winning. In this strategy, I one-box in the states of knowledge where I don’t know about the monster, and two-box where I know. If Omega told me about the monster, I’d transition to a state of knowledge where I know about it, and, according to the fixed strategy above, I two-box.
In counterfactual mugging, for each instance of mugging, I give away $100 on the mugging side, and receive $10000 on the reward side. This is also a fixed global strategy that gives the actions depending on agent’s state of knowledge.
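The arithmetic behind that fixed strategy, evaluated from the pre-flip state of knowledge with the fair coin from the problem statement:

$$\mathrm{EU}(\text{pay}) = \tfrac{1}{2}(-\$100) + \tfrac{1}{2}(\$10{,}000) = \$4{,}950 \;>\; \mathrm{EU}(\text{refuse}) = \$0.$$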
We already have Disposition-Based Decision Theory—and have had since 2002 or so. I think it’s more a case of whether there is anything more to add.
Thanks for the link! I’ll read the paper more thoroughly later, a quick skim suggests it is along the same lines. Are there any cases where DBDT and TDT give different answers?
I don’t think DBDT gives the right answer if the predictor’s snapshot of the local universe-state was taken before the agent was born (or before humans evolved, or whatever), because the “critical point”, as Fisher defines it, occurs too late. But a one-box chooser can still expect a better outcome.
It looks to me like DBDT is working in the direction of TDT but isn’t quite there yet. It looks similar to the sort of reasoning I was talking about earlier, where you try to define a problem class over payoff-determining properties of algorithms.
But this isn’t the same as a reflectively consistent decision theory, because you can only maximize on the problem class from outside the system—you presume an existing decision process or ability to maximize, and then maximize the dispositions using that existing decision theory. Why not insert yet another step? What if one were to talk about dispositions to choose particular disposition-choosing algorithms as being rational? In other words, maximizing “dispositions” from outside strikes me as close kin to “precommitment”—it doesn’t so much guarantee reflective consistency of viewpoints, as pick one particular viewpoint to have control.
As Drescher points out, if the base theory is a CDT, then there’s still a possibility that DBDT will end up two-boxing if Omega takes a snapshot of the (classical) universe a billion years ago before DBDT places the “critical point”. A base theory of TDT, of course, would one-box, but then you don’t need the edifice of DBDT on top because the edifice doesn’t add anything. So you could define “reflective consistency” in terms of “fixed point under precommitment or disposition-choosing steps”.
TDT is validated by the sort of reasoning that goes into DBDT, but the TDT algorithm itself is a plain-vanilla non-meta decision theory which chooses well on-the-fly without needing to step back and consider its dispositions, or precommit, etc. The Buck Stops Immediately. This is what I mean by “reflective consistency”. (Though I should emphasize that so far this only works on the simple cases that constitute 95% of all published Newcomblike problems, and in complex cases like Wei Dai and I are talking about, I don’t know any good fixed algorithm (let alone a single-step non-meta one).)
Exactly. Unless “cultivating a disposition” amounts to a (subsequent-choice-circumventing) precommitment, you still need a reason, when you make that subsequent choice, to act in accordance with the cultivated disposition. And there’s no good explanation for why that reason should care about whether or not you previously cultivated a disposition.
(Though I think the paper was trying to use dispositions to define “rationality” more than to implement an agent that would consistently carry out those dispositions?)
I didn’t really get the purpose of the paper’s analysis of “rationality talk”. Ultimately, as I understood the paper, it was making a prescriptive argument about how people (as actually implemented) should behave in the scenarios presented (i.e, the “rational” way for them to behave).
That’s just what “dispositions” are in this context—tendencies to behave in particular ways under particular circumstances.
By this conception of what “disposition” means, you can’t cultivate a disposition for keeping promises—and then break the promises when the chips are down. You are either disposed to keep promises, or you are not.
I had a look at the Wikipedia “Precommitment” article to see whether precommitment is actually as inappropriate as it is being portrayed here.
According to the article, the main issue seems to involve cutting off your own options.
Is a sensible one-boxing agent “precommitting” to one-boxing by “cutting off its own options”—namely the option of two-boxing?
On one hand, they still have the option and a free choice when they come to decide. On the other hand, the choice has been made for them by their own nature—and so they don’t really have the option of choosing any more.
My assessment is that the word is not obviously totally inappropriate.
Does “disposition” have the same negative connotations as “precommitting” has? I would say not: “disposition” seems like a fairly appropriate word to me.
I don’t know if Justin Fisher’s work exactly replicates your own conclusions. However it seems to have much the same motivations, and to have reached many of the same conclusions.
FWIW, it took me about 15 minutes to find that paper in a literature search.
Another relevant paper:
“No regrets: or: Edith Piaf revamps decision theory”.
That one seems to have christened what you tend to refer to as “consistency under reflection” as “desire reflection”.
I don’t seem to like either term very much—but currently don’t have a better alternative to offer.
Violation of desire reflection would be a sufficient condition for violation of dynamic consistency, which in turn is a sufficient condition to violate reflective consistency. I don’t see a necessity link.
The most obvious reply to the point about dispositions to have dispositions is to take a behaviourist stance: if a disposition results in particular actions under particular circumstances, then a disposition to have a disposition (plus the ability to self-modify) is just another type of disposition, really.
What the document says about the placing of the “critical point” is:
Consequently, I am not sure where the idea that it could be positioned “too late” comes from. The document pretty clearly places it early on.
Newcomb’s problem? That’s figure 1. You are saying that you can’t easily have a disposition—before you even exist? Just so—unless your maker had a disposition to make you with a certain disposition, that is.
Well, we have a lengthy description of the revised DBDT—so that should hopefully help figure out what its predicted actions are.
The author claims it gets both the Smoking-Cancer Problem and Newcomb’s problem right—which seems to be a start.
The three sentence version is actually a one sentence version; it’s three independent clauses, but semicolons don’t separate sentences.
I’m really sorry, I couldn’t help myself.
If this is correct, then it amounts to a profound philosophical and scientific achievement.
Not by my standards.
Free will is about as easy as a problem can get and still be Confusing. Plenty of moderately good reductionists have refused to be confused by it. Killing off the problem entirely is more like dropping nuclear weapons to obliterate the last remnants of a dead horse than any great innovation within the field of reductionism.
There are non-reductionist philosophers who would think of reducing free will as a great and difficult achievement, but by reductionist standards it’s a mostly-solved problem already.
Formal cooperation in the one-shot PD, now that should be interesting.
Free will is counted as one of the great problems of philosophy. Wikipedia lists it as a “central problem of metaphysics”. SEP has a whole, long article on it, along with others on “compatibilism”, “causal determinism”, “free will and fatalism”, “divine foreknowledge”, “incompatibilist (nondeterministic) theories of free will”, and “arguments for incompatibilism”.
If you really have “nuked the dead donkey” here, you would cut out a lot of literature. Furthermore, religious people would no longer be able to use “free will” as a magic incantation with which to defend God.
The only reason free will is regarded as a problem of philosophy is that philosophers are in the rather bizarre habit of defining it as “your actions are uncaused”—it should be no surprise that a nonsensical definition leads to problems!
When we use the correct definition—the one that corresponds to how the term is actually used—“your actions are caused by your own decisions, as opposed to by external coercion”—the problem doesn’t arise.
Dennett and others have used multi-ton high explosives on the dead donkey. Why would nuclear weapons make a further difference?
People respond to math more than to words.
Er… no they don’t?
Some do.
Rather, if one challenges a valid verbal theory one can usually find some way of convincing people that there is some “wiggle room”, that it may or may not be valid, etc. But a mathematical theory has, I think, an air of respectability that will make people pay attention, even if they don’t like it, and especially if they don’t actually understand the mathematics.
If your theory finds applications (which, given the robotics revolution we seem to be in the middle of, is not vastly unlikely), then it will further marginalize those who stick to the old convenient confusion about free will.
Of course, given what has happened with evolution (smart Christians accept it, but find excuses to still believe in God), I suspect that it will only have an incremental impact on religiosity, even amongst the elite.
Free will seems like a pretty boring topic to me. The main recent activity I have noticed in the area was Daniel Dennett’s “Freedom Evolves” book. That book was pretty boring and mostly wrong—I thought. It was curious to see Daniel Dennett make such a mess of the subject, though.
As it happens, I’m reading through Freedom Evolves right now; up to chapter 3, and while I don’t quite buy his ideas on inevitability, it so far doesn’t strike me as a mess?
I liked the bit on memes. Most of the rest of it was a lot of word games, IMO.
Here is what I don’t understand about the free will problem. I know this is a simple objection, so there must be a standard reply to it; but I don’t know what that reply is.
Denote F as a world in which free will exists, f as one in which it doesn’t. Denote B as a world in which you believe in free will, and b as one in which you don’t. Let a combination of the two, e.g., FB, denote the utility you derive from having that belief in that world. Suppose FB > Fb and fb > fB (being correct > being wrong).
The expected utility of B is FB x p(F) + fB x (1-p(F)). Expected utility of b is Fb x p(F) + fb x (1-p(F)). Choose b if Fb x p(F) + fb x (1-p(F)) > FB x p(F) + fB x (1-p(F)).
But, that’s not right in this case! You shouldn’t consider worlds of type f in your decision, because if you’re in one of those worlds, your decision is pre-ordained. It doesn’t make any sense to “choose” not to believe in free will—that belief may be correct, but if it is correct, then you can’t choose it.
Over worlds of type F, the expected utility of B is FB x p(F), and the utility of b is Fb x p(F), and FB > Fb. So you always choose B.
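To make the asymmetry concrete, here is a minimal sketch of that comparison in code; the utility values and p(F) are illustrative assumptions of mine, not anything taken from the argument above.

```python
# Illustrative (world, belief) utilities; the exact numbers are assumptions,
# chosen only so that being right beats being wrong in each kind of world.
FB, Fb = 10.0, 0.0   # free will exists: believing in it vs. not believing
fB, fb = 0.0, 10.0   # no free will: believing in it vs. not believing
p_F = 0.5            # assumed probability that free will exists

# Standard expected-utility comparison over both kinds of world.
eu_B = FB * p_F + fB * (1 - p_F)
eu_b = Fb * p_F + fb * (1 - p_F)

# The restricted comparison argued for above: only F-worlds count, because
# only in those worlds does the "choice" of belief actually do anything.
eu_B_restricted = FB * p_F
eu_b_restricted = Fb * p_F

print(eu_B, eu_b)                        # standard comparison: a tie with these numbers
print(eu_B_restricted, eu_b_restricted)  # restricted comparison: B wins whenever FB > Fb
```

On the restricted comparison, B wins for any p(F) > 0, which is the asymmetry being claimed.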
Saying that you shouldn’t do something because it’s preordained whether you do it or not is a very confused way of looking at things. Christine Korsgaard, by whom I am normally unimpressed but who has a few quotables, says:
(From “The Authority of Reflection”)
I don’t understand what that Korsgaard quote is trying to say.
I didn’t say that. I said that, when making a choice, you shouldn’t consider, in your set of possible worlds, possible worlds in which you can’t make that choice.
It’s certainly not as confused a way of looking at things as choosing to believe that you can’t choose what to believe.
I should have said you shouldn’t try to consider those worlds. If you are in f, then it may be that you will consider such possible worlds; and there’s no shouldness about it.
“But”, you might object, “what should you do if you are a computer program, running in a deterministic language on deterministic hardware?”
The answer is that in that case, you do what you will do. You might adopt the view that you have no free will, and you might be right.
The 2-sentence version of what I’m saying is that, if you don’t believe in free will, you might be making an error that you could have avoided. But if you believe in free will, you can’t be making an error that you could have avoided.
In the context of the larger paper, the most charitable way of interpreting her (IMO) is that whether we have free will or not, we have the subjective impression of it, this impression is simply not going anywhere, and so it makes no sense to try to figure out how a lack of free will ought to influence our behavior, because then we’ll just sit around waiting for our lack of free will to pick us up out of our chair and make us water our houseplants and that’s not going to happen.
What if we’re in a possible world where we can’t choose not to consider those worlds? ;)
“Choosing to believe that you can’t choose what to believe” is not a way of looking at things; it’s a possible state of affairs, in which one has a somewhat self-undermining and false belief. Now, believing that one can choose to believe that one cannot choose what to believe is a way of looking at things, and might even be true. There is some evidence that people can choose to believe self-undermining false things, so believing that one could choose to believe a particular self-undermining false thing which happens to have recursive bearing on the choice to believe it isn’t so far out.
I am unable to attach a truth condition to these sentences—I can’t imagine two different ways that reality could be which would make the statements true or alternatively false.
http://wiki.lesswrong.com/wiki/Free_will_(solution)
Do you mean that the phrases “free will exists” and “free will does not exist” are both incoherent?
If I want to, I can assign a meaning to “free will” in which it is tautologically true of causal universes as such, and applied to agents, is true of some agents but not others. But you used the term, you tell me what it means to you.
You used the term first. You called it a “dead horse” and “about as easy as a problem can get and still be Confusing”. I would think this meant that you have a clear concept of what it means. And it can’t be a tautology, because tautologies are not dead horses.
I can at least say that, to me, “Free will exists” implies “No Omega can predict with certainty whether I will one-box or two-box.” (This is not an “if and only if” because I don’t want to say that a random process has free will; nor that an undecidable algorithm has free will.)
I thought about saying: “Free will does not exist” if and only if “Consciousness is epiphenomenal”. That sounds dangerously tautological, but closer to what I mean.
I can’t think how to say anything more descriptive than what I wrote in my first comment above. I understand that saying there is free will seems to imply that I am not an algorithm; and that that seems to require some weird spiritualism or vitalism. But that is vague and fuzzy to me; whereas it is clear that it doesn’t make sense to worry about what I should do in the worlds where I can’t actually choose what I will do. I choose to live with the vague paradox rather than the clear-cut one.
ADDED: I should clarify that I don’t believe in free will. I believe there is no such thing. But, when choosing how to act, I don’t consider that possibility, because of the reasons I gave previously.
Then you’ve got the naive incoherent version of “free will” stuck in your head. Read the links.
http://wiki.lesswrong.com/wiki/Free_will
http://wiki.lesswrong.com/wiki/Free_will_(solution)
All right, I read all of the non-italicized links, except for the “All posts on Less Wrong tagged Free Will”, trusting that one of them would say something relevant to what I’ve said here. But alas, no.
All of those links are attempts to argue about the truth value of “there is free will”, or about whether the concept of free will is coherent, or about what sort of mental models might cause someone to believe in free will.
None of those things are at issue here. What I am talking about is what happens when you are trying to compute something over different possible worlds, where what your computation actually does is different in these different worlds. When you must compare expected value in possible worlds in which there is no free will, to expected value in possible worlds in which there is free will, and then make a choice; what that choice actually does is not independent of what possible world you end up in. This means that you can’t apply expectation-maximization in the usual way. The counterintuitive result, I think, is that you should act in the way that maximizes expected value given that there is free will, regardless of the computed expected value given that there is not free will.
As I mentioned, I don’t believe in free will. But I think, based on a history of other concepts or frameworks that seemed paradoxical but were eventually worked out satisfactorily, that it’s possible there’s something to the naive notion of “free will”.
We have a naive notion of “free will” which, so far, no one has been able to connect up with our understanding of physics in a coherent way. This is powerful evidence that it doesn’t exist, or isn’t even a meaningful concept. It isn’t proof, however; I could say the same thing about “consciousness”, which as far as I can see really shouldn’t exist.
All attempts that I’ve seen so far to parse out what free will means, including Eliezer’s careful and well-written essays linked to above, fail to noticeably reduce the probability I assign to there being naive “free will”, because the probability that there is some error in the description or mapping or analogies made is always much higher than the very-low prior probability that I assign to there being “free will”.
I’m not arguing in favor of free will. I’m arguing that, when considering an action to take that is conditioned on the existence of free will, you should not do the usual expected-utility calculations, because the answer to the free will question determines what it is you’re actually doing when you choose an action to take, in a way that has an asymmetry such that, if there is any possibility epsilon > 0 that free will exists, you should assume it exists.
(BTW, I think a philosopher who wished to defend free will could rightfully make the blanket assertion against all of Eliezer’s posts that they assume what they are trying to prove. It’s pointless to start from the position that you are an algorithm in a Blocks World, and argue from there against free will. There’s some good stuff in there, but it’s not going to convince someone who isn’t already reductionist or determinist.)
I have stated exactly what I mean by the term “free will” and it makes this sentence nonsense; there is no world in which you do not have free will. And I see no way that your will could possibly be any freer than it already is. There is no possible amendment to reality which you can consistently describe, that would make your free will any freer than it is in our own timeless and deterministic (though branching) universe.
What do you mean by “free will” that makes your sentence non-nonsense? Don’t say “if we did actually have free will”, tell me how reality could be different.
That’s the part I don’t buy. I’m not saying it’s false, but I don’t see any good reason to think it’s true. (I think I read the posts where you explained why you believe it, but I might have missed some.)
I can’t state exactly what I mean by “free will”, any more than I can state exactly what I mean by “consciousness”. No one has come up with a reductionist account of either. But since I actually do believe in consciousness, I can’t dismiss free will as nonsense.
A clarification added in response to the instantaneous orgy of downvotes: I realize that Eliezer has provided a reductionist explanation for how he thinks “free will” should be interpreted, and for why people believe in it. That is not what I mean. I mean that no one has come up with a reductionist account for how what people actually mean by “free will” could work in the physical world. Just as no one has come up with a reductionist account for how what people mean by “consciousness” could work in the physical world.
If you find a reason to disagree with this, it means that you have a tremendously important insight, and should probably write a little comment to share your revelation with us on a reductionist implementation of naive free will, or consciousness.
This is not only incorrect, but is in dismissive denial of statements to the contrary made by people in response to your questions. It is one thing to consider an argument incorrect or to be unwilling to accept it; it is another to fail to understand the argument to the point of denying its very existence.
You should be more specific: Point out which part of my statement is incorrect, and what statements I am dismissively denying.
A reductionist account of causality does not count as a reductionist account of free will. Saying, “The world is deterministic, therefore ‘free will’ actually means the uninteresting concept X that is not what anybody means by ‘free will’” does not count as a deterministic account of free will.
What I mean is that no one has provided a reductionist account of how the naive notion of free will could work. Not that no one has provided a reductionist account of how the world actually works and what “free will” maps onto in that world.
I’m also curious why it’s bad for me to dismissively deny statements made to me, but okay for you to dismissively deny my statements as incorrect.
Because that would be as silly as seeking a reductionist account of how souls or gods could “work”—the only way you’re going to get one is by explaining how the brain tends to believe these (purely mental) phenomena actually exist.
Free will is just the feeling that more than one choice is possible, just like a soul or a god is just the feeling of agency, detached from an actual agent.
All three are descriptions of mental phenomena, rather than having anything to do with a physical reality outside the brain.
Again—yes, I agree that what you say is almost certainly true. The reason I said that no one has provided a reductionist account of how the naive notion of free will could work, was to point out its similarity to the question of consciousness, which seems as nonsensical as free will, and yet exists; and thereby show that there is a possibility that there is something to the naive notion. And as long as there is some probability epsilon > 0 of that, then we have the situation I described above when performing expectation maximization.
BTW, your response is an assertion, or at best an explaining-away; not a proof.
The mistake you’re making is that determinism does not mean your decisions are irrelevant. The universe doesn’t swoop in and force you to decide a certain way even though you’d rather not. Determinism only means that your decisions, by being part of physical reality rather than existing outside it, result from the physical events that led to them. You aren’t free to make events happen without a cause, but you can still look at evidence and come to correct conclusions.
If you can’t choose whether you believe, then you don’t choose whether you believe. You just believe or not. The full equation still captures the correctness of your belief, however you arrived at it. There’s nothing inconsistent about thinking that you are forced to not believe and that seeing the equation is (part of) what forced you.
(I avoid the phrase “free will” because there are so many different definitions. You seem to be using one that involves choice, while Eliezer uses one based on control. As I understand it, the two of you would disagree about whether a TV remote in a deterministic universe has free will.)
edit: missing word, extra word
Brian said:
And Alicorn said:
And before either of those, I said:
These all seem to mean the same thing. When you try to argue against what someone said by agreeing with him, someone is failing to communicate.
Brian, my objection is not based on the case fb. It’s based on the cases Fb and fB. fB is a mistake that you had to make. Fb, “choosing to believe that you can’t choose to believe”, is a mistake you didn’t have to make.
Yes. I started writing my reply before Alicorn said anything, took a short break, posted it, and was a bit surprised to see a whole discussion had happened under my nose.
But I don’t see how what you originally said is the same as what you ended up saying.
At first, you said not to consider f because there’s no point. My response was that the equation correctly includes f regardless of your ability to choose based on the solution.
Now you are saying that Fb is different from (inferior to?) fB.
Eliezer_Yudkowsky wrote on 19 August 2009 03:24:46PM:
Tversky demonstrated: One experiment based on the simple dilemma found that approximately 40% of participants played “cooperate” (i.e., stayed silent). Hmmm...
Compassion (in a certain sense) may be part of your answer.
If I (as Prisoner A) have a term in my utility function such that an injury to Prisoner B is an injury to me (discounted), it can make ‘Cooperate’ much more attractive.
I might have enough compassion to be willing to do 6 months in jail if it will spare Prisoner B a 2-year prison term (or more).
For example, using the external payoff matrix given by Wei Dai (http://lesswrong.com/lw/15z/ingredients_of_timeless_decision_theory/11w9) (19 August 2009 07:08:23AM):
My INTERNAL payoff matrix becomes:
And ‘Cooperate’ now strictly dominates using elementary game theory.
Thank you for your time and consideration.
RickJS
While a good question, Eliezer_Yudkowsky has already thoroughly answered it in The True Prisoner’s Dilemma.
His point there is, the values in the matrix are supposed to represent the participants’ utility, rather than jail time, which accounts for your compassion for your friend. If it were simply prison sentences, your reasoning would apply, which is why EY says the true Prisoner’s Dilemma requires convoluted, unusual scenarios, and why normal presentations of the PD don’t make the situation clear.
That Prisoner A is completely and utterly selfish is part of the Prisoner’s Dilemma. If the prisoner’s not selfish, it’s not the Prisoner’s Dilemma anymore.
EDIT: Of course, this is only true if the numbers in the matrix represent years spent in jail, not utilons.
inorite?!
Of course, this might still be muddy if you recast the payoff matrix in utilons, or (to abstract away less) adjust the “external” payoff matrices so that the “internal” payoff matrices match those of the original problem.
Inorite? What is that?
I suspect I’m not smart enough to play on this site. I’m quite unsure I can even parse your sentence correctly, and I can’t imagine a reason to adjust the external payoff matrices (they were given by Wei Dai; that is the original problem I’m discussing) so the internal payoff matrices match something. I’m baffled.
“inorite”.
See Cyan’s comment below. Do not be dispirited by lolspeak.
Also, the reason to adjust the payoff matrices in the original problem is so that your ‘internal’ payoff matrices match those of Wei Dai’s problem, or to put it another way, consider the problem in the least convenient possible world. Basically, the prisoner’s dilemma is still there if you take the problem to be in utilons, which take into account things like your ‘compassion’ (in this case, valuing the reward given to the other person). I can’t quite figure out what your formula for discounting is above, so let me simplify...
It would be remiss for me to not do the math, though it is not my forte:
Suppose the matrix represents jelly beans for you or the opponent, each worth 1 utilon. Further suppose that you get .25 utilons for each jelly bean the opponent gets, due to your ‘compassion’. Now take this payoff matrix (in jellybeans):
Which becomes in your ‘internal’ matrix (in utilons):
Now cooperation is dominated by defection for the ‘compassionate’ person.
Someone please note if my numbers don’t work out—it’s early here.
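Here is a sketch of the same dominance check in code; the jellybean matrix below is an illustrative assumption of mine rather than the actual numbers from the comment above, and the 1.0 weight is included only to show how much compassion it would take to flip the result.

```python
# Illustrative jellybean payoffs (mine, theirs) for each (my move, their move).
# These particular numbers are assumptions standing in for the matrix above.
payoffs = {
    ("C", "C"): (2, 2),
    ("C", "D"): (0, 3),
    ("D", "C"): (3, 0),
    ("D", "D"): (1, 1),
}

def internal_utility(mine, theirs, compassion):
    """My utilons: my own jellybeans plus a discounted share of the opponent's."""
    return mine + compassion * theirs

for w in (0.25, 1.0):
    internal = {moves: internal_utility(m, t, w) for moves, (m, t) in payoffs.items()}
    # A move dominates if it does at least as well against every opponent move.
    defect_dominates = all(internal[("D", o)] >= internal[("C", o)] for o in ("C", "D"))
    coop_dominates = all(internal[("C", o)] >= internal[("D", o)] for o in ("C", "D"))
    print(f"w={w}: defect dominates: {defect_dominates}, cooperate dominates: {coop_dominates}")
```

With these numbers, defection still dominates at a compassion weight of 0.25, and cooperation only takes over once the weight climbs past 0.5, i.e. once you care about the other player's jellybeans at least half as much as your own.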
Ah. Thanks! I think I get that.
But maybe I just think I do. I thought I understood that narrow part of Wei Dai’s post on a problem that maybe defeats TDT. I had no idea that compassion had already been considered and compensated out of consideration. And that’s such common shared knowledge here in the LessWrong community that it need not be mentioned.
I have a lot to learn. I now see I was very arrogant to think I could contribute here. I should read the archives & wiki before I post. I apologize.
<<Begins to compute an estimated time to de-lurk. They collectively write several times faster than I can read, even if I don’t slow down to mull it over. Hmmm… >>
I’m pretty sure Socrates and Aristotle already pointed much of this out in different words. I should make a post about that. Of course, they didn’t do the math.
I agree with cousin_it below. It seems like you’re missing some math.
But other than that, I don’t see what the big deal is. I was expecting something monumental and game-changing, not “Is that it?”
This is indeed interesting, although it seems to be going over my head somewhat.
Re: “Some concluding chiding of those philosophers who blithely decided that the “rational” course of action systematically loses”
Some of those philosophers draw a distinction between rational action and the actions of a rational agent—see here:
So: these folk had got the right answer, and any debate with them is over terminology.
(Looks over Tim Tyler’s general trend in comments.)
Okay. It’s helpful that you’re doing a literature search. It’s not helpful that every time you find something remotely related, you feel a need to claim that it is already TDT and that TDT is nothing innovative by comparison. It does not appear to me that you understand either the general background of these questions as they have been pursued within decision theory, or TDT in particular. Literature search is great, but if you’re just spending 15 minutes Googling, then you have insufficient knowledge to compare the theories. Plenty of people have called for a decision theory that one-boxes on Newcomb and smokes on the smoking lesion—the question is coughing up something that seems reasonably formal. Plenty of people have advocated precommitment, but it comes with its own set of problems, and that is why a non-precommitment-based decision theory is important.
In the spirit of dredging up references with no actual deep insight, I note this recent post on Andrew Gelman’s blog.
Well, other people have previously taken a crack at the same problem.
If they have resolved it, then I should think that would be helpful—since then you can look at their solution. If not, their efforts to solve the problem might still be enlightening.
So: I think my contribution in this area is probably helpful.
15 minutes was how long it took me to find the cited material in the first place. Not trivial—but not that hard.
No need to beat me up for not knowing the background of your own largely unpublished theory!
...but yes, in my view, advanced decision theory is a bit of a red herring for those interested in machine intelligence. It’s like: that is so not the problem. It seems like wondering whether to use butter-icing or marzipan on the top of the cake—when you don’t yet have the recipe or the ingredients.
The cited material isn’t much different from a lot of other material in the same field.
So far, “Disposition-Based Decision Theory” (and its apparently-flawed precursor) is the only thing I have seen that apparently claims to address and solve the same problem that is under discussion in this forum:
I suppose there’s also a raft of CDT enthusiasts, who explain why two-boxing is actually not a flaw in their system, and that they have no objections to the idea of agents who one-box. In their case, the debate appears to be over terminology: what does the word “rational” actually mean—is it about choosing the best action from the available options? Or does it mean something else?
Are there other attempts at a solution? Your turn for some references, I feel.
“Paradoxes of Rationality and Cooperation” (the edited volume) will give you a feel for the basics, as will reading Marion Ledwig’s thesis paper.
Marion Ledwig’s thesis appears to be an overview of Newcomb’s Problem from 2000. That’s from before the disposition-based decision theory I referenced was worked out—and there’s minimal coverage.
Are you suggesting that there are some proposed solutions to the problem of building a decision theory that “does the right thing” somewhere in there, that pre-date disposition-based decision theory?
The main thesis there (in the section “Newcomb’s Problem as a Game against Nature”) seems to go against what many people think here—and is more along the lines of CDT.
Anyway, since it’s a 300 page thesis, perhaps you would like to be more specific.
Or maybe you are just waving in the general direction of the existing literature. In which case, I fail to see how that addresses my point.
“Paradoxes of Rationality and Cooperation” dates from 1985. That seems rather unlikely to have relevant coverage either. Again, it came too early—before the first attempts at a solution that I’m aware of.
People have been trying to solve the problem since the day it was presented, and it’s pretty clear that you don’t understand which parts of this particular solution are supposed to be novel. The main novel idea is the incorporation of logical uncertainty into Pearl-style causal graphs and the formulation of the counterfactuals as surgery over those causal graphs.
The idea that rationalists should make lots of money, versus the idea that rationalists should appear very reasonable, has been a central point of controversy from the beginning.
Talking about dispositions and precommitments has been going on since the beginning.
If you’re going to start waving judgments of novelty around, then read the literature.
One problem here is that the “particular solution” that I am apparently expected to be understanding the novelty of hasn’t actually been published. Instead what we have is some notes.
The problem I was considering involves finding a method which obtains the “right” answer to problems like Newcomb’s problem and The Smoking Problem with a decision theory. If you are trying to solve some other problem, that’s fine.
If “precommitment” just means cutting off some of your options in advance, precommitment seems to be desirable—under various circumstances where you want to signal commitment—and believe that faked commitment signals would be detected. You use the term as though it is in some way negative.
It seems to me that I have not encountered the critics of precommitment saying what they mean by the term. Consequently, it is hard to see what problems they see with the idea.
This is the crippleware version of TDT that pure CDT agents self-modify to. It’s crippleware because if you self-modify at 7:00pm you’ll two-box against an Omega who saw your code at 6:59am.
By hypothesis, Omega on examining your code at 6:59, knows that you will self-modify at 7:00 and one-box thereafter.
Consider that every TDT agent must be derived from a non-TDT agent. There is no difference in principle between “I used to adhere to CDT but self-modified to TDT” and “I didn’t understand TDT when I was a child, but I follow it now as an adult”.
Correction made, thanks to Tim Tyler.
CDT agents don’t care. They aren’t causing Omega to fill box B by changing their source code at 7pm, so they have no reason to change their source code in a way that takes only one box. The source code change only causes Omega to fill box B if Omega looks at their source code after 7pm. That is how CDT agents (unwisely) compute “causes”.
Yes, but the CDT agent at seven o’clock is not being asked to choose one or two boxes. It has to choose between rewriting its algorithm to plain TDT (or DBDT or some variant that will one box), or to TDT with an exception clause “but use the old algorithm if you find out Omega’s prediction was made before seven o’clock”. Even by straight CDT, there is no motive for writing that exception.
This is the point at which I say “Wrong” and “Read the literature”. I’m not sure how I can explain this any more clearly than I have already, barring a full-fledged sequence. At 7pm the CDT agent calculates that if it modifies its source to use the old algorithm in cases where Omega saw the code before 7pm, it will get an extra thousand dollars on Newcomb’s Problem, since it will take box A which contains an additional thousand dollars, and since its decision to modify its code at 7pm has no effect on an Omega who saw the code before 7pm, hence no effect on whether box B is full. It does not reason “but Omega knows I will change my code”. If it reasoned that way it would be TDT, not CDT, and would one-box to begin with.
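For concreteness, here is a small sketch of the calculation being described, with one added assumption of mine: the agent assigns credence q to box B already being full, and (being CDT) treats q as fixed by the 6:59 snapshot, hence causally independent of whatever code it installs at 7:00.

```python
# Sketch of the 7pm CDT self-modification choice described above.
# q is an assumed credence added for illustration; the dollar amounts are the
# usual Newcomb payoffs.
q = 0.9              # assumed credence that box B already holds $1,000,000
BOX_B = 1_000_000
BOX_A = 1_000

# Option 1: install code that one-boxes even when Omega's snapshot predates the change.
ev_plain_one_boxer = q * BOX_B

# Option 2: install code with the exception clause ("two-box if the snapshot came
# before 7pm"). Under CDT's causal surgery, q is unchanged by this choice, so the
# exception clause simply adds box A's thousand dollars.
ev_with_exception = q * BOX_B + BOX_A

print(ev_plain_one_boxer, ev_with_exception)  # CDT scores the exception clause $1,000 higher
```

The disagreement that follows is over whether a self-modifying CDT agent really scores candidate code this way, or simply hands the decision over to whatever algorithm it installs.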
Actually, I will add another comment, because I can now articulate where the ambiguity comes in: how you add self-modification to CDT (which doesn’t have it in the usual form). I’ve been assuming the original algorithm doesn’t try to micromanage the new algorithm’s decisions (which strikes me as the sensible way, not least because it gives better results here); you’ve been assuming it does (which, I suppose you could argue, is more true to the spirit of the original CDT).
I still disagree, but I agree that we have hit the limits of discussion in this comment thread; fundamentally this needs to be analyzed in a more precise language than English. We can revisit it if either of us ever gets to actually programming anything like this.
By what hypothesis? That is not how the proposed Disposition-Based Decision Theory says it works. It claims to result in agents who have the disposition to one-box.
Sure. This subthread was about plain CDT, and how it self-modifies into some form of DBDT/TDT once it figures out the benefits of doing so—and given the hypothesis of an omniscient Omega, Omega will know that this will occur.
In that case, what I think you meant to say was:
Doh! Thanks for the correction, editing comment.
I don’t see any reason for thinking this fellow’s work represents “crippleware”.
It seems to me that he agrees with you regarding actions, but differs about terminology.
Here’s the CDT explanation of the terminology:
The basic idea of forming a disposition to one-box has been around for a while. Here’s another one:
“Realistic Decision Theory: Rules for Nonideal Agents …” by Paul Weirich, 2004.
...and another one:
“DISPOSITION-BASED DECISION THEORY”
In Eliezer’s article on Newcomb’s problem, he says, “Omega has been correct on each of 100 observed occasions so far—everyone who took both boxes has found box B empty and received only a thousand dollars; everyone who took only box B has found B containing a million dollars. ” Such evidence from previous players fails to appear in some problem descriptions, including Wikipedia’s.
For me this is a “no-brainer”. Take box B, deposit it, and come back for more. That’s what the physical evidence says. Any philosopher who says “Taking BOTH boxes is the rational action,” occurs to me as an absolute fool in the face of the evidence. (But I’ve never understood non-mathematical philosophy anyway, so I may be a poor judge.)
Clarifying (NOT rhetorical) questions:
Have I just cheated, so that “it’s not the Newcomb Problem anymore?”
When you fellows say a certain decision theory “two-boxes”, are those theory-calculations including the previous play evidence or not?
Thanks for your time and attention.
There is no opportunity to come back for more. Assume that when you take box B before taking box A, box A is removed.
Yes, I read about ” … disappears in a puff of smoke.” I wasn’t coming back for a measly $1K, I was coming back for another million! I’ll see if they’ll let me play again. Omega already KNOWS I’m greedy, this won’t come as a shock. He’ll probably have told his team what to say when I try it.
″ … and come back for more.” was meant to be funny.
Anyway, this still doesn’t answer my questions about “Omega has been correct on each of 100 observed occasions so far—everyone who took both boxes has found box B empty and received only a thousand dollars; everyone who took only box B has found B containing a million dollars.”
Someone please answer my questions! Thanks!
The problem needs lots of little hypotheses about Omega. In general, you can create these hypotheses for yourself, using the principle of “Least Convenient Possible World”
http://lesswrong.com/lw/2k/the_least_convenient_possible_world/
Or, from philosophy/argumentation theory, “Principle of Charity”.
http://philosophy.lander.edu/intro/charity.shtml
In your case, I think you need to add at least two helper assumptions—Omega’s prediction abilities are trustworthy, and Omega’s offer will never be repeated—not for you, not for anyone.
What the physical evidence says is that the boxes are there, the money is there, and Omega is gone. So what does your choice affect, and when?
Well, I mulled that over for a while, and I can’t see any way that contributes to answering my questions.
As to “… what does your choice affect, and when?”, I suppose there are common causes, starting before Omega loaded the boxes, that affect both Omega’s choices and mine. For example, the machinery of my brain. No backwards-in-time causation is required.
Penalising a rational agent for its character flaws while it is under construction seems like a rather weak objection. Most systems have a construction phase during which they may behave imperfectly—so similar objections seem likely to apply to practically any system. However, this is surely no big deal: once a synthetic rational agent exists, we can copy its brain. After that, developmental mistakes would no longer be much of a factor.
It does seem as though this makes CDT essentially correct—in a sense. The main issue would then become one of terminology—of what the word “rational” means. There would be no significant difference over how agents should behave, though.
My reading of this issue is that the case goes against CDT. Its terminology is misleading. I don’t think there’s much of a case that it is wrong, though.
Eric Barnes—while appreciating the benefits of taking one box—has harsh words for the “taking one box is rational” folk.
(Sigh.)
Yes, causal decision theorists have been saying harsh words against the winners on Newcomb’s Problem since the dawn of causal decision theory. I am replying to them.
Note that this is the same guy who says:
He’s drawing a distinction between a “rational action” and the actions of a “rational agent”.
Newcomb’s Problem capriciously rewards irrational people in the same way that reality capriciously rewards people who irrationally believe their choices matter.
“The Smoking Lesion” is a terrible, terrible example—since it starts by hypothesising that a well-known falsehood is true:
That’s medical bullshit! Why would anyone make up such an insanely counter-factual example? Is the author on the payroll of Big Tobacco—or something?
Creating a counterfactual similar to a situation we’re familiar with helps us form intuitions about it more easily. You could replace the problem, perhaps, with the old saw about the Calvinist deity, predetermination, and the decision to enjoy a life of sin.
Doesn’t evidential decision theory get the right answer in that problem, and causal the wrong one? In which case it’s the opposite of the smoking lesion problem.
That’s correct. I don’t know what the person who wrote the grandparent of this comment could have been thinking; it’s as if he didn’t understand decision theory...
Why use a counterfactual example at all? Surely there’s no need to have an example that contains falsehoods—it just creates unnecessary problems.
My reaction was more along the lines of: this author just tried to subliminally slip me some potentially damaging health advice! I had better watch out for other lies they might want to subliminally feed me for their own nefarious ends!
Then you might want to recalibrate your deception-detection heuristics.
What—and have them so they don’t trigger even on outrageous falsehoods? That sounds as though it would be a dubious plan. Isn’t determining when people are lying to you an important skill? I can put up with a few false positives—and that seems preferable to missing deceptions.
Maybe just so they don’t trigger on outrageous counterfactuals.
Yeah—and what gives that Einstein guy the right to make me imagine riding a beam of light? That’s outrageously impossible—he must be trying to make me attempt lightspeed travel so that I’ll die!
Or the journalistic ethics counterfactual “What if you found conclusive evidence that the Diary of Anne Frank was a fake?”— the ethics professor must be a neo-Nazi!