Suppose you run your twins scenario, and the twins both defect. You visit one of the twins to discuss the outcome.
Consider the statement: “If you had cooperated, your twin would also have cooperated, and you would have received $1M instead of $1K.” I think this is formally provable, given the premises.
Now consider the statement: “If you had cooperated, your twin would still have defected, and you would have received $0 instead of $1K.” I think this is also formally provable, given the premises: we have assumed a deterministic AI that we already know will defect given this particular set of inputs! Any statement that begins “if you had cooperated...” assumes a contradiction, from which literally anything is formally provable.
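(This is just the principle of explosion, for the record. A one-line Lean sketch of the general fact, nothing specific to the twins setup:)

```lean
-- Principle of explosion: from P together with ¬P, any Q whatsoever follows.
-- `absurd` is in the Lean 4 core library; nothing twin-specific going on here.
example (P Q : Prop) (hp : P) (hnp : ¬P) : Q :=
  absurd hp hnp
```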
You say in the post that only the cooperate-cooperate and defect-defect outcomes are on the table, because cooperate-defect is impossible by the scenario’s construction. I think that cooperate-cooperate and defect-defect aren’t both on the table, either. Only one of those outcomes is consistent with the AI program that you already copied. If we can say you don’t need to worry about cooperate-defect because it’s impossible by construction, then in precisely what sense are cooperate-cooperate and defect-defect both still “possible”?
I feel like most people have a mental model for deterministic systems (billiard balls bouncing off each other, etc.) and a separate mental model for agents. If you can get your audience to invoke both of these models at once, you have probably instantiated in their minds a combined model with some latent contradiction in it. Then, by leading your audience down a specific path of reasoning, you can use that latent contradiction to prove essentially whatever you want.
(To give a simple example, I’ve often seen people ask variations of “does (some combinatorial game) have a 50/50 win rate if both sides play optimally?” A combinatorial game, played optimally, has only one outcome, which must occur 100% of the time; but non-mathematicians often fail to notice this, and apply their usual model of “agents playing a game” even though the question constrained the “agents” to optimal play.)
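(To make that concrete, here is a throwaway Python sketch of my own, using a toy subtraction game rather than any game mentioned here. Under optimal play the winner is a fixed function of the starting position; there is no win rate to speak of.)

```python
from functools import lru_cache

# Toy combinatorial game: a pile of n stones, players alternate removing 1 or 2,
# and whoever takes the last stone wins. Under optimal play the outcome is a
# deterministic function of n -- it happens 100% of the time, not 50/50.

@lru_cache(maxsize=None)
def first_player_wins(n: int) -> bool:
    if n == 0:
        return False  # no stones left: the player to move has already lost
    # The player to move wins iff some legal move leaves the opponent losing.
    return any(not first_player_wins(n - k) for k in (1, 2) if k <= n)

for n in range(1, 10):
    print(n, "first player wins" if first_player_wins(n) else "second player wins")
```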
I notice this post uses a lot of phrases like “it actually works” and “try it yourself” when talking about the twins example. Unless there’s been a recent breakthrough in mind uploading that I haven’t heard about, this wording implies empirical confirmation that I’m pretty confident you don’t have (and can’t get).
If you were forced to express your hypothetical scenarios in computer source code, instead of informal English descriptions, I think it would probably be pretty easy to run some empirical tests and see which strategies actually get better outcomes. But I don’t know, and I suspect you don’t know, how to “faithfully” represent any of these examples as source code. This leaves me suspicious that perhaps all the interesting results are just confusions, rather than facts about the universe.
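(Here is the kind of deliberately naive sketch I have in mind, entirely my own toy construction rather than anything from the post. It hard-codes the copied AI as a deterministic function, at which point “running the experiment” can only ever produce the single outcome that function already determines; the interesting part, an agent reasoning about its own source code, is exactly what this version leaves out.)

```python
# A deliberately naive toy version of the twins scenario (my construction, not
# the post's). The copied AI is a deterministic function from observation to
# move. The $1M / $1K / $0 figures follow the numbers quoted above; the
# defect-against-cooperator payoff is a placeholder, since the symmetric setup
# never reaches it anyway.

def agent(observation: str) -> str:
    """Stand-in for the copied AI: same input, same output, every time."""
    return "D"  # whatever the real program would output; this one defects

PAYOFF = {
    ("C", "C"): (1_000_000, 1_000_000),
    ("D", "D"): (1_000, 1_000),
    ("C", "D"): (0, None),  # placeholder: unreachable when both sides run `agent`
    ("D", "C"): (None, 0),  # placeholder: likewise unreachable
}

# Both twins are literal copies fed the same observation, so every run of this
# "experiment" yields the same single data point.
moves = (agent("twin prisoner's dilemma"), agent("twin prisoner's dilemma"))
print(moves, "->", PAYOFF[moves])
```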
Indeed.
My reading is that the crux of the argument here is: causality implies no free will in the twin PD, so, by contraposition, free will implies no causality; therefore, we can use our free will to break causality. The relevant quote:
because only one of (a) or (b) is compatible with the past/the physical laws, and because you are free to choose (a) or (b), it turns out that in some sense, you’re free to choose the past/the physical laws (or, their computational analogs).
To me, Occam’s razor prefers no-free-will to causality-breaking. Granted, causality is as mysterious as free will. But causality is more fundamental, more basic — it exists in non-agent systems too. Free will, on the other hand, privileges agents, as if there’s something metaphysical about them.
By the way, the causality view is still consistent with one-boxing. I go with causality.
Agreed.
I think this type of reflection is the decision theory equivalent of calculating the perfect launch sequence in Kerbal Space Program. If you sink enough time into it, you can probably achieve it, but by then you’ll have loooong passed the point of diminishing returns, and very little of what you’ve learned will be applicable in the real world, because you’ve spent all your energy optimizing strategies that immediately fall apart the second any uncertainty or fuzziness is introduced into your simulation.
How so? Functional Decision Theory handles these situations beautifully, with or without uncertainty.
Has Functional Decision Theory ever been tested in the field, so to speak? Is there any empirical evidence that it actually helps people / organizations / AIs make better decisions in the real world?
Zvi would tell you that yes it has: How I Lost 100 Pounds Using TDT.
Look, I’m going to be an asshole, but no, that doesn’t count.
There are millions of stories around of the type “I lost lots of weight thanks to X even though nothing else had worked”. They are not strong evidence that X works.
FWIW, in your comment above you had asked for “any empirical evidence”.
I agree that Zvi’s story is not “strong evidence”, but I don’t think that means it “doesn’t count” — a data point is a data point, even if inconclusive on its own.
(And I think it’s inappropriate to tell someone that a data point “doesn’t count” in response to a request for “any empirical evidence”. In other words, I agree with your assessment that you were being a little bit of an asshole in that response ;-) )
Alright, sorry. I should have asked “is there any non-weak empirical evidence that...”. Sorry if I was condescending.
When deciding to skip the gym, FDT would tell you that the decision procedure you’re running now is similar to the one you run each day when deciding whether to go to the gym. So you’d best go now, because then you always go. (This is a bit simplified, as the situation may not be the same each day, but the point stands.)
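(A back-of-the-envelope version of that, with numbers invented purely for illustration: once you grant that today’s choice and every similar day’s choice are outputs of the same procedure, the comparison is between a year of going and a year of skipping, not between one visit and one skipped visit.)

```python
# Illustrative only: every figure below is made up for the sake of the example.
SIMILAR_DAYS = 300       # hypothetical number of "should I go to the gym?" days
BENEFIT_PER_VISIT = 3.0  # made-up utility of one workout
EFFORT_PER_VISIT = 1.0   # made-up disutility of dragging yourself there

def value_of_procedure(decision: str) -> float:
    # Under the correlation assumption, choosing once is choosing for all
    # relevantly similar days.
    per_day = BENEFIT_PER_VISIT - EFFORT_PER_VISIT if decision == "go" else 0.0
    return SIMILAR_DAYS * per_day

print("always go:  ", value_of_procedure("go"))    # 600.0
print("always skip:", value_of_procedure("skip"))  # 0.0
# A "just this once" framing only weighs one day's effort against one day's
# benefit, which is how skipping starts to look harmless.
```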
Furthermore, FDT denies that voting is irrational when there are enough voters sufficiently like-minded to you (who vote when you vote, since their decision procedure is the same as yours). This is a pretty cool result.
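(Same style of toy calculation, again with invented numbers: if a bloc of like-minded voters runs roughly your decision procedure, FDT evaluates “vote” as turning out the whole bloc rather than a single ballot.)

```python
# Illustrative only: every number here is invented for the example.
VALUE_OF_WINNING = 1_000_000.0  # made-up value to you of your side winning
COST_OF_VOTING = 5.0            # made-up cost of going to the polls
P_SINGLE_VOTE_DECISIVE = 1e-7   # hypothetical chance one extra ballot matters
P_BLOC_DECISIVE = 0.01          # hypothetical chance the correlated bloc matters

ev_single = P_SINGLE_VOTE_DECISIVE * VALUE_OF_WINNING - COST_OF_VOTING
ev_bloc = P_BLOC_DECISIVE * VALUE_OF_WINNING - COST_OF_VOTING

print(f"counting only your own ballot: {ev_single:+.2f}")  # -4.90: looks irrational
print(f"counting the correlated bloc:  {ev_bloc:+.2f}")    # +9995.00: voting pays
```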
Also, it may be worth noting that many real-life scenarios are Newcomblike: e.g. people predict what you will do using your microexpressions. Newcomb’s Problem is just a special case.