By “analog of one-shot true PD” I meant any game where the Nash equilibrium isn’t Pareto-optimal. The two links in my last comment gave plenty of examples.
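To make "a game where the Nash equilibrium isn't Pareto-optimal" concrete, here is a minimal Python check using the standard one-shot Prisoner's Dilemma payoffs (the numbers are the usual textbook ones, not anything specified in this thread): (D, D) is the unique Nash equilibrium, yet it is Pareto-dominated by (C, C).

```python
# payoffs[(row_action, col_action)] = (row_payoff, col_payoff)
payoffs = {
    ("C", "C"): (3, 3),
    ("C", "D"): (0, 5),
    ("D", "C"): (5, 0),
    ("D", "D"): (1, 1),
}
actions = ("C", "D")

def is_nash(profile):
    # No player can gain by unilaterally switching actions.
    r, c = profile
    row_ok = all(payoffs[(r, c)][0] >= payoffs[(a, c)][0] for a in actions)
    col_ok = all(payoffs[(r, c)][1] >= payoffs[(r, a)][1] for a in actions)
    return row_ok and col_ok

def pareto_dominated(profile):
    # Some other outcome is at least as good for both players and better for one.
    u = payoffs[profile]
    return any(v[0] >= u[0] and v[1] >= u[1] and v != u for v in payoffs.values())

for p in payoffs:
    print(p, is_nash(p), pareto_dominated(p))
# Only ("D", "D") prints True for both: it is Nash, but not Pareto-optimal.
```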
all the complexity is hidden in “would want”
I think I formalized it already, but to say it again, suppose the creator had the option of creating a giant lookup table in place of S. What choice of GLT would have maximized his expected utility at the time of coding, under the creator’s own decision theory? S would compute that and then return whatever the GLT entry for X is.
ETA:
Can you walk me through what you think a CDT agent self-modifies to
It self-modifies to the S described above, with a description of itself embedded as the creator. Or, to make it even simpler but less realistic, a CDT agent just replaces itself with a GLT chosen to maximize its current expected utility. Is that sufficiently clear?
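Here is a minimal Python sketch of that construction, assuming GLTs are just dicts from inputs to outputs and that `candidate_glts` and `creator_expected_utility` are supplied from somewhere (both names are placeholders, not part of the original description):

```python
def choose_glt(candidate_glts, creator_expected_utility):
    # The GLT the creator would have picked at coding time to maximize
    # expected utility, evaluated under the creator's own decision theory
    # and information.
    return max(candidate_glts, key=creator_expected_utility)

def S(X, candidate_glts, creator_expected_utility):
    # S reconstructs that optimal GLT and simply returns its entry for
    # the current input X.
    best_glt = choose_glt(candidate_glts, creator_expected_utility)
    return best_glt[X]
```

On this reading, the CDT self-modification in the ETA is the same operation with the CDT agent itself plugged in as the creator and its current expected utility as the scoring function.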
By “analog of one-shot true PD” I meant any game where the Nash equilibrium isn’t Pareto-optimal. The two links in my last comment gave plenty of examples.
Suppose we have an indefinitely iterated PD with an unknown bound and hard-to-calculate but small probabilities of each round being truly unobserved. Do you call that “a game where the Nash equilibrium isn’t a Pareto optimum”? Do you think evolution has handled it by programming us to just defect?
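As a rough sanity check on why an indefinitely iterated PD does not force defection, here is a toy expected-value comparison with a per-round continuation probability delta and the usual payoff numbers (my choice of numbers; the thread does not specify any):

```python
# Textbook PD payoffs: T (temptation), R (reward), P (punishment).
T, R, P = 5, 3, 1

def cooperate_forever(delta):
    # Mutual cooperation every round against a grim-trigger partner:
    # R + delta*R + delta^2*R + ... = R / (1 - delta)
    return R / (1 - delta)

def defect_once_then_punished(delta):
    # Grab T once, then mutual punishment P in every later round.
    return T + delta * P / (1 - delta)

for delta in (0.1, 0.5, 0.9):
    print(delta, cooperate_forever(delta) > defect_once_then_punished(delta))
# Cooperation pays once delta exceeds (T - R) / (T - P) = 0.5.
```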
I’ve done some informal psychological experiments to check human conformance with timeless decision theory on variants of the original Newcomb’s Problem, btw, and people who one-box on Newcomb’s Problem seem to have TDT intuitions in other ways. Not that this is at all relevant to the evolutionary dilemmas, which we seem to’ve been programmed to handle by being temptable, status-conscious, and honorable to varying quantitative degrees.
But programming an AI to cooperate with strangers on one-shot true PDs out of a human sense of honor would be the wrong move—our sense of honor isn’t the formal “my C iff (opponent C iff my C)”, so a TDT agent would then defect against us.
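For concreteness, here is a toy Python caricature of “my C iff (opponent C iff my C)”. It is emphatically not TDT, just a behavioral probe, and all of the function names are made up for illustration:

```python
def always_cooperate(opponent):
    return "C"

def always_defect(opponent):
    return "D"

def conditional_cooperator(opponent):
    # Cooperate only if the opponent's choice tracks ours: it cooperates
    # when probed with a cooperator and defects when probed with a defector.
    return ("C" if opponent(always_cooperate) == "C"
                   and opponent(always_defect) == "D"
            else "D")

def mirror(opponent):
    # A simple opponent whose move copies how `opponent` treats a cooperator.
    return opponent(always_cooperate)

print(conditional_cooperator(always_defect))  # D
print(conditional_cooperator(mirror))         # C
```

Note that two copies of `conditional_cooperator` probing each other this way would end up mutually defecting; the real condition is about logical dependence between the decisions, not behavioral probing, which is part of why an informal sense of honor is not a substitute for it.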
I just don’t see human evolution—status, temptation, honor—as being very relevant here. An AI’s decision theory will be, and should be, decided by our intuitions about logic and causality, not about status, temptation, and honor. Honor enters as a human terminal value, not as a decider of the structure of the decision theory.
How do you play “cooperate iff (the opponent cooperates iff I cooperate)” in a GLT? Is the programmer supposed to be modeling the opponent AI in sufficient resolution to guess how much the opponent AI knows about the programmer’s decision, and how many of the other possible programmers the AI is modeling are likely to correlate with it? Does S compute the programmer’s decision using S’s knowledge or only the programmer’s knowledge? Does S compute the opponent inaccurately, as if it were modeling only the programmer, or accurately, as if it were modeling both the programmer and S?
I suppose that a strict CDT agent could replace itself with a GLT, provided the GLT is chosen taking into account all the cases where the opponent AI gets a glimpse of the GLT after it’s written. Then the GLT behaves just like the code I specified before on e.g. Newcomb’s Problem: one-box if Omega glimpses the GLT or gets evidence about it after the GLT was written, two-box if Omega perfectly knows your code 5 seconds before the GLT gets written.
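A rough expected-utility sketch of that distinction (my numbers and function names, not anything specified above):

```python
BOX_B = 1_000_000  # amount Omega puts in the opaque box if it predicts one-boxing
BOX_A = 1_000      # amount always in the transparent box

def eu_prediction_depends_on_glt(entry):
    # Omega glimpses (or infers) the GLT after it is written, so the
    # prediction tracks the committed entry itself.
    return BOX_B if entry == "one-box" else BOX_A

def eu_prediction_fixed_before_glt(entry, p_predicted_one_box):
    # Omega read the original code before the GLT existed; from the CDT
    # agent's causal perspective the prediction is a fixed background fact.
    expected_b = p_predicted_one_box * BOX_B
    return expected_b if entry == "one-box" else expected_b + BOX_A

entries = ("one-box", "two-box")
print(max(entries, key=eu_prediction_depends_on_glt))                      # one-box
print(max(entries, key=lambda e: eu_prediction_fixed_before_glt(e, 0.5)))  # two-box
```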
[Edit: Don’t bother responding to this yet. I need to think this through.]
How do you play “cooperate iff (the opponent cooperates iff I cooperate)” in a GLT?
I’m not sure this question makes sense. Can you give an example?
Does S compute the programmer’s decision using S’s knowledge or only the programmer’s knowledge?
S should take the programmer R’s prior and memories/sensory data at the time of coding, and compute a posterior probability distribution from them (assuming S would do a better job of this than R). It would then use that posterior to compute R’s expected utility for the purpose of choosing the optimal GLT. This falls out of the idea that S is trying to approximate what the GLT would be if R had logical omniscience.
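A hedged sketch of that step, assuming R’s prior can be represented as a dict over world-states and that `likelihood`, `utility`, and `candidate_glts` are given (all of these names are placeholders):

```python
def posterior_from(prior, likelihood, observations):
    # Bayes: P(world | obs) is proportional to P(obs | world) * P(world),
    # computed from R's prior and R's memories/sensory data at coding time.
    unnorm = {w: prior[w] * likelihood(observations, w) for w in prior}
    z = sum(unnorm.values())
    return {w: p / z for w, p in unnorm.items()}

def expected_utility(glt, posterior, utility):
    # R's expected utility of committing to this GLT: R's values,
    # but S's (presumably better) inference.
    return sum(p * utility(glt, w) for w, p in posterior.items())

def optimal_glt(candidate_glts, prior, likelihood, observations, utility):
    post = posterior_from(prior, likelihood, observations)
    return max(candidate_glts, key=lambda g: expected_utility(g, post, utility))
```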
Is the programmer supposed to be modeling the opponent AI in sufficient resolution to guess how much the AI knows about the programmer?
No, S will do it.
Does S compute the opponent as if it were modeling only the programmer, or both the programmer and S?
I guess both, but I don’t understand the significance of this question.