As far as linking “space of algorithms” to “Actual thing effectively implemented”, the idea is more to maintain one of the key ideas of TDT… ie, that what you’re ‘controlling’ (for lack of a better word) is effectively the output of all instances of the algorithm you actually implement, right?
Yes, well, the goal should be to find an accurate representation of the situation. If the causal model implied by TDT doesn’t fit, well, all the worse for TDT!
And incidentally, this idea:
that what you’re ‘controlling’ (for lack of a better word) is effectively the output of all instances of the algorithm you actually implement
is very troubling to me in that it is effectively saying that my internal thoughts have causal power (or something indistinguishable therefrom) over other people: that my coming to one conclusion means other people must be coming to the same conclusion with enough frequency to matter for my predictions.
Yes, the fact that I have general thoughts, feelings, emotions, etc. is evidence that other people have them … but it’s very weak evidence, and weaker still when it comes to my decisions. Whatever the truth behind the TDT assumption, it’s not likely to be applicable often.
But… two instances of the exact same (deterministic) algorithm fed the exact same parameters will return the exact same output. So when choosing what the output ought to be, one ought to act as if one is determining the output of all the instances.
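As a trivial sketch of that determinism point (with a made-up stand-in for the decision algorithm; nothing here is specific to the actual situation):

```python
# A made-up stand-in for a deterministic decision algorithm: the details of
# the rule don't matter, only that there is no randomness in it.
def decide(observations):
    return 'C' if sum(observations) % 2 == 0 else 'D'

# Two instances of the same algorithm, fed the same parameters,
# necessarily return the same output.
instance_a = decide((1, 2, 3))
instance_b = decide((1, 2, 3))
assert instance_a == instance_b
```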
Perfectly true, and perfectly useless. You’ll never find the exact same deterministic algorithm as the one you’re implementing, especially when you consider the post-decision interference of innards. Any identical sub-algorithm will be lost in a sea of different sub-algorithms.
Sure, but then we can extend to talking about classes of related algorithms that produce related outputs, so that there would still be statistical dependence even if they’re not absolutely identical in all cases. (And summing over ‘which algorithm is being run’ would be part of dealing with that.)
But statistical dependence does not imply large dependence, nor does it imply useful dependence. The variety of psychology among people means that my decision is only weak evidence about others’ decisions, even if it is evidence. It doesn’t do much to shift the expectations of people that my other observations have already given me. (And this point applies doubly so for autistic-spectrum people like me, who are, e.g., surprised at others’ unwillingness to point out my easily correctable difficulties.)
Now, if we were talking about a situation confined to that one sub-algorithm, your point would still have validity. But the problem involves the interplay of other algorithms with even more uncertainty.
Plus, the inherent implausibility of the position (implied by TDT) that my decision to be more charitable must mean that other people just decided to become more charitable.
Well, one would actually take into account the degree of dependence when doing the relevant computation.
And your decision to be more charitable would correlate to others being so to the extent that they’re using related methods to come to their own decision.
Well, one would actually take into account the degree of dependence when doing the relevant computation.
Yes, and here’s what it would look like: I anticipate a 1⁄2 + e probability of the other person doing the same thing as me in the true PD. I’ll use the payoff matrix of
      C        D
C   (3,3)   (0,5)
D   (5,0)   (1,1)
where the first value is my utility. The expected payoff is then (after a little algebra):
If I cooperate: 3⁄2 + 3e; if I defect: 3 − 4e
Defection has a higher payoff as long as e is less than 3⁄14 (total probability of other person doing what I do = 10⁄14). So you should cooperate as long as you have over 0.137 bits of evidence that they will do what you do. Does the assumption that other people’s algorithm has a minor resemblance to mine get me that?
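Spelling that out as a quick sketch (the names are just illustrative, and I’m reading the “0.137 bits” as the reduction in the entropy of my prediction of the other player’s move, which is the reading that reproduces the figure):

```python
from math import log2

# Payoff matrix from my point of view: payoff[(my_move, their_move)].
payoff = {('C', 'C'): 3, ('C', 'D'): 0,
          ('D', 'C'): 5, ('D', 'D'): 1}

def expected_payoff(my_move, e):
    """Expected payoff when the other player matches my move with probability 1/2 + e."""
    other_move = 'D' if my_move == 'C' else 'C'
    return (0.5 + e) * payoff[(my_move, my_move)] + (0.5 - e) * payoff[(my_move, other_move)]

def bits_of_evidence(p):
    """Entropy reduction (in bits) from a 50/50 prior to predicting a match with probability p."""
    return 1.0 - (-(p * log2(p) + (1 - p) * log2(1 - p)))

e = 3 / 14                        # the break-even correlation
print(expected_payoff('C', e))    # 3/2 + 3e = 15/7, about 2.14
print(expected_payoff('D', e))    # 3 - 4e   = 15/7, about 2.14
print(bits_of_evidence(0.5 + e))  # about 0.137 bits at a 10/14 match probability
```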
And your decision to be more charitable would correlate to others being so to the extent that they’re using related methods to come to their own decision.
Yes, and that’s the tough bullet to bite: my being more charitable, irrespective of the impact of my charitable action, causes (me to observe) other people being more charitable.
But if you are literally talking about the same computation, that computation must be unable to know which instance it is operating from. Once the question of getting identified with “other” instances is raised, the computations are different, and can lead to different outcomes, if these outcomes nontrivially depend on the different contexts. How is this progress compared to the case of two identical copies in PD that know their actions to be necessarily identical, and thus choosing between (C,C) and (D,D)?
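For reference, the identical-copies baseline that question points at, as a small sketch using the same payoff matrix: since the copies cannot act differently, only the diagonal outcomes are reachable, and the choice reduces to (C,C) versus (D,D).

```python
# Payoff matrix from one copy's point of view (same matrix as above).
payoff = {('C', 'C'): 3, ('C', 'D'): 0,
          ('D', 'C'): 5, ('D', 'D'): 1}

# Identical copies cannot diverge, so only the diagonal outcomes are reachable.
diagonal = {move: payoff[(move, move)] for move in ('C', 'D')}
print(diagonal)                         # {'C': 3, 'D': 1}
print(max(diagonal, key=diagonal.get))  # 'C': cooperation wins once defection cannot be unilateral
```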
The interesting part is making the sense of dependence between different decision processes precise.
No problem.
“controlling” is the wrong word.