In the discussion, I had proposed this as a causal net that captures all of your concerns, and I still don’t see why it doesn’t. Explanation:
First of all, I will remind you that all nodes on a Bayesian causal net implicitly have a lone parent (disconnected from all other such parents) that represents uncertainty, which you explicitly represent in your model as “messy details of physical reality” and “more [i.e. independent] messy details of physical reality”.
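To make that concrete, here is a quick sketch in the usual structural-equation reading (the function names and the 0.95 are mine, purely for illustration): each node is just a deterministic function of its parents plus one private, independent noise parent, so drawing “messy details of physical reality” explicitly is just drawing that implicit noise parent on the graph.

```python
import random

# Toy structural-equation reading of a causal-net node (my own illustration,
# not anyone's actual model): node = f(parents, U_node), where U_node is an
# independent noise parent. An explicit "messy details of physical reality"
# node is just U_node drawn on the graph instead of left implicit.

def u_messy_details():
    """Exogenous noise parent: 'messy details of physical reality'."""
    return random.random()

def innards(chosen_algorithm, u):
    """Endogenous node: deterministic in its parents once the noise is fixed."""
    return chosen_algorithm if u < 0.95 else "corrupted-" + chosen_algorithm

print(innards("A1", u_messy_details()))
```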
Similarly, another being that might base its actions on a model of your behavior is represented as having a model of your innards, with that model itself having its own selector, analogous to the above.
To actually compute the consequences of decisions and do all the relevant counterfactual surgery, ideally (ignoring “minor” issues like computability), one iterates over all possible algorithms one might be. … This lets one consider the possibility of one’s own choice being decoupled from what the model of that choice would predict: the initial model is correct, but while the agent is actually considering the decision, a hardware error or whatever causes the agent to be/implement A2 while the model of them is still faithfully implementing A1.
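Here is a rough sketch of that surgery step as I picture it (the names and the failure probability are made up for illustration): intervene on the “algorithm I intend to implement” node, then check how often an independent hardware-error term decouples what actually runs from what a correct model of the intention predicts.

```python
import random

# Illustrative sketch (names and numbers are mine). Intervene on the
# "algorithm I intend to implement" node, then estimate how often a hardware
# fault decouples what actually runs from what a correct model of the
# intention would predict.

def actually_implemented(intended, p_fault=0.01):
    """Hardware error can turn an intended A1 into an actual A2, and vice versa."""
    if random.random() < p_fault:
        return "A2" if intended == "A1" else "A1"
    return intended

def model_prediction(intended):
    """A correct model of the *intention* still predicts the intended algorithm."""
    return intended

def divergence_rate(intended, trials=100_000):
    """How often the agent's actual algorithm differs from the model's prediction."""
    return sum(
        actually_implemented(intended) != model_prediction(intended)
        for _ in range(trials)
    ) / trials

for intended in ("A1", "A2"):   # iterate over the algorithms one might intend to be
    print(intended, "divergence ≈", divergence_rate(intended))
```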
The model I made for you captures all of this, so I don’t see why it’s something TDT has any difficulty representing.
Omega knows your innards. Omega knows what algorithm you’re trying to implement. Omega knows something about which hardware issues lead to which failure modes. So yes, there remains a chance that Omega will guess wrong (under your restrictive assumptions about Omega), but this is fully represented by the model.
Also (still in my model), the agent, when computing a “would”, looks at its choice as being what algorithm it will attempt to implement. It sees that there is room for the possibility of its intended algorithm not being the algorithm that actually gets implemented. It estimates what kinds of effects turn what intended algorithms into bad algorithms and therefore has reasons to pick algorithms that are unlikely to be turned into bad ones.
For a more concrete example of this kind of agent reasoning, refer back to what EY does in this post. He points out that we (including him) run on corrupted hardware (“innards” in my model). Given his desired payoffs, the kind of corruption his innards are prone to justifies rejecting target algorithms such as “cheat when it will benefit the tribe on net”, on the reasoning that such an algorithm will likely degrade (via the causal effect of innards) into the actual algorithm “cheat when it benefits me personally”. To avoid this, he picks an algorithm that is harder to corrupt, like “for the good of the tribe, don’t cheat, even if it benefits the tribe”, which will most likely degrade into “don’t cheat to benefit yourself at the expense of the tribe”, something consistent with his values.
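As a toy version of that selection step (the payoffs and degradation probabilities below are invented for the example, not taken from the post): a candidate target algorithm gets scored by what it is likely to degrade into, not just by its face value.

```python
# Toy illustration (payoffs and probabilities invented for the example):
# score each candidate target algorithm by what it is likely to degrade into
# on corrupted hardware, not by its face value alone.

candidates = {
    # name: (value if run faithfully, what it degrades into, value of that, P(degrade))
    "cheat when it benefits the tribe on net":
        (5, "cheat when it benefits me personally", -20, 0.5),
    "don't cheat, even to benefit the tribe":
        (3, "don't cheat to benefit yourself at the tribe's expense", 3, 0.5),
}

def expected_value(v_faithful, v_degraded, p_degrade):
    return (1 - p_degrade) * v_faithful + p_degrade * v_degraded

for name, (v_ok, degrades_to, v_bad, p) in candidates.items():
    print(f"{name!r}: EV = {expected_value(v_ok, v_bad, p):+.1f} "
          f"(likely degrades into {degrades_to!r})")
```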
All of this is describable in TDT and represented by my model.
I think I may be misunderstanding your model, but here’s an example where I think yours (i.e., just using the built-in error terms) would fail worse than mine:
Imagine that, in addition to you, there are, say, a thousand systems that are somewhat explicitly dependent on algorithm A1 (or try to be) and another thousand that are explicitly dependent on A2 (or try to be), whether by directly implementing it, modeling it, or...
If you are A1, then your decision will be linked to the first group and less so to the second group… and if you are A2, then the other way around. Just using error terms would weaken all the couplings without noticing that, if one is A2, one is no longer coupled to the first group but is coupled to the second.
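Here is a minimal sketch of the contrast I mean (a two-action toy case; all numbers are made up): with an explicit “which algorithm am I actually” variable that the other systems also track, finding out one is A2 switches the coupling to the second group, whereas independent per-system error terms just blur both couplings uniformly.

```python
# Toy contrast (all numbers invented). Say A1 outputs "one-box" and A2 outputs
# "two-box" in some fixed situation, with 1000 systems faithfully tracking A1
# and 1000 faithfully tracking A2.

A1_ACT, A2_ACT = "one-box", "two-box"
P_WRONG_ALGO = 0.05   # chance hardware makes me A2 when I intended A1

def agreement(my_act, group_act):
    """Whether my act agrees with a group of faithful implementers of group_act."""
    return 1.0 if my_act == group_act else 0.0

# Explicit "which algorithm am I actually" node: condition on it and the
# coupling switches from one group to the other.
for actually in ("A1", "A2"):
    my_act = A1_ACT if actually == "A1" else A2_ACT
    print(f"actually {actually}: agree with A1-group {agreement(my_act, A1_ACT):.0%}, "
          f"with A2-group {agreement(my_act, A2_ACT):.0%}")

# Error-terms-only view: my act matches A1's 95% of the time and the other 5%
# is treated as uncorrelated noise -- hiding that in that 5% I agree with the
# A2-group rather than with nothing in particular.
print(f"error-terms-only: coupling to A1-group ≈ {1 - P_WRONG_ALGO:.0%}, to A2-group ≈ 0%")
```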
Does that make sense?
And again, I know that error correction and so on can and should be used to ensure lower probability of “algorithm you’re trying to implement not being what you actually are implementing”, but right now I’m just focusing on “how can we represent that sort of situation?”
I may be misunderstanding your solution to the problem, though.
I’m going to wait for at least one person other than you or me to join this discussion before saying anything further, just as a “sanity check” and to see what kind of miscommunication might be going on.
I’ve followed along. But I’ve been hesitant to join in because it seemed to me that this question was being raised to a meta-level that it didn’t necessarily deserve.
In the grandparent, for example, why can I not model my uncertainty about how the other agents will behave using the same general mechanism I use for everything else I’m uncertain about? It’s not all that special, at least for these couple of examples. (Of course, the more general question of failure detection and mitigation, completely independent of any explicitly dependent mind-reading demigods or clones, is another matter, but it doesn’t seem to be what the conversation is about...)
As for a sanity check, such as I can offer: the grandparent seems correct in stating that Silas’s graph doesn’t handle the problem described in the grandparent, simply because it is a slightly different problem. With the grandparent’s problem, it seems to be the agent’s knowledge of likely hardware failure modes that matters, rather than Omega’s.
Well, Psy-Kosh had been repeatedly bringing up that Omega has to account for how something might happen between my choosing an algorithm and my actually implementing it, because of cosmic rays and whatnot, so I thought that one was more important.
However, I think the “innards” node already contains one’s knowledge about what kinds of things could go wrong. If I’m wrong, add that as a parent to the boxed node; the link is clipped when you compute the “would” anyway.
OOOOOOH! I think I see part (but not all) of the misunderstanding here. I wasn’t talking about how Omega can take this into account; I was talking about how the agent Omega is playing games with would take this into account.
I.e., not how Omega deals with the problem, but how I would.
Problems involving Omega probably aren’t useful examples for demonstrating your problem, since Omega will accurately predict our actions either way and our identity angst is irrelevant.
I’d like to see an instantiation of the type of problem you mentioned above, involving the many explicitly dependent systems. Something involving a box to pick or a bet to take. Right now the requirements of the model aren’t defined much beyond ‘apply standard decision theory, with an included mechanism for handling uncertainty, at such time as the problem becomes available’.
Fair enough.
So? The graph still handles that.