...if it computes the expected utility of “provably modify myself to start a war against UDT-AI unless it gives me 99% of its resources” it might possibly get a low value (not sure because UDT isn’t fully specified), because the UDT-AI, when choosing what to do when faced with this kind of threat, would take into account the logical correlation between its decision and the alien AI’s prediction of its decision.
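To make the shape of that computation concrete, here is a toy sketch in Python. The payoffs and names are invented for illustration (UDT itself isn't fully specified, so this is only the structure of the argument): the alien AI only threatens if threatening looks profitable given its prediction of the UDT-AI's policy, and the UDT-AI chooses its policy with that dependence in mind.

```python
# Toy threat game with hypothetical payoffs. The alien AI's prediction of the
# UDT-AI's policy is assumed to be accurate -- that is the "logical
# correlation" referred to above.

PAYOFFS = {
    # (threat_made, udt_policy): (udt_utility, alien_utility)
    (False, "give_in"): (100, 0),
    (False, "refuse"):  (100, 0),
    (True,  "give_in"): (1, 99),    # UDT-AI hands over 99% of its resources
    (True,  "refuse"):  (-50, -50), # war destroys value for both sides
}

def alien_utility(udt_policy, threaten):
    """Alien's payoff from (not) threatening, given its prediction of UDT's policy."""
    return PAYOFFS[(threaten, udt_policy)][1]

def udt_best_policy():
    """Pick the policy with the highest utility, accounting for the fact that
    the alien's choice to threaten depends on a correct prediction of that policy."""
    results = {}
    for policy in ("give_in", "refuse"):
        threaten = alien_utility(policy, True) > alien_utility(policy, False)
        results[policy] = PAYOFFS[(threaten, policy)][0]
    return max(results, key=results.get), results

print(udt_best_policy())
# ('refuse', {'give_in': 1, 'refuse': 100}) -- a policy of refusing makes the
# threat unprofitable, so the alien's "threaten" branch gets a low value.
```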
Well, that’s plausible. I’ll have to work through some UDT examples to understand fully.
What model do you have of how entity X can prove to entity Y that X is running specific source code?
The proof that I can imagine is entity Y gives some secure hardware Z to X, and then X allows Z to observe the process of X self-modifying to run the specified source code, and then X gives the secure hardware back to Y. Both X and Y can observe the creation of Z, so Y can know that it’s secure and X can know that it’s a passive observer rather than a bomb or something.
This model breaks the scenario, since a CDT agent playing the role of Y could self-modify at any time before it hands over Z and play the game competently.
Now, if there’s some way for X to create proofs of X’s source code that will be convincing to Y without giving advance notice to Y, I can imagine a problem for Y here. Does anyone know how to do that?
(I acknowledge that if nobody knows how to do that, that means we don’t know how to do that, not that it can’t be done.)
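For concreteness, here is a minimal sketch (in Python, with invented names) of the Z-observer protocol described a couple of paragraphs up, together with the loophole: Y can self-modify before Z even exists, so Z's log constrains X but says nothing about Y.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class Observer:
    """Z: passive hardware that just records what it is shown."""
    log: List[str] = field(default_factory=list)

    def witness(self, event: str) -> None:
        self.log.append(event)

@dataclass
class Agent:
    name: str
    source: str

    def self_modify(self, new_source: str, observer: Optional[Observer] = None) -> None:
        if observer is not None:
            observer.witness(f"{self.name}: {self.source!r} -> {new_source!r}")
        self.source = new_source

def run_protocol(x: Agent, agreed_source: str) -> Observer:
    z = Observer()                   # both parties watch Z being built
    x.self_modify(agreed_source, z)  # X self-modifies under Z's observation
    return z                         # X hands Z back; Y reads the log as proof

# The loophole: a CDT agent in Y's role simply self-modifies first, with no
# observer present, and nothing in Z's log reflects that.
y = Agent("Y", "naive_CDT")
y.self_modify("CDT_that_plays_the_threat_game_competently")

x = Agent("X", "original_X_source")
z = run_protocol(x, "agreed_committed_source")
print(z.log)  # proof about X only; Y's earlier self-modification is invisible
```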
Hmm, this explains my aversion to knowing the details of what other people are thinking. It can put me at a disadvantage in negotiations unless I am able to lie convincingly and say I do not know.
I think I’ll stop here for now, because you already seem intrigued enough to want to learn about UDT in detail. I’m guessing that once you do, you won’t be so motivated to think up reasons why CDT isn’t really so bad. :) Let me know if that turns out not to be the case though.
What model do you have of how entity X can prove to entity Y that X is running specific source code?
On second thought, I should answer this question because it’s of independent interest. If Y is sufficiently powerful, it may be able to deduce the laws of physics and the initial conditions of the universe, and then obtain X’s source code by simulating the universe up to when X is created. Note that Y may do this not because it wants to know X’s source code in some anthropomorphic sense, but simply due to how its decision-making algorithm works.
If Y is sufficiently powerful, it may be able to deduce the laws of physics and the initial conditions of the universe, and then obtain X’s source code by simulating the universe up to when X is created.
Unless some specific assumptions have been made about the universe, that will not work. Simulating the entire universe does not tell Y which part of the universe it inhabits; it only gives Y the set of possible parts of the universe that match Y’s observations. So while the simulation strategy allows the best possible prediction of X’s source code given what Y already knows, it does not give Y any evidence that it didn’t already have.
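A toy illustration of that point (the miniature “universes” here are made-up placeholders): the simulation recovers every candidate source code for X that is consistent with Y’s observations, which is exactly the set that Y’s existing evidence already permitted.

```python
# Each hypothesis Y entertains: (initial conditions, what Y would observe
# there, what X's source code is there). These are invented placeholders.
HYPOTHESES = [
    ("universe_A", "obs_1", "X_source_alpha"),
    ("universe_B", "obs_1", "X_source_beta"),
    ("universe_C", "obs_2", "X_source_alpha"),
]

def candidate_sources(y_observations):
    """Simulate every hypothesis and keep X's source code from the ones that
    match what Y has actually observed."""
    return {src for _, obs, src in HYPOTHESES if obs == y_observations}

print(candidate_sources("obs_1"))
# {'X_source_alpha', 'X_source_beta'} -- two candidates survive; the
# simulation only narrows things down as far as Y's observations already did.
```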
You’re right, the model assumes that we live in a universe such that superintelligent AIs would “naturally” have enough evidence to infer the source code of other AIs. (That seems quite plausible, although by no means certain, to me.) Also, since this is a thread about the relative merits of CDT, I should point out that there are some games in which CDT seems to win relative to TDT or UDT, which is a puzzle that is still open.
Also, since this is a thread about the relative merits of CDT, I should point out that there are some games in which CDT seems to win relative to TDT or UDT, which is a puzzle that is still open.
It’s an interesting problem, but my impression when reading it was somewhat similar to Eliezer’s in the replies. At its core, it is the question of “How do you deal with constructs made by other agents?” I don’t think TDT has any particular weakness there.
If Y is sufficiently powerful, it may be able to deduce the laws of physics and the initial conditions of the universe, and then obtain X’s source code by simulating the universe up to when X is created.
Quantum mechanics seems to be pretty clear that true random number generators are available, and probably happen naturally. I don’t understand why you consider that scenario probable enough to be worth talking about.