UDT doesn’t search the environment for copies of the agent; it merely accepts a problem statement where multiple locations of the agent are explicitly stated. Thus, if you don’t explicitly tell UDT that those other agents are following the same decision-making process as you, it won’t notice, even if the other agents all have source code identical to yours.
Edit: This is not quite right. See Wei Dai’s clarification and my response.
So ‘my version’ of UDT is perhaps brushing over the distinction between “de facto copies of the agent that were not explicitly labelled as such in the problem statement” and “places where a superbeing or telepathic robot (i.e. Omega) is simulating the agent”?
The former would be subroutines of the world-program different from S but with the same source code as S, whereas the latter would be things of the form “Omega_predict(S, argument)”? (And a ‘location of the agent explicitly defined as such’ would just be a place where S itself is called?)
That could be quite important...
So I wonder how all this affects decision-making. If you have an alternate version of Newcomb’s paradox where rather than OmegaPredict(S) we have OmegaPredict(T) for some T with the same source code as S, does UDT two-box?
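A toy rendering of the two setups as world-programs might look like the following sketch (purely illustrative: the agent is represented by its source text, and the names run, decide, world_standard and world_variant are chosen here for illustration, not taken from the original formulation):

```python
def run(source: str) -> str:
    """Execute a program's source text and return its decision ("one-box" or "two-box")."""
    namespace: dict = {}
    exec(source, namespace)   # assumes the source defines a decide() function
    return namespace["decide"]()

def newcomb_payoff(choice: str, prediction: str) -> int:
    big_box = 1_000_000 if prediction == "one-box" else 0
    return big_box if choice == "one-box" else big_box + 1_000

def world_standard(agent_source: str) -> int:
    # Standard Newcomb: the world-program hands Omega the agent's own source, S.
    prediction = run(agent_source)   # Omega_predict(S)
    choice = run(agent_source)       # the agent's actual run
    return newcomb_payoff(choice, prediction)

def world_variant(agent_source: str, t_source: str) -> int:
    # Variant: Omega is handed T, which is nowhere labelled as "the agent",
    # even though its source text is identical to S's. The question above is
    # whether UDT notices that t_source == agent_source, or two-boxes.
    assert t_source == agent_source
    prediction = run(t_source)       # Omega_predict(T)
    choice = run(agent_source)
    return newcomb_payoff(choice, prediction)
```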
Also, how does it square with the idea that part of what it means for an agent to be following UDT is that it has a faculty of ‘mathematical intuition’ by which it computes the probabilities of possible execution histories (based on the premise that its own output is Y)? Is it unreasonable to suppose that ‘mathematical intuition’ extends as far as noticing when two programs have the same source code?
You are right. See Wei Dai’s clarification and my response.
Since UDT receives the environment parametrized by the agent’s source code, there is no way for it to tell what the agent’s source code is, and so there is no way of stating that the environment contains another instance of the agent’s source code, or a program that does the same thing as the agent’s program, apart from the explicit dependence already given. Explicit parametrization here implies an absence of information about the parameter. UDT is in the strange situation of having to compute its own source code, which, philosophically, doesn’t make sense. (And it also doesn’t know its own source code, even though in principle that’s not a big deal.)
So the question of whether UDT is able to work with slightly different source code passed to Omega, or the same source code labeled differently, is not in the domain of UDT; it is something decided “manually” before the formal problem statement is given to UDT.
Edit: This is not quite right. See Wei Dai’s clarification and my response.
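To make “explicit parametrization” concrete, here is a minimal sketch of the reading described above, in which the world-program receives the agent’s choice as an opaque parameter (all names are illustrative):

```python
# "Explicit updateless control": the world-program is handed the agent's choice
# as an opaque parameter. Inside `world` there is no further fact about what the
# agent's source code looks like, so a claim like "this other subroutine has the
# same source as the agent" cannot even be stated.

def world(agent_choice: str) -> int:
    prediction = agent_choice        # the only labelled location of the agent
    box_b = 1_000_000 if prediction == "one-box" else 0
    return box_b if agent_choice == "one-box" else box_b + 1_000

def world_with_unlabelled_copy(agent_choice: str) -> int:
    # If Omega's prediction appears as a concrete, unlabelled subroutine instead
    # of as the parameter, UDT in this reading has no way to recognize it as "me".
    def omega_prediction() -> str:
        return "one-box"             # opaque code that merely happens to match the agent
    box_b = 1_000_000 if omega_prediction() == "one-box" else 0
    return box_b if agent_choice == "one-box" else box_b + 1_000
```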
[I’m writing this from a hotel room in Leshan, China, as part of a 10-day 7-city self-guided tour, which may help explain my relative lack of participation in this discussion.]
Nesov, if by UDT you mean the version I gave in the article that AlephNeil linked to in this post (which for clarity I prefer to call UDT1), it was intended that the agent knows its own source code. It doesn’t explicitly look for copies of itself in the environment, but it is supposed to implicitly handle other copies of itself (or predictions of itself, or, more generally, other agents/objects that are logically related to itself in some way). The way it does so apparently has problems that I don’t know how to solve at this point, but it was never intended that the locations of the agent be explicitly provided to the agent.
I may have failed to convey this because whenever I write out a world program for UDT1, I always use “S” to represent the agent, but S is supposed to stand for the actual source code of the agent (i.e., a concrete implementation of UDT1), not a special symbol that means “a copy of the agent”. And S is supposed to know its own source code via a quining-type trick.
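For concreteness, one standard way to realize the quining trick in a toy setting is the classic self-reproducing construction below (a generic sketch, not the actual UDT1 code; the comments are not part of the quined text):

```python
# A generic quining construction: the agent reconstructs its own source text
# from a template containing a placeholder for itself. A real UDT1 agent would
# quine its entire decision procedure, not just this stub.
s = 's = %r\ndef my_source():\n    return s %% s\n'
def my_source():
    return s % s

# my_source() returns exactly the three code lines above, so the agent can
# match that string against occurrences of itself (e.g. inside Omega_predict)
# in a world-program.
```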
(I’m hoping this is enough to get you and others to re-read the original post in a new light and understand what I was trying to get at. If not, I’ll try to clarify more at a later time.)
And S is supposed to know its own source code via a quining-type trick.
This phrase got me thinking in another completely irrelevant direction. If you know your own source code by quining, how do you know that it’s really your source code? How does one verify such things?
Here’s a possibly more relevant variant of the question: we human beings don’t have access to our own source code via quining, so how are we supposed to make decisions?
My thoughts on this so far are that we need to develop a method of mapping an external description of a mathematical object to what it feels like from the inside. Then we can say that the consequences of “me choosing option A” are the logical consequences of all objects with the same subjective experiences/memories as me choosing option A.
I think the quining trick may just be a stopgap solution, and the full solution even for AIs will need to involve something like the above. That’s one possibility that I’m thinking about.
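A caricature of the “same subjective experiences/memories” proposal above might look like this (purely illustrative: world_programs, embedded_agents, memories_of, run_world and utility are placeholder primitives that the proposal itself leaves unspecified):

```python
# A caricature of the "same subjective experiences/memories" proposal.
# world_programs, embedded_agents, memories_of, run_world and utility are
# placeholders for primitives the proposal leaves unspecified.

def value_of_choosing(option, my_memories, world_programs,
                      embedded_agents, memories_of, run_world, utility):
    """Evaluate `option` by assuming every embedded object with my memories chooses it."""
    total = 0.0
    for world in world_programs:
        # Every object inside this world whose subjective experiences match mine...
        twins = [a for a in embedded_agents(world) if memories_of(a) == my_memories]
        # ...is assumed to output `option`; take the logical consequence of that.
        outcome = run_world(world, {a: option for a in twins})
        total += utility(outcome)
    return total
```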
I guess the ability to find oneself in the environment depends on no strange things happening in the environments you care about (which are “probable”), so that you can simply pattern-match the things that qualify as you, starting with an extremely simple idea of what “you” is, in some sense inbuilt in our minds by evolution. But in general, if there are tons of almost-you running around, you need the exact specification to figure out which of them you actually control.
This is basically the same idea I want to use to automatically extract human preference, even though strictly speaking one already needs full preference to recognize its instance.
Then we can say that the consequences of “me choosing option A” are the logical consequences of all objects with the same subjective experiences/memories as me choosing option A.
Not exactly… Me and Bob may have identical experiences/memories, but I have a hidden rootkit installed that makes me defect, and Bob doesn’t.
Maybe it would make more sense to inspect the line of reasoning that leads to my defection, and ask a) would this line of reasoning likely occur to Bob? b) would he find it overwhelmingly convincing? This is kinda like quining, because the line of reasoning must refer to a copy of itself in Bob’s mind.
This just considers the “line of reasoning” as a whole agent (part of the original agent, but not controlled by the original agent), and again assumes perfect self-knowledge by that “line of reasoning” sub-agent (supplied by the bigger agent, perhaps, but taken on faith by the sub-agent).
What is the “you” that is supposed to verify that? It’s certainly possible if “you” already have your source code via the quine trick, so that you just compare it with the one given to you. On the other hand, if “you” are a trivial program that is not able to do that and answers “yes, it’s my source code all right” unconditionally, there is nothing to be done about that. You have to assume something about the agent.
“What is the you” is part of the question. Consider it in terms of counterfactuals. Agent A is told via quining that it has source code S. We’re interested in how to implement A so that it outputs “yes” if S is really its source code, but would output “no” if S were changed to S’ while leaving the rest of the agent unchanged.
In this formulation the problem seems to be impossible to solve, unless the agent has access to an external “reader oracle” that can just read its source code back to it. Guess that answers my original question, then.
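The difficulty can be put in miniature as follows (a hypothetical sketch; CLAIMED_SOURCE and read_my_source are illustrative placeholders):

```python
# The difficulty in miniature: the agent's only internal handle on "its own
# source" is the quined constant itself, so a purely internal check reduces to
# comparing the claim against the claim.

CLAIMED_SOURCE = "<filled in by the quining construction>"   # illustrative placeholder

def verify_internally() -> bool:
    # Vacuous: if CLAIMED_SOURCE were replaced by some other S' while the rest
    # of the agent stayed the same, this would accept S' just as happily.
    return CLAIMED_SOURCE == CLAIMED_SOURCE

def verify_with_reader_oracle(read_my_source) -> bool:
    # With an external "reader oracle" that can read the running agent's actual
    # code back to it, the comparison becomes meaningful.
    return read_my_source() == CLAIMED_SOURCE
```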
I understand now. So UDT is secretly ambient control, expressed a notch less formally (without the concept of ambient dependence). It is specifically the toy examples you considered that take the form of what I described as “explicit updateless control”, where world-programs are given essentially parametrized by the agent’s source code (or the agent’s decisions), and I mistook this imprecise interpretation of the toy examples for the whole picture. The search for the points from which the agent controls the world is, in UDT, essentially part of the “mathematical intuition” module, so AlephNeil got that right, where I failed.
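A crude, source-text-level stand-in for that search might look like the sketch below; a real “mathematical intuition” module would have to recognize logical equivalence between programs, not just textual equality, and all names here are illustrative:

```python
import inspect

# A crude stand-in for the search: scan a world-program (here, a Python module)
# for subroutines whose source text equals the agent's quined source. A real
# "mathematical intuition" module would need provable equivalence, not string
# equality, and would work over mathematical structures rather than modules.

def control_points(world_module, agent_source: str) -> list[str]:
    points = []
    for name, obj in vars(world_module).items():
        if inspect.isfunction(obj):
            try:
                if inspect.getsource(obj) == agent_source:
                    points.append(name)   # this location is "really the agent"
            except OSError:
                pass                      # source text not available
    return points
```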