Have you considered that you may be spending a lot of time writing up a problem that has already been solved, and should spend a bit more time checking whether this is the case, before going much further on your path?
Yes! If UDT solves this problem, that’s extremely good news. I mention the possibility here. Unfortunately, I (and several others) don’t understand UDT well enough to tease out all the pros and cons of this approach. It might take a workshop to build a full consensus about whether it solves the problem, as opposed to just reframing it in new terms. (And, if it’s a reframing, how much it deepens our understanding.)
Part of the goal of this sequence is to put introductory material about this problem in a single place, to get new workshop attendees and LWers on the same page faster. A lot of people are already familiar with these problems and have made important progress on them, but the opening moves are still scattered about in blog comments, private e-mails, wiki pages, etc.
It would be very valuable to pin down concrete examples of how UDT agents behave better than AIXI. (That may be easier after my next post, which goes into more detail about how and why AIXI misbehaves.) Even people who aren’t completely on board with UDT itself should be very excited about the possibility of showing that AIXI not only runs into a problem, but runs into a formally solvable problem. That makes for a much stronger case.
But these seem to be two instances of the same general problem, and it seems like an AGI problem rather than an FAI problem—if you don’t know how to do this, then you can’t use math to make predictions about physical systems, which makes it hard to be generally intelligent.
Goal stability looks like an ‘AGI problem’ in the sense that nearly all superintelligences converge on stable goals, but in practice it’s an FAI problem because a UFAI’s method of becoming stable is probably very different from an FAI’s method of being stable. Naturalized induction is an FAI problem in the same way; it would get solved by a UFAI, but that doesn’t help us (especially since the UFAI’s methods, even if we knew them, might not generalize well to clean, transparent architectures).
Yes! If UDT solves this problem, that’s extremely good news. I mention the possibility here. Unfortunately, I (and several others) don’t understand UDT well enough to tease out all the pros and cons of this approach. It might take a workshop to build a full consensus about whether it solves the problem, as opposed to just reframing it in new terms. (And, if it’s a reframing, how much it deepens our understanding.)
Do you have any specific questions about UDT that I can help answer? MIRI has held two decision theory workshops that I attended, and AFAIK nobody had a lot of difficulty understanding UDT, or thought that the UDT approach would have trouble with the kind of problem that you are describing in this sequence. It doesn’t seem very likely to me that someone would hold another workshop specifically to answer whether UDT handles this problem correctly, so I think our best bet is to just hash it out in this forum. (If we run into a lot of trouble communicating, we can always try something else at that point.)
(If you want to do this after your next post, then go ahead, but again it seems like you may be putting a lot of time and effort into writing this sequence, whereas if you spent a bit more time on UDT first, maybe you’d go “ok, this looks like a solved problem, let’s move on at least for now.” It’s not like there’s a shortage of other interesting and important problems to work on or introduce to people.)
Part of the goal of this sequence is to put introductory material about this problem in a single place, to get new workshop attendees and LWers on the same page faster.
I guess part of what’s making me think “you seem to be spending too much time on this” is that the problems/defects you’re describing with the AIXI approach here seem really obvious (at least in comparison to some other FAI-related problems), such that if somebody couldn’t see them right away or understand them in a few paragraphs, I think it’s pretty unlikely that they’d be able to contribute much to the kinds of problems that I’m interested in now.
AFAIK nobody had a lot of difficulty understanding UDT, or thought that the UDT approach would have trouble with the kind of problem that you are describing in this sequence.
For what it’s worth, I had a similar impression before, but now I suspect that either Eliezer doesn’t understand how UDT deals with that problem, or he has some objection that I don’t understand. That may or may not have something to do with his insistence on using causal models, which I also don’t understand.
I think I can explain why we might expect a UDT agent to avoid these problems. You’re probably already familiar with the argument at this level, but I haven’t seen it written up anywhere yet.
First, we’ll describe (informally) a UDT agent as a mathematical object. The preferences of the agent are built in (so no reward channel, which allows us to avoid preference solipsism). It will also have models of every possible universe, and also an understanding of its own mathematical structure. To make a decision given a certain input, it will scan each universe model for structures that will be logically dependent on its output. It will then predict what will happen in each universe for each particular output. Then, it will choose the output that maximizes its preferences.
Now let’s see why it won’t have the immortality problem. Let’s say the agent is considering an output string corresponding to an anvil experiment. After running the predictions of this in its models, it will realize that it will lose a significant amount of structure which is logically dependent on it. So unless it has very strange preferences, it will mark this outcome as low utility, and consider better options.
Similarly, the agent will also notice that some outputs correspond to having more structures which are logically dependent on it. For example, an output that built a faster version of a UDT agent would allow more things to be affected by future outputs. In other words, it would be able to self-improve.
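A toy sketch may make the procedure above more concrete. This is only an illustration under loud assumptions: the three output names, the two universe models, their utility numbers, and the equal weights are all invented here, and the weighted sum is just one simple way to turn "choose the output that maximizes its preferences" into code. A real UDT agent would reason about logical dependence rather than look values up in a table.

```python
from typing import Callable, List, Tuple

# Hypothetical stand-in for a universe model: given the agent's output,
# return how well the resulting universe scores under the built-in preferences.
WorldModel = Callable[[str], float]


def fragile_world(output: str) -> float:
    # Assumed toy numbers: here the anvil destroys the agent's hardware,
    # so nothing later depends on the agent's decisions.
    return {"drop_anvil": 0.0, "wait": 10.0, "build_faster_copy": 25.0}[output]


def sturdy_world(output: str) -> float:
    # Assumed toy numbers: here the hardware is only damaged by the anvil.
    return {"drop_anvil": 8.0, "wait": 10.0, "build_faster_copy": 20.0}[output]


# Assumption: the agent weighs its universe models equally.
WORLDS: List[Tuple[float, WorldModel]] = [(0.5, fragile_world), (0.5, sturdy_world)]
OUTPUTS: List[str] = ["drop_anvil", "wait", "build_faster_copy"]


def expected_utility(output: str, worlds: List[Tuple[float, WorldModel]]) -> float:
    """Score one output by predicting its result in every universe model."""
    return sum(weight * model(output) for weight, model in worlds)


def choose_output(outputs: List[str], worlds: List[Tuple[float, WorldModel]]) -> str:
    """Pick the output whose predicted consequences score highest."""
    return max(outputs, key=lambda o: expected_utility(o, worlds))


if __name__ == "__main__":
    for o in OUTPUTS:
        print(o, expected_utility(o, WORLDS))
    print("chosen:", choose_output(OUTPUTS, WORLDS))
```

With these made-up numbers the anvil experiment scores lowest and building a faster copy scores highest, which mirrors the two arguments above. There is no reward channel anywhere in the sketch: the preferences live inside the universe evaluations themselves.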
To actually implement a UDT agent with these preferences, we just need to create something (most likely an appropriately programmed computer) that will be logically dependent on this mathematical object to a sufficiently high degree. This, of course, is the hard part, but I don’t see any reason why a faithful implementation would suddenly have these specific problems again.
Another nice feature of UDT (which sometimes is treated as a bug) is that it is extremely flexible in how you can choose the utility function. Maybe you Just Don’t Care about worlds that don’t follow the Born probabilities—so just ignore anything that happens in such a universe in your utility function. I interpret this as meaning that UDT is a framework decision theory that could be used regardless of what the answers (or maybe just preferences) to anthropics, induction or other such things end up being.
Oh, and if anyone notices something I got wrong, or that I seem to be missing, please let me know—I want to understand UDT better :)
It will also have models of every possible universe, and also an understanding of its own mathematical structure. To make a decision given a certain input, it will scan each universe model for structures that will be logically dependent on its output. It will then predict what will happen in each universe for each particular output. Then, it will choose the output that maximizes its preferences.
Apologies if this is a stupid question—I am not an expert—but how do we know what “level of reality” to have our UDT-agent model its world-models with? That is, if we program the agent to produce and scan universe-models consisting of unsliced representations of quark and lepton configurations, what happens if we discover that quarks and leptons are composed of more elementary particles yet?
Wei Dai has suggested that the default setting for a decision theory be Tegmark’s Level 4 Multiverse, where all mathematical structures exist in reality. So a “quark-lepton” universe and a string theory universe would both be considered among the possible universes, assuming they are mathematically consistent.
Of course, this makes it difficult to specify the utility function.
Yeah, your explanation sounds right.
To elaborate on “the preferences of the agent are built in”, that means the agent is coded with a description of a large but fixed mathematical formula with no free variables, and wants the value of that formula to be as high as possible. That doesn’t make much sense in simple cases like “I want the value of 2+2 to be as high as possible”, but it works in more complicated cases where the formula contains instances of the agent itself, which is possible by quining.
To elaborate on why “scanning each universe model for structures that will be logically dependent on its output” doesn’t need bridging laws, let’s note that it can be viewed as theorem proving. The agent might look for easily provable theorems of the form “if my mathematical structure has a certain input-output map, then this particular universe model returns a certain value”. Or it could use some kind of approximate logical reasoning, but in any case it wouldn’t need explicit bridging laws.
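Here is a minimal sketch of that idea, with heavy caveats. The universe below is a made-up program with no free variables once the agent's input-output map is fixed, and it contains two instances of the agent. Exhaustively evaluating the universe on each candidate map stands in for searching for proofs of "if my map is M, then the universe returns V"; the input names, output names, and scores are all hypothetical.

```python
from itertools import product
from typing import Dict, List, Tuple

Policy = Dict[str, str]  # an input-output map for the agent

# Hypothetical inputs and outputs, invented for this illustration.
INPUTS: List[str] = ["left", "right"]
OUTPUTS: List[str] = ["push", "pull"]


def universe(policy: Policy) -> int:
    """Toy universe containing two instances of the agent.

    One instance sees "left", the other sees "right". The returned value
    depends on both outputs, so it is a function of the whole map.
    """
    a = policy["left"]
    b = policy["right"]
    score = 0
    if a == "push":
        score += 2
    if b == "pull":
        score += 3
    if a == b:
        score -= 4  # assumed penalty when both instances act identically
    return score


def best_policy(inputs: List[str], outputs: List[str]) -> Tuple[Policy, int]:
    """Enumerate every input-output map and keep the highest-valued one.

    Brute-force evaluation replaces theorem proving here; that substitution
    is an assumption of this sketch, not part of UDT itself.
    """
    candidates = (dict(zip(inputs, choice))
                  for choice in product(outputs, repeat=len(inputs)))
    best = max(candidates, key=universe)
    return best, universe(best)


def agent(observation: str) -> str:
    """Each instance acts according to the single best map."""
    policy, _ = best_policy(INPUTS, OUTPUTS)
    return policy[observation]


if __name__ == "__main__":
    print(best_policy(INPUTS, OUTPUTS))
    print(agent("left"), agent("right"))
```

Nothing in the sketch tells the agent which part of the universe is "it". The link between the agent and the universe is just the fact that the universe's value varies with the map, which is the sense in which no explicit bridging laws are needed.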