Have you considered that you may be spending a lot of time writing up a problem that has already been solved, and should spend a bit more time checking whether this is the case, before going much further on your path? There was a previous thread about this, but I’ll try to explain again from a slightly different angle.
The idea is that logical facts in general have consequences on what we intuitively think of as “physical objects”. For example, from Fermat’s Last Theorem you can predict that no physical computer that searches for counterexamples to a^n+b^n != c^n will succeed for n>2. Since decisions are logical facts (they are facts about what some decision algorithm outputs), they too have such consequences, which (as suggested in UDT) we can use to make decisions.
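As a toy rendering of the Fermat example (the bounds here are arbitrary and the code purely illustrative):

```python
# A physical computer searching for counterexamples to Fermat's Last
# Theorem. The logical fact (FLT) lets us predict the physical fact
# that this search never succeeds for n > 2, whatever the bounds.
def search_counterexample(max_val=20, max_n=6):
    for n in range(3, max_n + 1):
        for a in range(1, max_val + 1):
            for b in range(1, max_val + 1):
                for c in range(1, max_val + 1):
                    if a**n + b**n == c**n:
                        return (a, b, c, n)
    return None  # FLT predicts we always reach this line
```

Running it with any bounds returns `None`; that prediction comes from mathematics, not from inspecting the machine.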
In practice we have uncertainty about whether some physical computer really is searching for counterexamples to a^n+b^n != c^n, or whether some physical system really embodies a certain decision algorithm, and need to know how to handle such uncertainty. But these seem to be two instances of the same general problem, and it seems like an AGI problem rather than an FAI problem—if you don’t know how to do this, then you can’t use math to make predictions about physical systems, which makes it hard to be generally intelligent.
So suppose you suspect that a certain set of universes that you care about contains implementations/embodiments of your decision algorithm, and you have some general way of handling uncertainty about this, then you can make decisions by asking questions of the form “suppose I (my decision algorithm) were to output X on input Y, what would be the consequences of this decision on these universes”. The upshot is that it doesn’t seem like you need bridging hypotheses that are specific to agents and their experiences.
UDT may indeed be an aspect of bridging laws. The reason I’m not willing to call it a full solution is as follows:
1) Actually, the current version of UDT that I write down as an equation involves maximizing over maps from sensory sequences to actions. If there’s a version of UDT that maximizes over something else, let me know.
2) We could say that it ought to be obvious to the math intuition module that choosing a map R := S->A ought to logically imply that R^ = S^->A for simple isomorphisms over sensory experience for isomorphic reductive hypotheses, thereby eliminating a possible degree of freedom in the bridging laws. I agree in principle. We don’t actually have that math intuition module. This is a problem with all logical decision theories, yes, but that is a problem.
3) Aspects of the problem like “What prior space of universes?” aren’t solved by saying “UDT”. Nor, “How exactly do you identify processes computationally isomorphic to yourself inside that universe?” Nor, “How do you manipulate a map which is smaller than the territory where you don’t reason about objects by simulating out the actual atoms?” Nor very much of, “How do I modify myself given that I’m made of parts?”
There’s an aspect of UDT that plausibly answers one particular aspect of “How do we do naturalized induction?”, especially a particular aspect of how we write bridging laws, and that’s exciting, but it doesn’t answer what I think of as the entire problem, including the problem of the prior over universes, multilevel reasoning about physical laws and high-level objects, the self-referential aspects of the reasoning, updating in cases where there’s no predetermined Cartesian boundary of what constitutes the senses, etc.
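For concreteness, the “maximizing over maps from sensory sequences to actions” in point 1 might be caricatured as a brute-force search; the universe models and utility function below are invented for illustration, not anyone’s actual formalism:

```python
import itertools

# Hypothetical toy version of "maximize over maps from sensory
# inputs to actions". The inputs, actions, and expected-utility
# function are all made up for the example.
def udt_policy(inputs, actions, expected_utility):
    """Return the input->action map with highest expected utility."""
    best_map, best_u = None, float('-inf')
    # Each candidate policy assigns one action to every possible input.
    for assignment in itertools.product(actions, repeat=len(inputs)):
        policy = dict(zip(inputs, assignment))
        u = expected_utility(policy)
        if u > best_u:
            best_map, best_u = policy, u
    return best_map

# Example: utility rewards answering 's1' with 'a' and 's2' with 'b'.
eu = lambda p: (p['s1'] == 'a') + (p['s2'] == 'b')
print(udt_policy(['s1', 's2'], ['a', 'b'], eu))  # -> {'s1': 'a', 's2': 'b'}
```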
This is a problem with all logical decision theories, yes, but that is a problem.
The way I think about it, if we can reduce one FAI problem to another FAI or AGI problem, which we know has to be solved anyway, that counts as solving the former problem (modulo the possibility of being wrong about the reduction, or being wrong about the necessity of solving the latter problem).
multilevel reasoning about physical laws and high-level objects
Also agreed, but I think it’s plausible that the solution to this could just fall out of a principled approach to the problem of logical uncertainty.
the self-referential aspects of the reasoning
Same with this one.
updating in cases where there’s no predetermined Cartesian boundary of what constitutes the senses
I don’t understand why you think it’s a problem in UDT. A UDT-agent would have some sort of sensory pre-processor which encodes its sensory data into an arbitrary digital format and then feeds that into UDT. UDT would compute an optimal input/output map, apply that map to its current input, then send the output to its actuators. Does this count as having a “predetermined Cartesian boundary of what constitutes the senses”? Why do we need to handle cases where there is no such boundary?
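The pipeline in that paragraph might be sketched as follows; every function name here is hypothetical:

```python
# Hypothetical sketch of the UDT-agent pipeline described above. The
# fixed 'encode' step is exactly the predetermined Cartesian boundary
# in question: UDT only ever sees its output.
def agent_step(raw_percept, encode, compute_optimal_map, actuate):
    observation = encode(raw_percept)    # sensory pre-processor
    policy = compute_optimal_map()       # UDT's optimal input/output map
    action = policy[observation]         # apply map to current input
    return actuate(action)               # send output to actuators
```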
Overall, I guess I was interpreting RobbBB’s sequence of posts as describing a narrower problem than your “naturalized induction”. If we include all the problems on your list though, doesn’t solving “naturalized induction” get us most of the way to being able to build an AGI already?
The way I think about it, if we can reduce one FAI problem to another FAI or AGI problem, which we know has to be solved anyway, that counts as solving the former problem (modulo the possibility of being wrong about the reduction, or being wrong about the necessity of solving the latter problem).
This is not how I use the term “solved”; also, the gist of my reply was that possibly one aspect of one aspect of a large problem had been reduced to an unsolved problem in UDT.
multilevel reasoning about physical laws and high-level objects
Also agreed, but I think it’s plausible that the solution to this could just fall out of a principled approach to the problem of logical uncertainty.
Thaaat sounds slightly suspicious to me. I mean it sounds a bit like expecting a solution to the One True Prior to fall out of the development of a principled probability theory, or like expecting a solution to AGI to fall out of a principled approach to causal models. I would expect a principled approach to logical uncertainty to look like the core of probability theory itself, with a lot left to be filled in to make an actual epistemic model. I would also think it plausible that a principled version of logical uncertainty would resemble probability theory in that it would still be too expensive to compute, and that an additional principled version of bounded logical uncertainty would be needed on top, and then a further innovation akin to causal models or a particular prior to yield bounded logical uncertainty that looks like multi-level maps of a single-level territory.
the self-referential aspects of the reasoning
Same with this one.
Same reply, plus specific mild skepticism relating to how current work on the Lobian obstacle hasn’t yet taken a shape that looks like it fills the logical-counterfactual symbol in UDT, plus specific stronger skepticism that it would be work on UDT qua UDT that burped out a solution to tiling agents rather than the other way around!
updating in cases where there’s no predetermined Cartesian boundary of what constitutes the senses
I don’t understand why you think it’s a problem in UDT. A UDT-agent would have some sort of sensory pre-processor which encodes its sensory data into an arbitrary digital format and then feeds that into UDT. UDT would compute an optimal input/output map, apply that map to its current input, then send the output to its actuators. Does this count as having a “predetermined Cartesian boundary of what constitutes the senses”? Why do we need to handle cases where there is no such boundary?
Let’s say you add a new sensor. How do you remap? We could maybe try to reframe as a tiling problem where agents create successor agents which then have new sensors… whereupon we run into all the current usual tiling issues and Lobian obstacles. Thinking about this in a natively naturalized mode, it doesn’t seem too unnatural to me to try to adopt a bridge hypothesis to an AI that can choose to treat arbitrary events in RAM as sensory observations and condition on them. This does not seem to me to mesh as well with native thinking in UDT the way I wrote out the equation. Again, it’s possible that we could make the two mesh via tiling, assuming that tiling with UDT agents optimizing over a map where actions included building further UDT agents introduced no further open problems or free variables or anomalies into UDT. But that’s a big assumption.
And then all this is just one small aspect of building an AGI, not most of the way AFAICT.
...mild skepticism relating to how current work on the Lobian obstacle hasn’t yet taken a shape that looks like it fills the logical-counterfactual symbol in UDT...
...I mean it sounds a bit like expecting a solution to the One True Prior to fall out of the development of a principled probability theory...
I believe my new formalism circumvents the problem by avoiding strong prior sensitivity.
Same reply, plus specific mild skepticism relating to how current work on the Lobian obstacle hasn’t yet taken a shape that looks like it fills the logical-counterfactual symbol in UDT...
My proposal does look that way. I hope to publish an improved version soon which also admits logical uncertainty in the sense of being unable to know the zillionth digit of pi.
Thinking about this in a natively naturalized mode, it doesn’t seem too unnatural to me to try to adopt a bridge hypothesis to an AI that can choose to treat arbitrary events in RAM as sensory observations and condition on them.
In my formalism input channels and arbitrary events in RAM have similar status.
Minor formal note: I have a mildly negative knee-jerk reaction when someone repeatedly links to or promotes something referred to only as “my ___”. Giving your formalism a proper name might make you sound less gratuitously self-promotional (which I don’t think you are).
Actually I already have a name for the formalism: I call it the “updateless intelligence metric”. My intuition was that referring to my own invention by the serious-sounding name I gave it myself would sound more pompous / self-promotional than referring to it as just “my formalism”. Maybe I was wrong.
if we can reduce one FAI problem to another FAI or AGI problem, which we know has to be solved anyway, that counts as solving the former problem
Setting aside what counts as a ‘solution’, merging two problems counts as progress on the problem only when the merged version is easier to solve than the unmerged version. Or when the merged version helps us arrive at an important conceptual insight about the unmerged version. You can collapse every FAI problem into a single problem that we need to solve anyway by treating them all as components of its utility function or action policy, but it’s not clear that represents progress, and it’s very clear it doesn’t represent a solution.
I guess I was interpreting RobbBB’s sequence of posts as describing a narrower problem than your “naturalized induction”.
Naturalized induction is the problem of defining an AGI’s priors, from the angle of attack ‘how can we naturalize this?’. In other words, it’s the problem of giving the AGI a reasonable epistemology, as informed by the insight that AGIs are physical processes that don’t differ in any fundamental way from other physical processes. So it encompasses and interacts with a lot of problems.
That should be clearer in my next couple of posts on naturalized induction. I used Solomonoff induction as my entry point because it keeps the sequence grounded in the literature and in a precise formalism. (And I used AIXI because it makes the problems with Solomonoff induction, and some other Cartesian concerns, more vivid and concrete.) It’s an illustration of how and why being bad at reductionism can cripple an AGI, and a demonstration of how easy it is to neglect reductionism while specifying what you want out of an AGI. (So it’s not a straw problem, and there isn’t an obvious cure-all patch.)
I’m also going to use AIXI as an illustration for some other issues in FAI (e.g., self-representation and AGI delegability), so explaining AIXI in some detail now gets more people on the same page for later.
doesn’t solving “naturalized induction” get us most of the way to being able to build an AGI already?
You may not need to solve naturalized induction to build a random UFAI. To build an FAI, I believe Eliezer thinks the largest hurdle is getting a recursively self-modifying agent to have stable specifiable preferences. That may depend on the AI’s decision theory, preferences, and external verifiability, or on aspects of its epistemology that don’t have much to do with the AI’s physicality.
1) Actually, the current version of UDT that I write down as an equation involves maximizing over maps from sensory sequences to actions. If there’s a version of UDT that maximizes over something else, let me know.
2) We could say that it ought to be obvious to the math intuition module that choosing a map R := S->A ought to logically imply that R^ = S^->A for simple isomorphisms over sensory experience for isomorphic reductive hypotheses, thereby eliminating a possible degree of freedom in the bridging laws. I agree in principle. We don’t actually have that math intuition module. This is a problem with all logical decision theories, yes, but that is a problem.
Regarding an abstract solution to logical uncertainty, I think the solution given in http://lesswrong.com/lw/imz/notes_on_logical_priors_from_the_miri_workshop/ (which I use in my own post) is not bad. It still runs into the Loebian obstacle. I think I have a solution for that as well; I’m going to write about it soon. Regarding something that can be implemented within reasonable computing resource constraints, well, see below...
3) Aspects of the problem like “What prior space of universes?” aren’t solved by saying “UDT”. Nor, “How exactly do you identify processes computationally isomorphic to yourself inside that universe?” Nor, “How do you manipulate a map which is smaller than the territory where you don’t reason about objects by simulating out the actual atoms?” Nor very much of, “How do I modify myself given that I’m made of parts?”
The prior space of universes is covered: unsurprisingly it’s the Solomonoff prior (over abstract sequences of bits representing the universe, not over sensory data). Regarding the other stuff, my formalism doesn’t give an explicit solution (since I can’t explicitly write the optimal program of given length). However, the function I suggest to maximize already takes everything into account, including restricted computing resources.
Have you considered that you may be spending a lot of time writing up a problem that has already been solved, and should spend a bit more time checking whether this is the case, before going much further on your path?
Yes! If UDT solves this problem, that’s extremely good news. I mention the possibility here. Unfortunately, I (and several others) don’t understand UDT well enough to tease out all the pros and cons of this approach. It might take a workshop to build a full consensus about whether it solves the problem, as opposed to just reframing it in new terms. (And, if it’s a reframing, how much it deepens our understanding.)
Part of the goal of this sequence is to put introductory material about this problem in a single place, to get new workshop attendees and LWers on the same page faster. A lot of people are already familiar with these problems and have made important progress on them, but the opening moves are still scattered about in blog comments, private e-mails, wiki pages, etc.
It would be very valuable to pin down concrete examples of how UDT agents behave better than AIXI. (That may be easier after my next post, which goes into more detail about how and why AIXI misbehaves.) Even people who aren’t completely on board with UDT itself should be very excited about the possibility of showing that AIXI not only runs into a problem, but runs into a formally solvable problem. That makes for a much stronger case.
But these seem to be two instances of the same general problem, and it seems like an AGI problem rather than an FAI problem—if you don’t know how to do this, then you can’t use math to make predictions about physical systems, which makes it hard to be generally intelligent.
Goal stability looks like an ‘AGI problem’ in the sense that nearly all superintelligences converge on stable goals, but in practice it’s an FAI problem because a UFAI’s method of becoming stable is probably very different from an FAI’s method of being stable. Naturalized induction is an FAI problem in the same way; it would get solved by a UFAI, but that doesn’t help us (especially since the UFAI’s methods, even if we knew them, might not generalize well to clean, transparent architectures).
Yes! If UDT solves this problem, that’s extremely good news. I mention the possibility here. Unfortunately, I (and several others) don’t understand UDT well enough to tease out all the pros and cons of this approach. It might take a workshop to build a full consensus about whether it solves the problem, as opposed to just reframing it in new terms. (And, if it’s a reframing, how much it deepens our understanding.)
Do you have any specific questions about UDT that I can help answer? MIRI has held two decision theory workshops that I attended, and AFAIK nobody had a lot of difficulty understanding UDT, or thought that the UDT approach would have trouble with the kind of problem that you are describing in this sequence. It doesn’t seem very likely to me that someone would hold another workshop specifically to answer whether UDT handles this problem correctly, so I think our best bet is to just hash it out in this forum. (If we run into a lot of trouble communicating, we can always try something else at that point.)
(If you want to do this after your next post, then go ahead, but again it seems like you may be putting a lot of time and effort into writing this sequence, whereas if you spent a bit more time on UDT first, maybe you’d go “ok, this looks like a solved problem, let’s move on at least for now.” It’s not like there’s a shortage of other interesting and important problems to work on or introduce to people.)
Part of the goal of this sequence is to put introductory material about this problem in a single place, to get new workshop attendees and LWers on the same page faster.
I guess part of what’s making me think “you seem to be spending too much time on this” is that the problems/defects you’re describing with the AIXI approach here seem really obvious (at least in comparison to some other FAI-related problems), such that if somebody couldn’t see them right away or understand them in a few paragraphs, I think it’s pretty unlikely that they’d be able to contribute much to the kinds of problems that I’m interested in now.
AFAIK nobody had a lot of difficulty understanding UDT, or thought that the UDT approach would have trouble with the kind of problem that you are describing in this sequence.
For what it’s worth, I had a similar impression before, but now I suspect that either Eliezer doesn’t understand how UDT deals with that problem, or he has some objection that I don’t understand. That may or may not have something to do with his insistence on using causal models, which I also don’t understand.
I think I can explain why we might expect a UDT agent to avoid these problems. You’re probably already familiar with the argument at this level, but I haven’t seen it written up anywhere yet.
First, we’ll describe (informally) a UDT agent as a mathematical object. The preferences of the agent are built in (so no reward channel, which allows us to avoid preference solipsism). It will also have models of every possible universe, and an understanding of its own mathematical structure. To make a decision given a certain input, it will scan each universe model for structures that will be logically dependent on its output. It will then predict what will happen in each universe for each particular output. Then, it will choose the output that maximizes its preferences.
Now let’s see why it won’t have the immortality problem. Let’s say the agent is considering an output string corresponding to an anvil experiment. After running the predictions of this in its models, it will realize that it will lose a significant amount of structure which is logically dependent on it. So unless it has very strange preferences, it will mark this outcome as low utility, and consider better options.
Similarly, the agent will also notice that some outputs correspond to having more structures which are logically dependent on it. For example, an output that built a faster version of a UDT agent would allow more things to be affected by future outputs. In other words, it would be able to self-improve.
To actually implement a UDT agent with these preferences, we just need to create something (most likely a computer programmed appropriately) that will be logically dependent on this mathematical object to a sufficiently high degree. This, of course, is the hard part, but I don’t see any reason why a faithful implementation might suddenly have these specific problems again.
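A toy version of this decision procedure, with made-up universe models standing in for “structures logically dependent on the output”:

```python
# Illustrative only: each universe model maps the agent's output to an
# outcome; the agent predicts outcomes under each candidate output and
# picks the one its (built-in) preferences rank highest.
def udt_decide(possible_outputs, universe_models, utility):
    def total_value(output):
        return sum(utility(model(output)) for model in universe_models)
    return max(possible_outputs, key=total_value)

# Made-up numbers: the 'anvil' output destroys most structure that is
# logically dependent on the agent, so it scores low in both models.
models = [lambda o: 0 if o == 'anvil' else 5,
          lambda o: 1 if o == 'anvil' else 3]
```

`udt_decide(['anvil', 'work'], models, lambda x: x)` then prefers `'work'`, mirroring the anvil argument above.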
Another nice feature of UDT (which sometimes is treated as a bug) is that it is extremely flexible in how you can choose the utility function. Maybe you Just Don’t Care about worlds that don’t follow the Born probabilities—so just ignore anything that happens in such a universe in your utility function. I interpret this as meaning that UDT is a framework decision theory that could be used regardless of what the answers (or maybe just preferences) to anthropics, induction or other such things end up being.
Oh, and if anyone notices something I got wrong, or that I seem to be missing, please let me know—I want to understand UDT better :)
It will also have models of every possible universe, and also an understanding of its own mathematical structure. To make a decision given a certain input, it will scan each universe model for structures that will be logically dependent on its output. It will then predict what will happen in each universe for each particular output. Then, it will choose the output that maximizes its preferences.
Apologies if this is a stupid question—I am not an expert—but how do we know what “level of reality” to have our UDT-agent model its world-models with? That is, if we program the agent to produce and scan universe-models consisting of unsliced representations of quark and lepton configurations, what happens if we discover that quarks and leptons are composed of more elementary particles yet?
Wei Dai has suggested that the default setting for a decision theory be Tegmark’s Level 4 Multiverse—where all mathematical structures exist in reality. So a “quark—lepton” universe and a string theory universe would both be considered among the possible universes—assuming they are consistent mathematically.
Of course, this makes it difficult to specify the utility function.
To elaborate on “the preferences of the agent are built in”, that means the agent is coded with a description of a large but fixed mathematical formula with no free variables, and wants the value of that formula to be as high as possible. That doesn’t make much sense in simple cases like “I want the value of 2+2 to be as high as possible”, but it works in more complicated cases where the formula contains instances of the agent itself, which is possible by quining.
To elaborate on why “scanning each universe model for structures that will be logically dependent on its output” doesn’t need bridging laws, let’s note that it can be viewed as theorem proving. The agent might look for easily provable theorems of the form “if my mathematical structure has a certain input-output map, then this particular universe model returns a certain value”. Or it could use some kind of approximate logical reasoning, but in any case it wouldn’t need explicit bridging laws.
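One way to caricature this in code: a universe model is a program that calls the agent’s input-output map, so “if my map is M, this universe returns v” is checked by evaluation rather than by any explicit bridging law. Everything below is invented for illustration:

```python
# A made-up universe model that "contains" the agent by invoking its
# input-output map directly; no separate bridging hypothesis appears.
def universe(agent_map):
    return 10 if agent_map('observe') == 'cooperate' else 1

def choose_map(candidate_maps):
    # Pick the hypothesized map under which the universe model
    # provably (here: by direct evaluation) returns the highest value.
    return max(candidate_maps, key=universe)

maps = [lambda s: 'cooperate', lambda s: 'defect']
```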
Please take a look at my adaptation of parametric polymorphism to the updateless intelligence formalism.
Hi Vulture, thanks for your comment!
Actually I already have a name for the formalism: I call it the “updateless intelligence metric”. My intuition was that referring to my own invention by the serious-sounding name I gave it myself would sound more pompous / self-promotional than referring to it as just “my formalism”. Maybe I was wrong.
Setting aside what counts as a ‘solution’, merging two problems counts as progress on the problem only when the merged version is easier to solve than the unmerged version. Or when the merged version helps us arrive at an important conceptual insight about the unmerged version. You can collapse every FAI problem into a single problem that we need to solve anyway by treating them all as components of its utility function or action policy, but it’s not clear that represents progress, and it’s very clear it doesn’t represent a solution.
Naturalized induction is the problem of defining an AGI’s priors, from the angle of attack ‘how can we naturalize this?’. In other words, it’s the problem of giving the AGI a reasonable epistemology, as informed by the insight that AGIs are physical processes that don’t differ in any fundamental way from other physical processes. So it encompasses and interacts with a lot of problems.
That should be clearer in my next couple of posts on naturalized induction. I used Solomonoff induction as my entry point because it keeps the sequence grounded in the literature and in a precise formalism. (And I used AIXI because it makes the problems with Solomonoff induction, and some other Cartesian concerns, more vivid and concrete.) It’s an illustration of how and why being bad at reductionism can cripple an AGI, and a demonstration of how easy it is to neglect reductionism while specifying what you want out of an AGI. (So it’s not a straw problem, and there isn’t an obvious cure-all patch.)
I’m also going to use AIXI as an illustration for some other issues in FAI (e.g., self-representation and AGI delegability), so explaining AIXI in some detail now gets more people on the same page for later.
You may not need to solve naturalized induction to build a random UFAI. To build an FAI, I believe Eliezer thinks the largest hurdle is getting a recursively self-modifying agent to have stable, specifiable preferences. That may depend on the AI’s decision theory, preferences, and external verifiability, or on aspects of its epistemology that don’t have much to do with the AI’s physicality.
My version of UDT (http://lesswrong.com/r/discussion/lw/jub/updateless_intelligence_metrics_in_the_multiverse/) maximizes over programs written for a given abstract “robot” (universal Turing machine + input channels).
Regarding an abstract solution to logical uncertainty, I think the solution given in http://lesswrong.com/lw/imz/notes_on_logical_priors_from_the_miri_workshop/ (which I use in my own post) is not bad. It still runs into the Loebian obstacle; I think I have a solution for that as well, and I’m going to write about it soon. Regarding something that can be implemented within reasonable computing resource constraints, well, see below...
The prior space of universes is covered: unsurprisingly it’s the Solomonoff prior (over abstract sequences of bits representing the universe, not over sensory data). Regarding the other stuff, my formalism doesn’t give an explicit solution (since I can’t explicitly write the optimal program of given length). However, the function I suggest to maximize already takes everything into account, including restricted computing resources.
Yes! If UDT solves this problem, that’s extremely good news. I mention the possibility here. Unfortunately, I (and several others) don’t understand UDT well enough to tease out all the pros and cons of this approach. It might take a workshop to build a full consensus about whether it solves the problem, as opposed to just reframing it in new terms. (And, if it’s a reframing, how much it deepens our understanding.)
Part of the goal of this sequence is to put introductory material about this problem in a single place, to get new workshop attendees and LWers on the same page faster. A lot of people are already familiar with these problems and have made important progress on them, but the opening moves are still scattered about in blog comments, private e-mails, wiki pages, etc.
It would be very valuable to pin down concrete examples of how UDT agents behave better than AIXI. (That may be easier after my next post, which goes into more detail about how and why AIXI misbehaves.) Even people who aren’t completely on board with UDT itself should be very excited about the possibility of showing that AIXI not only runs into a problem, but runs into a formally solvable problem. That makes for a much stronger case.
Goal stability looks like an ‘AGI problem’ in the sense that nearly all superintelligences converge on stable goals, but in practice it’s an FAI problem because a UFAI’s method of becoming stable is probably very different from an FAI’s method of being stable. Naturalized induction is an FAI problem in the same way; it would get solved by a UFAI, but that doesn’t help us (especially since the UFAI’s methods, even if we knew them, might not generalize well to clean, transparent architectures).
Do you have any specific questions about UDT that I can help answer? MIRI has held two decision theory workshops that I attended, and AFAIK nobody had a lot of difficulty understanding UDT, or thought that the UDT approach would have trouble with the kind of problem that you are describing in this sequence. It doesn’t seem very likely to me that someone would hold another workshop specifically to answer whether UDT handles this problem correctly, so I think our best bet is to just hash it out in this forum. (If we run into a lot of trouble communicating, we can always try something else at that point.)
(If you want to do this after your next post, then go ahead, but again it seems like you may be putting a lot of time and effort into writing this sequence, whereas if you spent a bit more time on UDT first, maybe you’d go “ok, this looks like a solved problem, let’s move on at least for now.” It’s not like there’s a shortage of other interesting and important problems to work on or introduce to people.)
I guess part of what’s making me think “you seem to be spending too much time on this” is that the problems/defects you’re describing with the AIXI approach here seem really obvious (at least in comparison to some other FAI-related problems), such that if somebody couldn’t see them right away or understand them in a few paragraphs, I think it’s pretty unlikely that they’d be able to contribute much to the kinds of problems that I’m interested in now.
For what it’s worth, I had a similar impression before, but now I suspect that either Eliezer doesn’t understand how UDT deals with that problem, or he has some objection that I don’t understand. That may or may not have something to do with his insistence on using causal models, which I also don’t understand.
I think I can explain why we might expect an UDT agent to avoid these problems. You’re probably already familiar with the argument at this level, but I haven’t seen it written up anywhere yet.
First, we’ll describe (informally) an UDT agent as a mathematical object. The preferences of the agent are built in (so no reward channel, which allows us to avoid preference solipsism). It will also have models of every possible universe, and also an understanding of its own mathematical structure. To make a decision given a certain input, it will scan each universe model for structures that will be logically dependent on its output. It will then predict what will happen in each universe for each particular output. Then, it will choose the output that maximizes its preferences.
Now let’s see why it won’t have the immortality problem. Let’s say the agent is considering an output string corresponding to an anvil experiment. After running the predictions of this in its models, it will realize that it will lose a significant amount of structure which is logically dependent on it. So unless it has very strange preferences, it will mark this outcome as low utility, and consider better options.
Similarly, the agent will also notice that some outputs correspond to having more structures which are logically dependent on it. For example, an output that built a faster version of an UDT agent would allow more things to be affected by future outputs. In other words, it would be able to self-improve.
To actually implement an UDT agent with these preferences, we just need to create something (most likely an appropriately programmed computer) that will be logically dependent on this mathematical object to a sufficiently high degree. This, of course, is the hard part, but I don’t see any reason why a faithful implementation would suddenly have these specific problems again.
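As a toy illustration of the informal procedure above (this is a sketch under invented assumptions, not the UDT equation itself: `toy_universe`, `toy_utility`, and the action names are all hypothetical stand-ins for the agent's universe models and built-in preferences):

```python
def udt_decide(possible_outputs, universe_models, utility, observed_input):
    """Pick the output whose predicted consequences across all universe
    models score highest under the agent's built-in utility function
    (no reward channel)."""
    def score(output):
        # Predict what each universe contains if the agent's algorithm
        # maps this input to this output.
        worlds = [model(observed_input, output) for model in universe_models]
        return utility(worlds)
    return max(possible_outputs, key=score)

# Toy universe: "dropping the anvil" destroys the structures that are
# logically dependent on the agent, so almost any preference over
# surviving structure will rank that output low.
def toy_universe(observed_input, output):
    return {"agent_survives": output != "drop_anvil",
            "paperclips": 2 if output == "build_factory" else 1}

def toy_utility(worlds):
    # Structure that no longer depends on the agent contributes nothing.
    return sum(w["paperclips"] * (10 if w["agent_survives"] else 0)
               for w in worlds)

choice = udt_decide(["drop_anvil", "build_factory", "do_nothing"],
                    [toy_universe], toy_utility, observed_input=None)
# choice == "build_factory": more future structure depends on the agent.
```

Note that nothing in the loop updates on the input before evaluating consequences; the comparison is over whole predicted worlds, which is what lets the anvil case come out low-utility without any special immortality assumption.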
Another nice feature of UDT (which sometimes is treated as a bug) is that it is extremely flexible in how you can choose the utility function. Maybe you Just Don’t Care about worlds that don’t follow the Born probabilities—so just ignore anything that happens in such a universe in your utility function. I interpret this as meaning that UDT is a framework decision theory that could be used regardless of what the answers (or maybe just preferences) to anthropics, induction or other such things end up being.
Oh, and if anyone notices something I got wrong, or that I seem to be missing, please let me know—I want to understand UDT better :)
Apologies if this is a stupid question—I am not an expert—but how do we know what “level of reality” to have our UDT-agent model its world-models with? That is, if we program the agent to produce and scan universe-models consisting of unsliced representations of quark and lepton configurations, what happens if we discover that quarks and leptons are composed of more elementary particles yet?
Wei Dai has suggested that the default setting for a decision theory be Tegmark’s Level 4 Multiverse, where all mathematical structures exist in reality. So a quark-and-lepton universe and a string-theory universe would both be considered among the possible universes, assuming they are mathematically consistent.
Of course, this makes it difficult to specify the utility function.
Yeah, your explanation sounds right.
To elaborate on “the preferences of the agent are built in”, that means the agent is coded with a description of a large but fixed mathematical formula with no free variables, and wants the value of that formula to be as high as possible. That doesn’t make much sense in simple cases like “I want the value of 2+2 to be as high as possible”, but it works in more complicated cases where the formula contains instances of the agent itself, which is possible by quining.
To elaborate on why “scanning each universe model for structures that will be logically dependent on its output” doesn’t need bridging laws, let’s note that it can be viewed as theorem proving. The agent might look for easily provable theorems of the form “if my mathematical structure has a certain input-output map, then this particular universe model returns a certain value”. Or it could use some kind of approximate logical reasoning, but in any case it wouldn’t need explicit bridging laws.
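A minimal toy of the theorem-proving view (here “proving” each conditional theorem is replaced by simply running the universe program on a candidate input-output map; the `universe` function and its payoffs are invented for illustration, and real logical reasoning would of course not get to execute the universe directly):

```python
from itertools import product

# A "universe model" is a program that embeds a copy of the agent's
# decision function and returns a value. Logical dependence is probed
# by asking: "if my input-output map were M, what would this program
# return?" No bridging law relating experiences to physics is needed,
# only (here, trivially computed) theorems about programs.
def universe(decision_map):
    # The universe consults the embedded agent at two "locations".
    a = decision_map["cold"]
    b = decision_map["hot"]
    return (3 if a == "press" else 0) + (5 if b == "wait" else 1)

inputs, actions = ["cold", "hot"], ["press", "wait"]

# Enumerate every input-output map and record the theorem
# "if my map is M, the universe returns V".
theorems = {}
for choices in product(actions, repeat=len(inputs)):
    m = dict(zip(inputs, choices))
    theorems[choices] = universe(m)

best_map = max(theorems, key=theorems.get)
# best_map == ("press", "wait"): the map whose theorem gives the
# highest universe value (3 + 5 = 8).
```

The point of the sketch is only that the agent's side of the computation consists entirely of conditionals of the form “if my structure has map M, then this universe model returns V”, which is theorem proving (or approximate logical reasoning) rather than a bridge hypothesis about which physical thing it is.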