This was discussed on Facebook. I’ll copy-paste the entire conversation here, since it can only be viewed by people who have Facebook accounts.
Kaj Sotala: Re: Cartesian reasoning, it sounds like Orseau & Ring’s work on creating a version of the AIXI formalism in which the agent is actually embedded in the world, instead of being separated from it, would be relevant. Is there any particular reason why it hasn’t been mentioned?
Tsvi BT: [...] did you look at the paper Kaj posted above?
Luke Muehlhauser: [...] can you state this open problem using the notation from Orseau and Ring’s “Space-Time Embedded Intelligence” (2012), and thusly explain why the problem you’re posing isn’t solved by their own attack on the Cartesian boundary?
Eliezer Yudkowsky: Orseau and Ring say: “It is convenient to envision the space-time-embedded environment as a multi-tape Turing machine with a special tape for the agent.” As near as I can make out by staring at their equations and the accompanying text, their version is something I would still regard as basically Cartesian. Letting the environment modify the agent is still a formalism with a distinct agent and environment. Stating that the agent is part of the environment is still Cartesian if the agent gets a separate tape. That’s why they make no mention of bridging laws.
Tsvi BT:
...their version is something I would still regard as basically Cartesian.
(Their value definition is Cartesian because the utility is a function of just pi_t. I’m ignoring that as an obvious error, since of course we care about the whole environment. If that’s all you meant, then skip the rest of this comment.)
As I understand it, their expression for V boils down to saying that the value of an agent (policy) pi_1 is just the expected value of the environment, given that pi_1 is embedded at time 1. This looks almost vacuously correct as an optimality notion; yes, we want to maximize the value of the environment, so what? But it’s not Cartesian. The environment with pi_1 embedded just does whatever its rules prescribe after t=1, which can include destroying the agent, overwriting it, or otherwise modifying it.
Letting the environment modify the agent is still a formalism with a distinct agent and environment. Stating that the agent is part of the environment is still Cartesian if the agent gets a separate tape.
I think their formalism does not, despite appearances, treat pi_1 differently from the rest of the environment, except for the part where it is magically embedded at t=1 (and the utility being a function of pi_t). The separate tape for the agent is an immaterial visualization—the environment treats the agent’s tape the same as the rest of the environment. The formalism talks about an agent at each time step, but I could write down a function that mentions only pi_1 and assigns the same value to agents.
Anyway, for either a human or an AI, maximizing V amounts to solving the tiling problem and decision theory and so on, in an unbounded recursion, under uncertainty. A.k.a., “build a good AI”. So, it looks like an optimality notion that is correct (modulo some problems) and non-Cartesian, albeit just about useless. (Some problems: the utility is defined to depend on just the “agent” pi_t at time t, where it should depend on the whole environment; the series is, as usual, made to converge with an inexplicable discount factor; the environment is assumed to be computable and timeful; and we are supposed to maximize over a mysteriously correct prior on computable environments.)
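A rough sketch of the value expression under discussion, reconstructed from the features mentioned in this thread rather than from O&R’s exact notation (a prior w over computable environments ρ, a discount factor γ_t, and a utility u applied to the time-t “agent” π_t):

$$
V(\pi_1) \;=\; \sum_{\rho} w(\rho)\, \mathbb{E}_{\rho}\!\left[\, \sum_{t=1}^{\infty} \gamma_t\, u(\pi_t) \;\middle|\; \pi_1 \text{ embedded at } t=1 \right]
$$

where each successive π_{t+1} is whatever ρ computes from the full state at time t, with nothing in the expression requiring that the original agent survive, stay intact, or remain identifiable.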
Joshua Fox: I have to disagree with: “Stating that the agent is part of the environment is still Cartesian if the agent gets a separate tape.”
Orseau & Ring, in the paragraph you reference, say “This tape is used by the environment just like any other working-memory tape… The agent’s tape can be seen as a partial internal state of the environment.” So, that “separate” tape is just for purposes of discussion—not really part of their model.
In the next paragraph, they describe a Game-of-Life model in which the cells that are the agent are in no real way isolated from the other cells. The designation of some cells as the agent is, again, just a convenience and not a significant part of their model.
Orseau & Ring’s real insight is that in the end, the agent is just a utility function.
This eliminates the Cartesian boundary. The agent sees its world (which we can, for purposes of discussion, model as a Turing Machine, Game-of-Life, etc.) holistically—with no separate object which is the agent itself.
In practice, any agent, even if we interpret it using O&R’s approach, is going to have to care about its own implementation, to avoid those anvils. But that is only because protecting a certain part of the universe (which we call the “embodiment”) is part of optimizing that utility function.
Evolution does not “care” about its own embodiment (whatever that may mean). A gene does not “care” about its embodiment in DNA, just about copying itself.
A superintelligent paper-clip optimizer does not care if its own parts are recycled, so long as the result is improved long-term paper-clip production.
So, following O&R, and taking this abstractly, can’t we just ask how good an agent/utility-function is at getting optimized given its environment?
We would note that many agents/utility-functions which do get optimized happen to be associated with special parts of the universe which we call the “embodiment” of the agent.
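To make the Game-of-Life point above concrete, here is a minimal illustrative sketch (not code from the paper; the glider, the blinker, and the labeling are arbitrary choices for the example): some live cells are merely labeled “the agent”, and the update rule never consults that label, so the designated region is moved, modified, or destroyed by exactly the same dynamics as everything else.

```python
from collections import Counter

def step(live_cells):
    """One Conway's-Game-of-Life update; live_cells is a set of (x, y) pairs."""
    neighbor_counts = Counter(
        (x + dx, y + dy)
        for (x, y) in live_cells
        for dx in (-1, 0, 1)
        for dy in (-1, 0, 1)
        if (dx, dy) != (0, 0)
    )
    # A cell is alive next step iff it has 3 live neighbors,
    # or 2 live neighbors and is currently alive.  Nothing in the rule
    # refers to which cells are "the agent".
    return {c for c, n in neighbor_counts.items()
            if n == 3 or (n == 2 and c in live_cells)}

# A glider whose cells we arbitrarily designate as "the agent",
# plus an unrelated blinker elsewhere in the world.
agent_cells = {(1, 0), (2, 1), (0, 2), (1, 2), (2, 2)}
world = agent_cells | {(10, 9), (10, 10), (10, 11)}

for _ in range(4):
    world = step(world)  # "agent" cells get no special treatment

# The dynamics do not protect the labeled region; after a few steps the
# original "agent" coordinates may be empty or occupied by something else.
print(sorted(world & agent_cells), sorted(world - agent_cells))
```

The “agent” here exists only as a bookkeeping label outside the dynamics, which is the sense in which the designation is a convenience rather than part of the model.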
Eliezer Yudkowsky: Joshua, I might be underimpressed by O&R due to taking that sort of thing as a background assumption, but I didn’t see any math which struck me as particularly apt to expressing those ideas. Making the agent be a separate tape doesn’t begin to address naturalism vs. Cartesianism the way a bridging law does, and having your equation talk about completely general arbitrary modifications of the agent by the environment doesn’t get started on the agent representing itself within a universe the way that tiling does. I don’t mean to sound too negative on O&R in particular; in general in life I tend to feel positive reinforcement or a sense of progress on relatively rare and special occasions, but this was not one of those occasions. My feelings about O&R were that they set themselves a valid challenge statement, and then wrote down an equation that I wouldn’t have considered progress if I’d written it down myself. I also don’t reward myself for valid challenge statements, because I can generate an unlimited number of them easily. I do feel a sense of progress on inventing an interesting well-posed subproblem with respect to a challenge that previously seemed impossibly vague as a challenge, but O&R didn’t include what I consider to be an especially well-posed subproblem. Again, I bear no animus toward O&R; I’m just trying to explain why it is that I’m gazing with a vaguely distant expression at the celebrating that some other people seem to do when they read the paper.
Luke Muehlhauser: I sent Eliezer’s comment about the Ring & Orseau paper (“As near as I can make out by staring at their equations...”) to Orseau, and Orseau replied:
Regarding Eliezer’s comment… Tsvi’s reply is approximately correct.
Tsvi still also made a mistake: Our equation can take into account the whole universe’s internal state, but he’s been understandably confused (as has been Eliezer) by our ambiguous notation where we use “pi” instead of just some memory state “m”. The trick is that since the environment outputs a memory state at the second iteration, which can really be anything, and not just the policy of the agent, it is easy to define this state to be the whole internal state or, put differently, to consider only environments that output their whole internal state.
But our equation also allows for something more local, as we may not want the utility function to have access to the whole universe. It can also be locally growing, for example.
Orseau doesn’t have time to engage the conversation on Facebook, but he gave me permission to post this bit of our private conversation here.
We can model induction in a monistic fashion pretty well—although at the moment the models are somewhat lacking in advanced inductive capacity/compression abilities. The models are good enough to be built and actually work.
Agents wireheading themselves or accidentally performing fatal experiments on themselves will probably be handled in much the same way that biology has handled it to date—e.g. by liberally sprinkling aversive sensors around the creature’s brain. The argument that such approaches do not scale up is probably wrong—designers will always be smarter than the creatures they build—and will successfully find ways to avoid undesirable self-modifications. If there are limits, they are obviously well above the human level—since individual humans have very limited self-brain-surgery abilities. If this issue does prove to be a significant problem, we won’t have to solve it without superhuman machine intelligence.
The vision of an agent improving its own brain is probably wrong: once you have one machine intelligence, you will soon have many copies of it—and a society of intelligent machines. That’s the easiest way to scale up—as has been proved in biological systems again and again. Agents will be produced in factories run by many such creatures. No individual agent is likely to do much in the way of fundamental redesign on itself. Instead groups of agents will design the next generation of agent.
That still leaves the possibility of a totalitarian world government wireheading itself—or performing fatal experiments on itself. However, a farsighted organization would probably avoid such fates—in order to avoid eternal oblivion at the hands of less short-sighted aliens.
Agents wireheading themselves or accidentally performing fatal experiments on themselves will probably be handled in much the same way that biology has handled it to date—e.g. by liberally sprinkling aversive sensors around the creature’s brain
Band-aids as a solution to catastrophes require that we’re able to see all the catastrophes coming. Biology doesn’t care about letting species evolve to extinction, so it’s happy to rely on hacky post-hoc solutions. We do care about whether we go extinct, so we can’t just turn random AGIs loose on our world and worry about all the problems after they’ve arisen.
The argument that such approaches do not scale up is probably wrong—designers will always be smarter than the creatures they build
Odd comment marked in bold. Why do you think that?
and will successfully find ways to avoid undesirable self-modifications
I’m confused. Doesn’t this predict that no undesirable technology will ever be (or has ever been) invented, much less sold?
If this issue does prove to be a significant problem, we won’t have to solve it without superhuman machine intelligence.
We can’t rely on a superintelligence to provide solutions to problems that need to be solved as a prerequisite to creating an SI that it’s safe to ask for help on that class of problems. Not every buck can be passed to the SI.
The vision of an agent improving its own brain is probably wrong
What about the vision of an agent improving on its design and then creating the new model of itself? Are you claiming that there will never be AIs used to program improved AIs?
once you have one machine intelligence, you will soon have many copies of it—and a society of intelligent machines
Because any feasible AI will want to self-replicate? Or because its designers will desire a bunch of copies?
What’s the relevant difference between a society of intelligent machines, and a singular intelligent machine with a highly modular reasoning and decision-making architecture? I.e., why did you bring up the ‘society’ topic in the first place?
However, a farsighted organization would probably avoid such fates—in order to avoid eternal oblivion at the hands of less short-sighted aliens.
I’m not seeing it. If wireheading is plausible, then it’s equally plausible given an alien-fearing government, since wireheading the human race needn’t get in the way of putting a smart AI in charge of neutralizing potential alien threats. Direct human involvement won’t always be a requirement.
why did you bring up the ‘society’ topic in the first place?
A society leads to a structure with advantages of power and intelligence over individuals. It means that we’ll always be able to restrain agents in test harnesses, for instance. It means that the designers will be smarter than the designed—via collective intelligence. If the designers are smarter than the designed, maybe they’ll be able to stop them from wireheading themselves.
If wireheading is plausible, then it’s equally plausible given an alien-fearing government, since wireheading the human race needn’t get in the way of putting a smart AI in charge of neutralizing potential alien threats.
What I was talking about was “the possibility of a totalitarian world government wireheading itself”. The government wireheading itself isn’t really the same as humans wireheading. However, probably any wireheading increases the chances of being wiped out by less-stupid aliens. Optimizing for happiness and optimizing for survival aren’t really the same thing. As Grove said, only the paranoid survive.
Friends, I want to draw your attention to my (significant) improvement of the O&R framework here: http://lesswrong.com/lw/h4x/intelligence_metrics_and_decision_theories/ http://lesswrong.com/lw/h93/metatickle_intelligence_metrics_and_friendly/