The main split between the human cases and the AI cases is that the humans are ‘wireheading’ w.r.t. one ‘part’ or slice through their personality that gets to fulfill its desires at the expense of another ‘part’ or slice, metaphorically speaking; pleasure taking precedence over other desires. Also, the winning ‘part’ in each of these cases tends to be a part which values simple subjective pleasure, winning out over parts that have desires over the external world and desires for more complex interactions with that world (in the experience machine you get the complexity but not the external effects).
In the AI case, the AI is performing exactly as it was defined, in an internally unified way; the ideals by which it is called ‘wireheaded’ are only the intentions and ideals of the human programmers.
I also don’t think it’s practically possible to specify a powerful AI which actually operates to achieve some programmer goal over the external world, without the AI’s utility function being explicitly written over a model of that external world, as opposed to its utility function being written over histories of sensory data.
Illustration: In a universe operating according to Conway’s Game of Life or something similar, can you describe how to build an AI that would want to actually maximize the number of gliders, without that AI’s world-model being over explicit world-states and its utility function explicitly counting gliders? Using only the parts of the universe that directly impinge on the AI’s senses—just the parts of the cellular automaton that impinge on the AI’s screen—can you find any maximizable quantity that corresponds to the number of gliders in the outside world? I don’t think you’ll find any possible way to specify a glider-maximizing utility function over sense histories unless you only use the sense histories to update a world-model and have the utility function be only over that world-model, and even then the extra level of indirection might open up a possibility of ‘wireheading’ (of the AI’s real operation vs. programmer-desired glider-maximizing operation) if any number of plausible minor errors were made.
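To make the contrast concrete, here is a minimal sketch (a hypothetical design, not a working agent; the counter only handles one glider phase) of the structure being pointed at: sense data is used only to update an explicit world-model, and the utility function counts gliders in that model rather than in the sense history.

```python
# One phase of the glider, normalised so its topmost-then-leftmost live cell is (0, 0):
#   . X .
#   . . X
#   X X X
GLIDER = {(0, 0), (1, 1), (2, -1), (2, 0), (2, 1)}

def count_gliders(live_cells):
    """Count isolated copies of one glider phase in a world-model represented as a
    set of live-cell coordinates. A real counter would also check the other three
    phases and all rotations/reflections; this is only the shape of the idea."""
    found = 0
    for (x, y) in live_cells:
        pattern = {(x + dx, y + dy) for (dx, dy) in GLIDER}
        if not pattern <= live_cells:
            continue
        # Require an empty one-cell margin so we don't count cells that are really
        # part of some larger object.
        margin = {(x + dx, y + dy)
                  for dx in range(-1, 4) for dy in range(-2, 3)} - pattern
        if not (margin & live_cells):
            found += 1
    return found

class GliderMaximizerModel:
    """Utility is defined over the inferred world-model; the screen/sense data is
    only ever used to update that model (the update routine here is hypothetical)."""
    def __init__(self, update_model):
        self.model = set()              # inferred set of live cells in the outside world
        self.update_model = update_model

    def observe(self, sense_datum):
        self.model = self.update_model(self.model, sense_datum)

    def utility(self):
        return count_gliders(self.model)   # counted in the model, not on the screen
```

The point of the contrast is that the quantity being maximized lives in the model of the outside automaton, not in anything computable from the screen history alone.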
Definition: An agent is an algorithm that models the effects of (several different) possible future actions on the world and performs the action that yields the highest value according to some evaluation procedure.
The word “value” seems unnecessarily value-laden here.
Alternatively: A consequentialist agent is an algorithm with causal connections both to and from the world, which uses the causal effect of the world upon itself (sensory data) to build a predictive model of the world, which it uses to model the causal outcomes of alternative internal states upon the world (the effect of its decisions and actions), evaluates these predicted consequences using some algorithm and assigns the prediction an ordered or continuous quantity (in the standard case, expected utility), and then decides on an action corresponding to expected consequences which are above some threshold, relatively high, or maximal in this assigned quantity.
Simpler: A consequentialist agent predicts the effects of alternative actions upon the world, assigns quantities over those consequences, and chooses an action whose predicted effects have high value of this quantity, therefore operating to steer the external world into states corresponding to higher values of this quantity.

Changed it to “number”.
The authors argue that [… in addition to some other agents] the goal-seeking agent that gets one utilon every time it satisfies a pre-specified goal and no utility otherwise [...], will all decide to build and use a delusion box.
They’re using the term “goal seeking agent” in a perverse way. As EY explains in his third and fourth paragraphs, seeking a result defined in sensory-data terms is not the only, or even usual, sense of “goal” that people would attach to the phrase “goal seeking agent”. Nor is that a typical goal that a programmer would want an AI to seek.
Simpler: A consequentialist agent predicts the effects of alternative actions upon the world, assigns quantities over those consequences, and chooses an action whose predicted effects have high value of this quantity, therefore operating to steer the external world into states corresponding to higher values of this quantity.
I like seeing a concise description that doesn’t strictly imply that consequentialists must necessarily seek expected value. (I probably want to seek that as far as I can evaluate my preferences but it doesn’t seem to be the only consequentialist option.)

I’m curious, what other options are you thinking of?
You are attempting to distinguish between “quantity” and “value”? Or “prediction” and “expectation”? Either way, it doesn’t seem to make very much sense.

No.
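As an illustration of the “Simpler” definition above, here is a minimal sketch of the consequentialist loop (all names are hypothetical). The optional threshold shows one way an agent could be consequentialist without strictly maximizing an expected quantity, as touched on in the exchange above.

```python
from typing import Callable, Iterable, Optional, TypeVar

Action = TypeVar("Action")
WorldState = TypeVar("WorldState")

def choose_action(actions: Iterable[Action],
                  predict: Callable[[Action], WorldState],   # model: action -> predicted world
                  score: Callable[[WorldState], float],      # quantity assigned to consequences
                  threshold: Optional[float] = None) -> Action:
    """Predict the consequences of each candidate action, assign each predicted
    consequence a numeric quantity, then pick an action with a high quantity."""
    scored = [(score(predict(a)), a) for a in actions]
    if threshold is not None:
        # Satisficing variant: take any action whose predicted consequences are
        # "good enough", without insisting on the maximum.
        for s, a in scored:
            if s >= threshold:
                return a
    # Default: steer toward the highest-scoring predicted world-state.
    return max(scored, key=lambda sa: sa[0])[1]
```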
I’m not sure I understand the illustration. In particular, I don’t understand what “want” means if it doesn’t mean having a world-model over world-states and counting gliders.
I guess “want” in “AI that would want to actually maximize the number of gliders” refers to having a tendency to produce a lot of gliders. If you have an opaque AI with an obfuscated and somewhat faulty “jumble of wires” design, you might be unable to locate its world model in any obvious way, but you might still be able to characterize its behavior. The point of the example is to challenge the reader to imagine an AI design that reliably tends to produce gliders across many environments, but isn’t specified in terms of some kind of world-model module with glider counting over that world model.
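A sketch of what “characterizing its behavior” might look like in practice, assuming hypothetical agent/environment interfaces: the opaque agent is treated purely as a black box, and the glider-detection routine is supplied from outside rather than found anywhere in the agent.

```python
def behavioural_glider_tendency(opaque_agent, environments, count_gliders, steps=1000):
    """Run a black-box agent in several Life environments and measure how many
    gliders it leaves behind; we never inspect its internals for a world model
    or a glider counter. All interfaces here are hypothetical."""
    scores = []
    for env in environments:
        observation = env.reset()                    # hypothetical environment API
        for _ in range(steps):
            action = opaque_agent.act(observation)   # the agent is opaque to us
            observation = env.step(action)
        scores.append(count_gliders(env.live_cells()))  # counting happens outside the agent
    return sum(scores) / len(scores)
```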
Any utility function has to be calculated from sensory inputs and internal state—since that’s all the information any agent ever has. Any extrapolation of an external world is calculated in turn from sensory inputs and internal state. Either way, the domain of any utility function is ultimately sensory inputs and internal state. There’s not really a ‘problem’ with working from sensory inputs and internal state—that’s what all agents necessarily have to do.
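A small sketch of the composition being described (names are hypothetical and the update rule is a placeholder): a utility function “written over the world-model” is, once expanded, still a function whose ultimate inputs are the sense history and internal state, because the model itself is computed from nothing else.

```python
from typing import List

def update_model(sense_history: List[int], prior_model: dict) -> dict:
    """Toy inference: the model is some function of senses plus internal state."""
    model = dict(prior_model)
    model["estimated_glider_count"] = sum(sense_history) % 7   # placeholder update rule
    return model

def utility_over_model(model: dict) -> float:
    return float(model["estimated_glider_count"])

def utility_as_actually_computed(sense_history: List[int], prior_model: dict) -> float:
    # U(model(senses, state)): the domain is still senses + internal state,
    # even though the function is structured to factor through the model.
    return utility_over_model(update_model(sense_history, prior_model))
```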