Here’s an attack on section 4.1. Consider the possibility that “philosophical ability” (something like the ability to solve confusing problems that can’t be easily formalized) is needed to self-improve beyond some threshold of intelligence, and this same “philosophical ability” also reliably causes one to decide that some particular goal G is the right goal to have, and therefore beyond some threshold of intelligence all agents have goal G. To deny this possibility seems to require more meta-philosophical knowledge than we currently possess.
Yes, to deny it requires more meta-philosophical knowledge than we currently possess. But to affirm it as likely requires more meta-philosophical knowledge than we currently possess. My purpose is to show that it’s very unlikely, not that it’s impossible.
Do you feel I didn’t make that point? Should I have addressed “moral realism” explicitly? I didn’t want to put down the words, because it raises defensive hackles if I start criticising a position directly.
Perhaps I should have said “To conclude that this possibility is very unlikely” instead of “To deny this possibility”. My own intuition seems to assign a probability to it that is greater than “very unlikely”, and this was largely unchanged after reading your paper. For example, many of the items in the list in section 4.5 that would have to be true if orthogonality were false can be explained by my hypothesis, and the rest do not seem very unlikely to begin with.
My own intuition seems to assign a probability to it that is greater than “very unlikely”
Why? You’re making an extraordinary claim. Something—undefined—called philosophical ability is needed (for some reason) to self-improve and, for some extraordinary and unexplained reason, this ability causes an agent to have a goal G. Where goal G is similarly undefined.
Let me paraphrase: Consider the possibility that “mathematical ability” is needed to self-improve beyond some threshold of intelligence, and this same “mathematical ability” also reliably causes one to decide that some particular goal G is the right goal to have, and therefore beyond some threshold of intelligence all agents have goal G.
Why is this different? What in your intuition is doing the work of “philosophical ability” → same goals? If we call it something other than “philosophical ability”, would you have the same intuition? What raises the status of that implication to the level that it’s worthy of consideration?
I’m asking seriously—this is the bit in the argument I consistently fail to understand, the bit that never makes sense to me, but whose outline I can feel in most counterarguments.
It seems to me there are certain similarities and correlations between thinking about decision theory (which potentially makes one or an AI one builds more powerful) and thinking about axiology (what terminal goals one should have). They’re both “ought” questions, and if you consider the intelligences that we can see or clearly reason about (individual humans, animals, Bayesian EU maximizers, narrow AIs that exist today), there seems to be a clear correlation between “ability to improve decision theory via philosophical reasoning” (as opposed to CDT-AI changing into XDT and then being stuck with that) and “tendency to choose one’s goals via philosophical reasoning”.
One explanation for this correlation (and also the only explanation I can see at the moment, besides it being accidental) is that something we call “philosophical ability” is responsible for both. Assuming that’s the case, that still leaves the question of whether philosophical ability backed up with enough computing power eventually leads to goal convergence.
One major element of philosophical reasoning seems to be a distaste for and tendency to avoid arbitrariness. It doesn’t seem implausible that for example “the ultimate philosopher” would decide that every goal except pursuit of pleasure / avoidance of pain is arbitrary (and think that pleasure/pain is not arbitrary due to philosophy-of-mind considerations).
One major element of philosophical reasoning seems to be a distaste for and tendency to avoid arbitrariness.
If an agent has goal G1 and sufficient introspective access to know its own goal, how would avoiding arbitrariness in its goals help it achieve goal G1 better than keeping goal G1 as its goal?
I suspect we humans are driven to philosophize about what our goals ought to be by our lack of introspective access, and that searching for some universal goal, rather than what we ourselves want, is a failure mode of this philosophical inquiry.
I think we don’t just lack introspective access to our goals, but can’t be said to have goals at all (in the sense of preference ordering over some well defined ontology, attached to some decision theory that we’re actually running). For the kind of pseudo-goals we have (behavior tendencies and semantically unclear values expressed in natural language), they don’t seem to have the motivational strength to make us think “I should keep my goal G1 instead of avoiding arbitrariness”, nor is it clear what it would mean to “keep” such pseudo-goals as one self-improves.
What if it’s the case that evolution always or almost always produces agents like us, so the only way they can get real goals in the first place is via philosophy?
The primary point of my comment was to argue that an agent that has a goal in the strong sense would not abandon its goal as a result of philosophical consideration. Your response seems more directed at my afterthought about how our intuitions based on human experience would cause us to miss the primary point.
I think that we humans do have goals, despite not being able to consistently pursue them. I want myself and my fellow humans to continue our subjective experiences of life in enjoyable ways, without modifying what we enjoy. This includes connections to other people, novel experiences, high challenge, etc. There is, of course, much work to be done to complete this list and fully define all the high level concepts, but in the end I think there are real goals there, which I would like to be embodied in a powerful agent that actually runs a coherent decision theory. Philosophy probably has to play some role in clarifying our “pseudo-goals” as actual goals, but so does looking at our “pseudo-goals”, however arbitrary they may be.
The primary point of my comment was to argue that an agent that has a goal in the strong sense would not abandon its goal as a result of philosophical consideration.
Such an agent would also not change its decision theory as a result of philosophical consideration, which potentially limits its power.
Philosophy probably has to play some role in clarifying our “pseudo-goals” as actual goals, but so does looking at our “pseudo-goals”, however arbitrary they may be.
I wouldn’t argue against this as written, but Stuart was claiming that convergence is “very unlikely” which I think is too strong.
Such an agent would also not change its decision theory as a result of philosophical consideration, which potentially limits its power.
I don’t think that follows, or at least the agent could change its decision theory as a result of some consideration, which may or may not be “philosophical”. We already have the example that a CDT agent that learns in advance it will face Newcomb’s problem could predict it would do better if it switched to TDT.
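To make the arithmetic behind that prediction concrete, here is a minimal sketch. The payoffs are the usual illustrative Newcomb numbers and the predictor accuracy is made up; the function name is mine, and none of this comes from the thread itself.

```python
# Why a CDT agent, reasoning *before* the predictor examines it, prefers to
# self-modify into a one-boxer. Assumed setup: the opaque box holds $1,000,000
# iff the predictor expects one-boxing; the transparent box always holds $1,000.
def expected_payoff(one_boxer: bool, predictor_accuracy: float = 0.99) -> float:
    big, small = 1_000_000, 1_000
    if one_boxer:
        # Gets the big box only when correctly predicted to one-box.
        return predictor_accuracy * big
    # A two-boxer always gets the small box, plus the big box only if mispredicted.
    return small + (1 - predictor_accuracy) * big

print(expected_payoff(one_boxer=False))  # 11000.0  (remain a two-boxer)
print(expected_payoff(one_boxer=True))   # 990000.0 (self-modify in advance)
```

From the pre-modification vantage point, even purely causal expected values favour the modification, which is the sense in which the goal itself licenses a change of decision theory.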
“ability to improve decision theory via philosophical reasoning” (as opposed to CDT-AI changing into XDT and then being stuck with that)
XDT (or in Eliezer’s words, “crippled and inelegant form of TDT”) is closer to TDT but still worse. For example, XDT would fail to acausally control/trade with other agents living before the time of its self-modification, or in other possible worlds.
Ah, yes, I agree that CDT would modify to XDT rather than TDT, though the fact that it self modifies at all shows that goal driven agents can change decision theories because the new decision theory helps it achieve its goal. I do think that it’s important to consider how a particular decision theory can decide to self modify, and to design an agent with a decision theory that can self modify in good ways.
Not strictly. If a strongly goal’d agent determines that a different decision theory (or any change to itself) better maximizes its goal, it would adopt that new decision theory or change.
I agree that humans are not utility-maximizers or similar goal-oriented agents—not in the sense we can’t be modeled as such things, but in the sense that these models do not compress our preferences to any great degree, which happens to be because they are greatly at odds with our underlying mechanisms for determining preference and behavior.
Also, can we even get ‘real goals’ like this? We’re treading onto the land of potentially proposing something as silly as blue unicorns on the back side of the moon. We use goals to model other human intelligences; that is built into our language, that’s how we imagine other agents, that’s how you predict a wolf, a cat, another ape, etc. The goals are really easy within imagination (which is not reductionist and where the true paperclip count exists as a property of the ‘world’). Outside imagination, though...
If an agent has goal G1 and sufficient introspective access to know its own goal, how would avoiding arbitrariness in its goals help it achieve goal G1 better than keeping goal G1 as its goal?
Avoiding arbitrariness is useful to epistemic rationality and therefore to instrumental rationality. If an AI has rationality as a goal it will avoid arbitrariness, whether or not that assists with G1.
Avoiding arbitrariness is useful to epistemic rationality and therefore to instrumental rationality.
Avoiding giving credence to arbitrary beliefs is useful to epistemic rationality and therefore to instrumental rationality, and therefore to goal G1. Avoiding arbitrariness in goals still does not help with achieving G1 if G1 is considered arbitrary. Be careful not to conflate different types of arbitrariness.
If an AI has rationality as a goal
Rationality is not an end goal, it is that which you do in pursuit of a goal that is more important to you than being rational.
If an agent has goal G1 and sufficient introspective access to know its own goal, how would avoiding arbitrariness in its goals help it achieve goal G1 better than keeping goal G1 as its goal?
You are making the standard MIRI assumptions that goals are unupdatable, and not including rationality (non-arbitrariness, etc.) as a terminal value. (The latter is particularly odd, as Orthogonality implies it).
I suspect we humans are driven to philosophize about what our goals ought to be by our lack of introspective access, and that searching for some universal goal, rather than what we ourselves want, is a failure mode of this philosophical inquiry.
I suspect we want universal goals for the same reason we want universal laws.
You are making the standard MIRI assumptions that goals are unupdatable
No, I am arguing that agents with goals generally don’t want to update their goals. Neither I nor MIRI assume goals are unupdatable; in fact, a major component of MIRI’s research is how to make sure a self-improving AI has stable goals.
and don’t include rationality (non-arbitrariness, etc.) as a terminal value. (The latter is particularly odd, as Orthogonality implies it).
It is possible to have an agent that terminally values meta properties of its own goal system. Such agents, if they are capable of modifying their goal system, will likely self modify to some self-consistent “attractor” system. This does not mean that all agents will converge on a universal goal system. There are different ways that agents can value meta properties of their own goal system, so there are likely many attractors, and many possible agents don’t have such meta values and will not want to modify their goal systems.
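A toy illustration of the “many attractors” point; the one-parameter goal space and the update rules below are entirely made up and stand in for different meta-preferences an agent might have about its own goal system.

```python
# Toy model: a goal system is compressed to a single parameter g in [0, 1], and a
# "meta-preference" is an update rule the agent applies to its own goal on
# reflection. Different rules settle at different fixed points ("attractors"),
# and an agent with no meta-preference never moves at all.
def settle(update, g, steps=200):
    for _ in range(steps):
        g = update(g)
    return round(g, 3)

drawn_to_minimal   = lambda g: 0.5 * g            # attractor at g = 0
drawn_to_maximal   = lambda g: g + 0.5 * (1 - g)  # attractor at g = 1
no_meta_preference = lambda g: g                  # stays wherever it started

print(settle(drawn_to_minimal, 0.9))    # 0.0
print(settle(drawn_to_maximal, 0.1))    # 1.0
print(settle(no_meta_preference, 0.7))  # 0.7
```

Nothing here suggests the attractors coincide, which is the point: valuing meta-properties of one’s goals gets an agent to an equilibrium, not to a shared one.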
It is possible to have an agent that terminally values meta properties of its own goal system. Such agents, if they are capable of modifying their goal system, will likely self modify to some self-consistent “attractor” system. This does not mean that all agents will converge on a universal goal system.
Who asserted they would? Moral agents can have all sorts of goals. They just have to respect each other’s values. If Smith wants to be an athlete, and Robinson is a budding writer, that doesn’t mean one of them is immoral.
There are different ways that agents can value meta properties of their own goal system,
Ok. That would be a problem with your suggestion of valuing arbitrary meta properties of their goal system. Then let’s go back to my suggestion of valuing rationality.
so there are likely many attractors, and many possible agents don’t have such meta values and will not want to modify their goal systems.
Agents will do what they are built to do. If agents that don’t value rationality are dangerous, build ones that do.
MIRI: “We have determined that cars without brakes are dangerous. We have also determined that the best solution is to reduce the speed limit to 10 mph”
Everyone else: “We know cars without brakes are dangerous. That’s why we build them with brakes”.
Who asserted they would? Moral agents can have all sorts of goals. They just have to respect each other’s values. If Smith wants to be an athlete, and Robinson is a budding writer, that doesn’t mean one of them is immoral.
Have to, or else what? And how do we separate moral agents from agents that are not moral?
Ok. That would be a problem with your suggestion of valuing arbitrary meta properties of their goal system. Then let’s go back to my suggestion of valuing rationality.
Agents will do what they are built to do. If agents that don’t value rationality are dangerous, build ones that do.
MIRI: “We have determined that cars without brakes are dangerous. We have also determined that the best solution is to reduce the speed limit to 10 mph”
Everyone else: “We know cars without brakes are dangerous. That’s why we build them with brakes”.
If the solution is to build agents that “value rationality,” can you explain how to do that? If it’s something so simple as to be analogous to adding brakes to a car, as opposed to, say, programming the car to be able to drive itself (let alone something much more complicated,) then it shouldn’t be so difficult to describe how to do it.
Robin Hanson’s ‘far mode’ (his take on construal level theory) is a plausible match to this ‘something’. Hanson points out that far mode is about general categories and creative metaphors. This is a match to something from AGI research...categorization and analogical inference. This can be linked to Bayesian inference by considering analogical inference as a natural way of reasoning about ‘priors’.
...and, for some extraordinary and unexplained reason, this ability causes an agent to have a goal G.
A plausible explanation is that analogical inference is associated with sentience (subjective experience), as suggested by Douglas Hofstadter (who has stated he thinks ‘analogies’ are the core of conscious cognition). Since sentience is closely associated with moral reasoning, it’s at least plausible that this ability could indeed give rise to convergence on a particular G.
Where goal G is similarly undefined.
Here is a way G can be defined:
Analogical inference is concerned with Knowledge Representation (KR), so we could redefine ethics based on ‘representations of values’ (‘narratives’, which, as Daniel Dennett has pointed out, indeed seem to be closely linked to subjective experience) rather than external consequences. At this point we can bring in the ideas of Schmidhuber and recall a powerful point made by Hanson (see below).
For maximum efficiency, all AGIs with the aforementioned ‘philosophical ability’ (analogical inference and production of narratives) would try to minimize the complexity of the cognitive processes generating their internal narratives. This could place universal constraints on what these values are. For example, Schmidhuber pointed out that data compression could be used to get a precise definition of ‘beauty’.
Let’s now recall a powerful point Hanson made a while back on OB: the brain/mind can be totally defined in terms of a ‘signal processor’. Given this perspective, we could then view the correct G as the ‘signal’ and moral errors as ‘noise’. Algorithmic information theory could then be used to define a complexity metric that would precisely define this G.
Schmidhuber’s definition of beauty is wrong. He says, roughly, that you’re most pleased when after great effort you find a way to compress what was seemingly incompressible. If that were so, I could please you again and again by making up new AES keys with the first k bits random and the rest zero, and using them to generate and give you a few terabytes of random data. You’d have to brute force the key, at which point you’ll have compressed down from terabytes to kilobytes. What beauty! Let’s play the exact game again, with the exact same cipher but a different key, forever.
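A rough sketch of the arithmetic behind this objection. The “compression progress” scoring is a gloss on Schmidhuber’s compression-based formalisation, and every size below is invented purely for illustration.

```python
# Schmidhuber-style scoring (roughly): the interestingness/"beauty" of an episode
# tracks how much the observer's compressed description of its data shrinks.
def compression_progress(bits_before: float, bits_after: float) -> float:
    return bits_before - bits_after

# The adversarial game above: terabytes of cipher output look incompressible until
# the short key is brute-forced, after which they compress to roughly
# "cipher description + key + length".
ciphertext_bits = 2 * 8e12     # ~2 TB of apparently random data (made-up size)
cipher_plus_key_bits = 1e5     # generous stand-in for the compressed form
print(compression_progress(ciphertext_bits, cipher_plus_key_bits))  # ~1.6e13 bits
# ...and the game can be replayed forever with a fresh key each time.
```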
Right. That said, wireheading, aka the grounding problem, is a huge unsolved philosophical problem, so I’m not sure Schmidhuber is obligated to answer wireheading objections to his theory.
But the theory fails because this fits it but isn’t wireheading, right? It wouldn’t actually be pleasing to play that game.
I think you are right.
The two are errors that practically, with respect to hedonistic extremism, operate in opposing directions. They are similar in form in as much as they fit the abstract notion “undesirable outcomes due to lost purposes when choosing to optimize what turns out to be a poor metric for approximating actual preferences”.
Meh, yeah, maybe? Still seems like other, more substantive objections could be made.
Relatedly, I’m not entirely sure I buy Steve’s logic. PRNGs might not be nearly as interesting as short mathematical descriptions of complex things, like Chaitin’s omega. Arguably collecting as many bits of Chaitin’s omega as possible, or developing similar maths, would in fact be interesting in a human sense. But at that point our models really break down for many reasons, so meh whatever.
Right. That said, wireheading, aka the grounding problem, is a huge unsolved philosophical problem, so I’m not sure Schmidhuber is obligated to answer wireheading objections to his theory.
Unsolved philosophical problem? Huh? No additional philosophical breakthroughs are required for wireheading to not be a problem.
If I want (all things considered, etc) to wirehead, I’ll wirehead. If I don’t want to wirehead I will not wirehead. Wireheading introduces no special additional problems and is handled the same way all other preferences about future states of the universe can be handled.
(Note: It is likely that you have some more specific point regarding in what sense you consider wireheading ‘unsolved’. I welcome explanations or sources.)
Unsolved in the sense that we don’t know how to give computer intelligences intentional states in a way that everyone would be all like “wow that AI clearly has original intentionality and isn’t just coasting off of humans sitting at the end of the chain interpreting their otherwise entirely meaningless symbols”. Maybe this problem is just stupid and will solve itself but we don’t know that yet, hence e.g. Peter’s (unpublished?) paper on goal stability under ontological shifts. (ETA: I likely don’t understand how you’re thinking about the problem.)
Unsolved in the sense that we don’t know how to give computer intelligences intentional states in a way that everyone would be all like “wow that AI clearly has original intentionality and isn’t just coasting off of humans sitting at the end of the chain interpreting their otherwise entirely meaningless symbols”.
Being able to do this would also be a step towards the related goal of trying to give computer intelligences intelligence that we cannot construe as ‘intentionality’ in any morally salient sense, so as to satisfy any “house-elf-like” qualms that we may have.
e.g. Peter’s (unpublished?) paper on goal stability under ontological shifts.
Do philosophers have an incredibly strong ugh field around anything that can be deemed ‘implementation detail’? Clearly, ‘superintelligence’ the string of letters can have whatever ‘goals’ as strings of letters, no objection here. The superintelligence in the form of a distributed system with millisecond or worse lag between components, and nanosecond or better clock speed, on the other hand...
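Back-of-envelope for the mismatch being pointed at, using the commenter’s own rough figures (nanosecond-scale local steps, millisecond-scale lag between components); the variable names are mine.

```python
# With ~1 ns per local step and ~1 ms between nodes, a node executes on the order
# of a million local steps in the time a single message takes to reach another
# node, so the components must mostly act on stale information about one another.
local_step_seconds = 1e-9
internode_lag_seconds = 1e-3
print(internode_lag_seconds / local_step_seconds)  # 1e6 local steps per message
```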
Looking at your post at http://lesswrong.com/lw/2id/metaphilosophical_mysteries, I can see the sketch of an argument. It goes something like “we know that some decision theories/philosophical processes are ‘objectively’ inferior, hence some are objectively superior, hence (wave hands furiously) it is at least possible that some system is objectively best”.
I would counter:
1) The argument is very weak. We know some mathematical axiomatic systems are contradictory, hence inferior. It doesn’t follow from that that there is any “best” system of axioms.
2) A lot of philosophical progress is entirely akin to mathematical progress: showing the consequences of the axioms/assumptions. This is useful progress, but not really relevant to the argument.
3) All the philosophical progress seems to lie on the “how to make better decisions given a goal” side; none of it lies on the “how to have better goals” side. Even the expected utility maximisation result (sketched below this list) just says “if you are unable to predict effectively over the long term, then to achieve your current goals, it would be more efficient to replace these goals with others compatible with a utility function”.
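For reference, the formal backbone usually meant by “the expected utility maximisation result” is the von Neumann–Morgenstern representation theorem; the statement below is a standard paraphrase, not something argued in the thread.

```latex
% If a preference relation \succeq over lotteries L = (p_1, x_1; \dots; p_n, x_n)
% satisfies completeness, transitivity, continuity and independence, then there
% is a utility function u such that preference is expected-utility comparison:
\[
  L_1 \succeq L_2
  \quad\Longleftrightarrow\quad
  \sum_i p^{(1)}_i \, u(x_i) \;\ge\; \sum_i p^{(2)}_i \, u(x_i).
\]
```

The theorem constrains the form coherent preferences must take; it says nothing about which outcomes to prefer, which is exactly the division between the two “sides” described above.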
However, despite my objections, I have to note that the argument is at least an argument, and provides some small evidence in that direction. I’ll try and figure out whether it should be included in the paper.
Another possibility that is easy to see if you think more like an engineer and less like a philosopher:
The AI is to operate with light-speed delay, and has to be made of multiple nodes. It is entirely possible that some morality systems would not allow efficient solutions to this challenge (i.e. would break into some sort of war between modules, or otherwise fail to intellectually collaborate).
It is likely that there’s only a limited number of good solutions to P2P intelligence design, and the one that would be found would be substantially similar to our own solution to the fundamentally same problem, a solution which we call ‘morality’, complete with various non-utilitarian quirks.
edit: that is, our ‘morality’ is the set of rules for inter-node interaction in a society, and some such rules just don’t work. The orthogonality thesis, in any practical sense, is a conjunction of a potentially very huge number of propositions (which are assumed false without consideration, by omission); any consideration not yet considered can break the symmetry between different goals, and another such consideration is incredibly unlikely to add the symmetry back.
If an agent with goal G1 acquires sufficient “philosophical ability” that it concludes that goal G is the right goal to have, that means that it decided that the best way to achieve goal G1 is to pursue goal G. For that to happen, I find it unlikely that goal G is anything other than a clarification of goal G1 in light of some confusion revealed by the “philosophical ability”, and I find it extremely unlikely that there is some universal goal G that works for any goal G1.
Offbeat counter: You’re assuming that this ontology that privileges “goals” over e.g. morality is correct. What if it’s not? Are you extremely confident that you’ve carved up reality correctly? (Recall that EU maximizers haven’t been shown to lead to AGI, and that many philosophers who have thought deeply about the matter hold meta-ethical views opposed to your apparent meta-ethics.) I.e., what if your above analysis is not even wrong?
You’re assuming that this ontology that privileges “goals” over e.g. morality is correct.
I don’t believe that goals are ontologically fundamental. I am reasoning (at a high level of abstraction) about the behavior of a physical system designed to pursue a goal. If I understood what you mean by “morality”, I could reason about a physical system designed to use that and likely predict different behaviors than for the physical system designed to pursue a goal, but that doesn’t change my point about what happens with goals.
Recall that EU maximizers haven’t been shown to lead to AGI
I don’t expect EU maximizers to lead to AGI. I expect EU maximizing AGIs, whatever has led to them, to be effective EU maximizers.
Sorry, I meant “ontology” in the information science sense, not the metaphysics sense; I simply meant that you’re conceptually (not necessarily metaphysically) privileging goals. What if you’re wrong to do that? I suppose I’m suggesting that carving out “goals” might be smuggling in conclusions that make you think universal convergence is unlikely. If you conceptually privileged rational morality instead, as many meta-ethicists do, then your conclusions might change, in which case it seems you’d have to be unjustifiably confident in your “goal”-centric conceptualization.
I think I am only “privileging” goals in a weak sense, since by talking about a goal driven agent, I do not deny the possibility of an agent built on anything else, including your “rational morality”, though I don’t know what that is.
Are you arguing that a goal driven agent is impossible? (Note that this is a stronger claim than it being wiser to build some other sort of agent, which would not contradict my reasoning about what a goal driven agent would do.)
(Yeah, the argument would have been something like, given a sufficiently rich and explanatory concept of “agent”, goal-driven agents might not be possible—or more precisely, they aren’t agents insofar as they’re making tradeoffs in favor of local homeostatic-like improvements as opposed to traditionally-rational, complex, normatively loaded decision policies. Or something like that.)
Let me try to strengthen your point. If an agent with goal G1 acquires sufficient “philosophical ability” that it concludes that goal G is the right goal to have, that means that it decided that the best way to achieve goal G1 is to pursue what it thinks is the “right goal to have”. This would require it to take a kind of normative stance on goal fulfillment, which would require it to have normative machinery, which would need to be implemented in the agent’s mind. Is it impossible to create an agent without normative machinery of this kind? Does philosophical ability depend directly on normative machinery?
Have to, or else what? And how do we separate moral agents from agents that are not moral?
Valuing rationality for what? What would an agent which “values rationality” do?
If the solution is to build agents that “value rationality,” can you explain how to do that? If it’s something so simple as to be analogous to adding brakes to a car, as opposed to, say, programming the car to be able to drive itself (let alone something much more complicated,) then it shouldn’t be so difficult to describe how to do it.
Have to, logically. Like even numbers have to be divisible by two.
How do we recognise anything? They have behaviour and characteristics which match the definition.
For itself. I do not accept that rationality can only be instrumental, a means to an end.
The kind of thing EY, CFAR, and other promoters of rationality urge people to do.
In the same kind of very broad terms that MIRI can explain how to build Artificial Obsessive Compulsives.
The analogy was not about simplicity. Illustrative analogies are always simpler than what they are illustrating: that is where their usefulness lies.
e.g. Peter’s (unpublished?) paper on goal stability under ontological shifts.
I assume you mean Ontological Crises in Artificial Agents’ Value Systems? I just finished republishing that one. Originally published form. New SingInst style form. A good read.
Engineering ability suffices:
http://lesswrong.com/lw/cej/general_purpose_intelligence_arguing_the/6lst
I find it extremely unlikely that there is some universal goal G that works for any goal G1.
Let G1 = “Figure out the right goal to have”.