No, only your particular imaginary AGI has that property—which is one of several reasons why nobody is actually going to build an AGI like that.

I wish.

Is anyone even trying? I know Eliezer wants to build an AGI with the property in question, but I don’t think even he is trying to actually build one, is he?
I don’t expect AGIs to be built soon (ten years is a lower bound, and even that seems very, very unlikely), but then again, what’s your point? Eventually they are bound to appear, and I’m discussing specifically those systems, not Microsoft Word.
Your “only your particular imaginary AGI has that property” implies that you call “AGI” systems that are not autonomous, which looks like an obvious misapplication of the term. Human-level AGIs trivially allow the construction of autonomous systems in my sense, by creating multiple copies, even if such AGIs individually (like humans) don’t qualify.
First, let’s dispose of the abuse of the word autonomous, as that English word doesn’t correspond to the property you are describing. If the property in question existed in real life (which it doesn’t), the closest English description would be something like deranged monomaniacal sociopath.
That having been said, given an advanced AGI, it would be possible to reprogram it to be a deranged monomaniacal sociopath. It wouldn’t be trivial, and nobody would have any rational motive for doing it, but it would be possible. What of it? That tells us nothing whatsoever about the best way to go about building an AGI.
“First, let’s dispose of the abuse of the word autonomous, as that English word doesn’t correspond to the property you are describing. If the property in question existed in real life (which it doesn’t), the closest English description would be something like deranged monomaniacal sociopath.”
Since I use the term as applying to groups of humans, you should debate this point of disagreement before going further. You obviously read something into it that I didn’t intend.
Certainly. The property in question is that of being obsessed with a single goal to the point of not only committing absolutely all resources to the pursuit of same, but being willing to commit any crime whatsoever in the process, and being absolutely unwilling to consider modifying the goal in any way under any circumstances. No group of humans (or any other kind of entity) has ever had this property.
This looks like a disagreement about whether specific humans have a precise preference (an ordering over all possible states of the world, etc.) that they are unwilling to modify in any way (though probably unable to keep it from changing). Should we shift the focus of the argument to that point? (I thought the notion of preference would be easier to consider for non-anthropomorphic autonomous agents, but apparently not, in this case.)
I think that’s a good idea—it’s easier to argue about the properties of entities that actually exist.
It seems very clear to me that no human has such a precise preference. We violate the principles of decision theory, such as transitivity, all the time (which admittedly can in some cases be considered irrational). We adhere to ethical constraints (which cannot be considered irrational). And we often change our preferences in response to experience, rational argument, and social pressure, and we even go out of our way to seek out the kinds of experiences, arguments, and social interactions that are likely to bring about such change.
Yes, we are not reflectively consistent (change our preference), but is it good? Yes, we make decisions inconsistently, but is it good? The notion of preference, as I use it, refers to such judgments, and any improvement in the situation is described by it as preferable. Preference is not about wants or likes, even less about actual actions, since even a superintelligence won’t be able to take only the most preferable actions.
I’m not sure if I have your concept of preference right.
Could a theist human with a fixed preference do the following: change their mind about the existence of souls and sign up for cryonics? If they can’t, then that is one situation where having a fixed preference is not good.
I’m not sure you can have a fixed preference if you don’t have a fixed ontology, and not having a fixed ontology has been a good thing, at least in terms of humanity’s ability to control the world.
“Could a theist human with a fixed preference do the following: change their mind about the existence of souls and sign up for cryonics? If they can’t, then that is one situation where having a fixed preference is not good.”
Being at the top of the meta hierarchy, preference is not obviously related to likes, wants, or beliefs. It is what you want on reflection, given infinite computational power, etc., but not at all necessarily what you currently believe you want. (Compare the semantics of a computer program, which is probably uncomputable, with what you can conclude from its source code in finite time.)
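(To make that parenthetical concrete, here is a minimal sketch; the function name and the use of Goldbach’s conjecture are illustrative choices, not anything from the discussion. The program’s semantics, that is, what it eventually returns, if anything, is perfectly well defined, yet it is not something anyone currently knows how to conclude from the source code in finite time.)

```python
def first_goldbach_counterexample():
    """Search for the first even number > 2 that is not a sum of two primes.

    The return value (or non-termination) of this function is fully determined
    by its source code, but inspecting the source for any finite amount of time
    does not tell us what that value is; knowing it would settle Goldbach's
    conjecture.
    """
    def is_prime(k):
        if k < 2:
            return False
        d = 2
        while d * d <= k:
            if k % d == 0:
                return False
            d += 1
        return True

    n = 4
    while True:
        if not any(is_prime(p) and is_prime(n - p) for p in range(2, n - 1)):
            return n  # a counterexample to Goldbach's conjecture, if one exists
        n += 2
```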
“I’m not sure you can have a fixed preference if you don’t have a fixed ontology, and not having a fixed ontology has been a good thing, at least in terms of humanity’s ability to control the world.”
This is called the ontology problem in FAI, and I believe I have a satisfactory solution to it for the purposes of FAI (roughly, two agents have the same preference if they agree on what should be done/thought in each epistemic state; here, no reference to the real world is made; for FAI, we only need to duplicate human preference in FAI, not understand it), which I’m currently describing on my blog.
I’ve read some of your blog. I find it hard to pin down and understand something that is not obviously related to what is going on around us.
“This is called the ontology problem in FAI, and I believe I have a satisfactory solution to it for the purposes of FAI (roughly, two agents have the same preference if they agree on what should be done/thought in each epistemic state; here, no reference to the real world is made; for FAI, we only need to duplicate human preference in FAI, not understand it)”
Hmm, interesting. Do you have a way of separating the epistemic state from the other state of a self-modifying intelligence? Would knowledge about what my goals are come under epistemic state?
“I find it hard to pin down and understand something that is not obviously related to what is going on around us.”
Me too, but it seems that what we really want, and would like an external agent to implement without consulting us further, really is a structure with these confusing properties.
“Would knowledge about what my goals are come under epistemic state?”
Yes, everything you are (as a mind) is epistemic state. A rigid boundary around the mind is necessary to fight the ontology problem, even where people obviously externalize some of their computation, and depend on irrelevant low-level events that affect computation within the brain. (A brain won’t work in this context, though an emulated space ship, as in this metaphor, is fine, in which case the preference of the ship is about what should be done on the ship, given each state of the ship.)
“(roughly, two agents have the same preference if they agree on what should be done/thought in each epistemic state;”

“Yes, everything you are (as a mind) is epistemic state. A rigid boundary around the mind is necessary to fight the ontology problem,”
Now I am really confused. If an agent has the same epistemic state as me, that is, it is everything that I am (as a mind), then surely it will have the same preference (assuming determinism)!?
Or are you talking about something like the following:

A, B, and C are agents.

forall C. action/thought(A, C) = action/thought(B, C) → same_preference(A, B)

where action/thought is a function that takes two agents and returns the actions and thoughts that the first agent thinks the second should have. Just as two humans will somewhat agree on what a dog should do, depending on what the dog knows?
“Now I am really confused. If an agent has the same epistemic state as me, that is, it is everything that I am (as a mind), then surely it will have the same preference (assuming determinism)!?”
Yes, your exact copy has the same preference as you; why?
“forall C. action/thought(A, C) = action/thought(B, C) → same_preference(A, B)”
More like action/thought(A, A) = action/thought(B, B) → same_preference(A, B). I don’t understand why you gave that particular formulation, so I’m not sure if my reply is helpful. The ontologically boxed agents only have preference about their own thoughts/actions; there is no real world or other agents for them, though inside their mind they may have all kinds of concepts that they can consider (for example, agent A can have a concept of agent B, as an ontologically boxed agent).
“(roughly, two agents have the same preference if they agree on what should be done/thought in each epistemic state; here”
So let’s say there are me and a paperclipper: do we share the same preference? If I were, as a mind, everything the paperclipper is, I would want to make paperclips, right? And similarly, the paperclipper, if it were given my epistemic state, would want to do what I do.

So I don’t see how, under this definition, any two agents could fail to share the same preference.
Yes, technically stating this needs work, but the idea should be clear: you and a paperclipper disagree on what should be done by the paperclipper in a given paperclipper’s state.
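(A minimal sketch of the comparison being discussed, with made-up names; the types, the judgment functions, and the finite list of states are illustrative assumptions, and quantifying over all epistemic states, as the definition actually requires, is exactly the part that cannot be done by enumeration in general.)

```python
from typing import Callable, Iterable

EpistemicState = str  # stand-in for a full description of a mind's state
Judgment = Callable[[EpistemicState], str]  # what should be done/thought in that state


def same_preference(should_a: Judgment, should_b: Judgment,
                    states: Iterable[EpistemicState]) -> bool:
    """Two agents share a preference if, for every epistemic state considered,
    they agree on what should be done/thought in it. Here we can only check a
    supplied sample of states, not the full quantification."""
    return all(should_a(s) == should_b(s) for s in states)


# Example: a human and a paperclipper disagree about what should be done
# in a paperclipper's state, so they do not share a preference.
human_judgment = lambda s: "help people" if "paperclipper" in s else "carry on"
clipper_judgment = lambda s: "make paperclips"
print(same_preference(human_judgment, clipper_judgment,
                      ["paperclipper with idle factory", "human reading a book"]))
# -> False
```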
That was what I was getting at with my A B C example.
A = you
B = paperclipper
C = different paperclipper states
However, I am not sure that this solves the ontology problem, as you will have people with bad/simple ontologies judging what people with complex/accurate ontologies should do.

Or is this another stage where we need to grant infinite resources? Would that solve the problem?
“A = you, B = paperclipper, C = different paperclipper states”
I see. Yes, that should work as an informal explanation.
“However, I am not sure that this solves the ontology problem, as you will have people with bad/simple ontologies judging what people with complex/accurate ontologies should do.”
There is no difference in ontology between different programs, so I’m not sure what you are referring to. They are all “boxed” inside their own computations, and they only work with their own computations, though this activity can be interpreted as thinking about the external world. I expect the judging of similarity of preference to be some kind of generally uncomputable condition, such as asking whether two given programs (not the agent programs, but some constructions of them) have the same outputs, which should be possible to verify theoretically in special cases; for example, you know that two copies of the same program have the same outputs.
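(A toy illustration of that last point, with invented names; deciding whether two arbitrary programs have the same outputs is undecidable in general, by Rice’s theorem, but the special case mentioned, two copies of the same program, is trivially verifiable.)

```python
import inspect
from typing import Callable, Optional


def provably_same_outputs(prog_a: Callable, prog_b: Callable) -> Optional[bool]:
    """Try to verify that two programs produce the same outputs on every input.

    No general procedure for this exists (equality of program behaviour is a
    non-trivial semantic property, hence undecidable), so we only recognize an
    easy special case: literally identical source code behaves identically.
    Returns True in that case and None ("don't know") otherwise.
    """
    if inspect.getsource(prog_a) == inspect.getsource(prog_b):
        return True
    return None  # undecidable in general; a proof would be needed case by case
```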
Okay, so now we seem to agree that humans and groups thereof do not have the property to which you refer. That’s progress.
As to whether the property in question is good: on the one hand, you seem to be saying it is; on the other hand, you have agreed that if you could build an AGI that way (which you can’t), you would end up with something that would try to murder you and recycle you as paperclips because you made a one-line mistake in writing its utility function. I see a contradiction here. Do you see a contradiction here?
“Okay, so now we seem to agree that humans and groups thereof do not have the property to which you refer. That’s progress.”
Since I didn’t indicate changing my mind, that’s an unlikely conclusion. And it’s wrong. What did you interpret as implying that (so that I can give a correct interpretation)?
“Yes, we are not reflectively consistent (change our preference), but is it good? Yes, we make decisions inconsistently, but is it good?”

Not all autonomous agents are reflectively consistent. The autonomous agents that are not reflectively consistent want to become such (or to construct a singleton with their preference that is reflectively consistent). Preference is associated even with agents that are not autonomous (e.g. mice).

This is discussed in the post Friendly AI: a vector for human preference:
Intelligent agents have two thresholds of ability that are important in the long run: autonomy and reflective consistency. Autonomy is the point where an intelligent agent has a prospect of open-ended development, with a chance to significantly influence the whole world (by building/becoming a reflectively consistent agent). Humanity is autonomous in this sense, as probably are small groups of smart humans if given a much longer lifespan (although cultish attractors may stall progress indefinitely). Reflective consistency is the ability to preserve one’s preference, bringing the specific preference into the future without creating different-preference free-running agents. The principal defects of merely autonomous agents are uncontrollable preference drift and the inability to effectively prevent reflectively consistent agents of different preference from taking over the future; only when reflective consistency is achieved does the drift stop and the preference-extinction risk get partially alleviated.

As with advanced AI, so with humanity: there is danger in a lack of reflective consistency. An autonomous agent, while not as dangerous as a reflectively consistent agent (though possibly still lethal), is a reflectively consistent agent with alien preference waiting to happen. Most autonomous agents would seek to construct a reflectively consistent agent with the same preference, their own kind of FAI. A given autonomous agent can (1) drift from its original preference before becoming reflectively consistent, so that the end result is different; (2) construct another different-preference autonomous non-reflective agent, which could eventually lead to a different-preference reflective agent; (3) fail at the construction of its FAI, creating a de novo reflectively consistent agent of wrong preference; or, if all goes well, (4) succeed at building/becoming a reflectively consistent agent of the same preference. Humanity faces these risks, and any non-reflective autonomous AI that we may develop in the future would add to them, even if this non-reflective AI shares our preference exactly at the time of construction. A proper Friendly AI has to be reflectively consistent from the start.
Disproof by counterexample: I don’t want to become reflectively consistent in the sense you’re using the phrase.
Edit in response to your edit: the terms autonomous and reflectively consistent are used in the passage you quote to mean different things than you have been using them to mean.
But what do you want? Whatever you want, it is implicitly a consistent statement about all time, so the most general way of granting your wish consists in establishing a reflectively consistent singleton that implements this statement throughout the future.
For example, I would prefer that people not die, but if some people choose to die, I would not forcibly prevent them, nor would I license any other entity to initiate the use of force for that purpose, so no, I would not wish for a genie that always prevents people from dying no matter what.
“I would not wish for a genie that always prevents people from dying no matter what.”
What about genies that prevent people from dying conditionally on something, as opposed to always? It’s an artificial limitation you’ve imposed; the FAI can compute its ifs.
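(A trivial sketch of what “computing its ifs” could mean here; the function and its inputs are invented for illustration: the intervention is conditional on the person’s own choice rather than unconditional.)

```python
def genie_policy(person_chose_to_die: bool, death_imminent: bool) -> str:
    """Conditional prevention: intervene only when death is imminent and not chosen.

    This is the kind of "if" a sufficiently capable agent could compute,
    instead of preventing death unconditionally, no matter what.
    """
    if death_imminent and not person_chose_to_die:
        return "prevent the death"
    return "do not interfere"
```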
Like other people, I care not only about the outcome, but also that it was not reached by unethical means; and I am prepared to accept that I don’t have a unique ranking of all outcomes, that I may be mistaken in some of my preferences, and that I should be more tentative in areas where I am more likely to be mistaken.
Could we aim, ultimately, to build an AGI with such properties? Yes indeed, and if we ever set out to build a self-willed AGI, that is how we should do it—precisely because it would have properties very different from those of the monomaniac utilitarian AGI postulated in most of what’s been written about friendly AI so far.
“Yes indeed, and if we ever set out to build a self-willed AGI, that is how we should do it—precisely because it would have properties very different from those of the monomaniac utilitarian AGI postulated in most of what’s been written about friendly AI so far.”
Please pin it down: what are you talking about on both counts (“how we should do it” and “the monomaniac utilitarian AGI”), and where do you place your interpretation of my concept of preference?
I can have a go at that, but a comment box in a thread buried multiple hidden layers down is a pretty cramped place to do it. Figure it’s appropriate for a top-level post? Or we could take it to one of the AGI mailing lists.
I meant to ask for a short indication of what you meant; a long description would be a mistake, since you’ll misinterpret a lot of what I meant, given how few of the assumed ideas you agree with or understand in the way they are intended.
The signal-to-humbug ratio on AGI mailing lists is too low.
Well, I had been attempting to give short indications of what I meant already, but I’ll try. Basically, a pure utilitarian (if you could build such an entity of high intelligence, which you can’t) would be a monomaniac, willing to commit any crime in the service of its utility function. That means a ridiculous amount of weight goes onto writing the perfect utility function (which is impossible), and then, in an attempt to get around that, you end up with lunacy like CEV (which is, very fortunately, impossible), and the whole thing goes off the rails. What I’m proposing is that if anything like a self-willed AGI is ever built, it will have to be done in stages, with what it does co-developed with how it does it, which means that by the time it is trusted with the capability to do something in the external world, it will already have all sorts of built-in constraints on what it does and how it does it, constraints that will necessarily have been developed along with, and be an integral part of, the system. That’s the only way it can work (unless we stick to purely smart tool AI, which is also an option), and it means we don’t have to take an exponentially unlikely gamble on writing the perfect utility function.
Citations needed.

Well, I feel unable to effectively communicate with you on this topic (the fact that I persisted for so long is due to an unusual mood, and isn’t typical—I’ve been answering all comments directed to me for the last day). Good luck; maybe you’ll see the light one day.