> I think we’ve hit on a serious misunderstanding here. Clippy is relatively easy to make; you or I could probably come up with reasonable specifications for what qualifies as a paperclip, and it wouldn’t be too hard to program maximization of paperclips as an AI’s goal.
A completely unintelligent clip-making machine isn’t difficult to make. Or threatening. Clippy is supposed to be threatening due to its superintelligence. (You also need to solve goal stability.)
> Again, you’re trying to wrap “humanlike plus epistemically and instrumentally rational” into “rational,”
I did not write the quoted phrase, and it is not accurate.
> but by bundling in humanlike morality,
I never said anything of the kind. I think it may be possible for a sufficiently rational agent to deduce morality, but that is in no way equivalent to hardwiring it into the agent, or into the definition of rational!
> As an aside, if you value rationality as a goal in itself, would you want to be highly epistemically and instrumentally rational, but held at the mercy of a nigh-omnipotent tormentor who ensures that you fail at every task you set yourself to, are held in disdain by all your peers, and are only able to live at a subsistence level?
It’s simple logic that valuing rationality as a goal doesn’t mean valuing only rationality.
> Most of the extent to which people ordinarily treat rationality as a goal is instrumental, and the motivations of beings who felt otherwise would probably seem rather absurd to us.
We laugh at the talking-snakes crowd and X-factor watchers, they laugh at the nerds and geeks. So it goes.
How, and why would it care?

> A number of schemes have been proposed in the literature.
That doesn’t answer my question. Please describe at least one which you think would be likely to work, and why you think it would work.
> You can’t guess? Rationality-as-a-goal.
You’ve been consistently treating rationality-as-a-goal as a black box which solves all these problems, but you haven’t given any indication of how it can be programmed into an AI in such a way that makes it a simpler alternative to solving the Friendliness problem, and indeed when your descriptions seem to entail solving it.
ETA: When I asked you for examples of entities which have rationality as a goal, you gave examples which, by your admission, have other goals which are at the very least additional to rationality. So suppose that we program an intelligent agent which has only rationality as a goal. What does it do?
> That doesn’t answer my question. Please describe at least one which you think would be likely to work, and why you think it would work.
I don’t have to, since the default likelihood of ethical objectivism isn’t zero.
> You’ve been consistently treating rationality-as-a-goal as a black box which solves all these problems, but you haven’t given any indication of how it can be programmed into an AI in such a way that makes it a simpler alternative to solving the Friendliness problem, and indeed when your descriptions seem to entail solving it.
There are lots of ways of being biased, but few ways of being unbiased. Rationality, as described by EY, is lack of bias; Friendliness, as described by EY, is a complex and arbitrary set of biases.
What you’re effectively saying here is “I don’t have to offer any argument that I’m right, because it’s not impossible that I’m wrong.”
> There are lots of ways of being biased, but few ways of being unbiased. Rationality, as described by EY, is lack of bias; Friendliness, as described by EY, is a complex and arbitrary set of biases.
Friendliness is a complex and arbitrary set of biases in the sense that human morality is a complex and arbitrary set of biases.
> Okay, but I’m prepared to assert that it’s infinitesimally low,
It would have been helpful to argue rather than assert.
> What you’re effectively saying here is “I don’t have to offer any argument that I’m right, because it’s not impossible that I’m wrong.”
ETA: I am not arguing that MR is true. I am arguing that it has a certain probability, which subtracts from the overall probability of the MIRI problem/solution, and that MIRI needs to consider it more thoroughly.
> and also that the Orthogonality Thesis applies even in the event that our universe has technically objective morality.
The OT is trivially false under some interpretations, and trivially true under others. I didn’t say it was entirely false, and in fact I have appealed to it. The problem is that the versions that are true are not useful as a stage in the overall MIRI argument. Lack of relevance, in short.
> Friendliness is a complex and arbitrary set of biases in the sense that human morality is a complex and arbitrary set of biases.
> It would have been helpful to argue rather than assert.
I’m prepared to do so, but I’d be rather more amenable to doing so if you would also argue rather than simply asserting your position.
> The OT is trivially false under some interpretations, and trivially true under others. I didn’t say it was entirely false, and in fact I have appealed to it. The problem is that the versions that are true are not useful as a stage in the overall MIRI argument. Lack of relevance, in short.
Can you explain how the Orthogonality Thesis is not true in a relevant way with respect to the friendliness of AI?
> I dare say EY would assert that. I wouldn’t.
In which case it should follow that Friendliness is easy, since Friendliness essentially boils down to determining and following what humans think of as “morality.”
If you’re hanging your trust on the objectivity of humanlike morality and its innate relevance to every goal-pursuing optimization force though, you’re placing your trust in something we have virtually no evidence to support the truth of. We may have intuitions to that effect, but there are also understandable reasons for us to hold such intuitions in the absence of their truth, and we have no evidence aside from those intuitions.
> I’m prepared to do so, but I’d be rather more amenable to doing so if you would also argue rather than simply asserting your position.
I am not saying anything extraordinary. MR is not absurd, is taken seriously by professional philosophers, etc.
> Can you explain how the Orthogonality Thesis is not true in a relevant way with respect to the friendliness of AI?
It doesn’t exclude, or even render strongly unlikely, the possibility that the AI could figure out morality.
The mere presence of Clippies as theoretical possibilities in mindspace doesn’t imply anything about their probability. The OT mindspace needs to be weighted according to the practical aims, limitations, etc. of real-world research.
> In which case it should follow that Friendliness is easy, since Friendliness essentially boils down to determining and following what humans think of as “morality.”
Yes: based on my proposal it is no harder than rationality, since it follows from it. But I was explicitly discussing EY’s judgements.
> If you’re hanging your trust on the objectivity of humanlike morality
I never said that. I don’t think morality is necessarily human-oriented, and I don’t think an AI needs to have an intrinsically human morality to behave morally towards us, for the same reason that one can behave politely in a foreign country, or behave ethically towards non-human animals.
> its innate relevance to every goal-pursuing optimization force
Never said anything of the kind. It doesn’t exclude, or even render strongly unlikely, the possibility that the AI could figure out morality.
This is more or less exactly what the Orthogonality Thesis argues against. That is, even if we suppose that an objective morality exists (something that, unless we have hard evidence for it, we should assume is not the case), an AI would not care about it by default.
How would you program an AI to determine objective morality and follow that?
> The mere presence of Clippies as theoretical possibilities in mindspace doesn’t imply anything about their probability. The OT mindspace needs to be weighted according to the practical aims, limitations, etc. of real-world research.
Yes, but the presence of humanlike intellects in mindspace doesn’t tell us that they’re an easy target to hit in mindspace by aiming for it either.
If you cannot design a humanlike intellect, or point to any specific model by which one could do so, then you’re not in much of a position to assert that it should be an easy task.
> I never said that. I don’t think morality is necessarily human-oriented, and I don’t think an AI needs to have an intrinsically human morality to behave morally towards us, for the same reason that one can behave politely in a foreign country, or behave ethically towards non-human animals.
One can behave “politely,” by human standards, towards foreign countries, or “ethically,” by human standards, towards non-human animals. Humans have both evolved drives and game-theoretic concerns which motivate these sorts of behaviors. “For the same reasons” does not seem to apply at all here, because
a) A sufficiently powerful AI does not need to cooperate within a greater community of humans; it could easily crush us all. One of the most reproductively successful humans in history was a conqueror who founded an empire which in three generations expanded to include more than a quarter of the total world population at the time. The motivation to gain resources by competition is a drive which exists in opposition to the motivation to minimize risk by cooperation and conflict avoidance. If human intelligence had developed in the absence of the former drive, then we would all be reflexive communists. An AI, on the other hand, is developed in the absence of either drive. To the extent that we want it to behave as if it were an intelligence which had developed in the context of needing to cooperate with others, we’d have to program that in.
b) Our drives to care about other thinking beings are also evolved traits. A machine intelligence does not by default value human beings more than sponges or rocks.
One might program such drives into an AI, but again, this is really complicated to do, and an AI will not simply pull them out of nowhere.
> That is, even if we suppose that an objective morality exists (something that, unless we have hard evidence for it, we should assume is not the case), an AI would not care about it by default.
The OT mindspace may consist of 99% of AIs that don’t care. That is completely irrelevant, because it doesn’t translate into a 99% likelihood of accidentally building a Clippy.
> How would you program an AI to determine objective morality and follow that?
Rationality-as-a-goal.
> Yes, but the presence of humanlike intellects in mindspace doesn’t tell us that they’re an easy target to hit in mindspace by aiming for it either.
None of this is easy.
> If you cannot design a humanlike intellect, or point to any specific model by which one could do so, then you’re not in much of a position to assert that it should be an easy task.
I can’t practically design my AI, and you can’t yours. I can theoretically specify my AI, and you can yours.
> a) A sufficiently powerful AI does not need to cooperate within a greater community of humans; it could easily crush us all.
I am not talking about any given AI.
> b) Our drives to care about other thinking beings are also evolved traits. A machine intelligence does not by default value human beings more than sponges or rocks.
I am not talking about “default”.
> One might program such drives into an AI, but again, this is really complicated to do, and an AI will not simply pull them out of nowhere.
Almost everything in this field is really difficult. And one doesn’t have to programme them: if sociability is needed to live in societies, then pluck AIs from successful societies.
> The OT mindspace may consist of 99% of AIs that don’t care. That is completely irrelevant, because it doesn’t translate into a 99% likelihood of accidentally building a Clippy.
The problem is that the space of minds which are human-friendly is so small that it’s extremely difficult to hit even when we’re trying to hit it.
The broad side of a barn may compose one percent of all possible target space at a hundred paces, while still being easy to hit. A dime on the side of the barn will be much, much harder. Obviously your chances of hitting the dime will be much higher than if you were firing randomly through possible target space, but if you fire at it, you will still probably miss.
> Rationality-as-a-goal.
Taboo rationality-as-a-goal; it’s obviously an impediment to this discussion.
> The problem is that the space of minds which are human-friendly is so small that it’s extremely difficult to hit even when we’re trying to hit it.
If by “human-friendly” minds you mean a mind that is wired up to be human-friendly, and only human-friendly (as in EY’s architecture), and if you assume that human friendliness is a rag-bag of ad hoc behaviours with no hope of rational deducibility (as EY also assumes), that would be true.
That may be difficult to hit, but it is not what I am aiming at.
What I am talking about is a mind that has a general-purpose rationality (which can be applied to specific problems, like all rationality), and a general-purpose morality (likewise applicable to specific problems). It will not be intrinsically, compulsively and inflexibly human-friendly, like EY’s architecture. If it finds itself among humans it will be human-friendly because it can (it’s rational) and because it wants to (it’s moral). OTOH, if it finds itself amongst Tralfamadorians, it will be Tralfamadorian-friendly.
> Taboo rationality-as-a-goal; it’s obviously an impediment to this discussion.
My using words that mean what I say to say what I mean is not the problem. The problem is that you keep inaccurately paraphrasing what I say, and then attacking the paraphrase.
> My using words that mean what I say to say what I mean is not the problem. The problem is that you keep inaccurately paraphrasing what I say, and then attacking the paraphrase.
The words do not convey what you mean. If my interpretation of what you mean is inaccurate, then that’s a sign that you need to make your position clearer.