This is true, but then, neither is AI design a process similar to that by which our own minds were created. Where our own morality is not a natural attractor, it is likely to be a very hard target to hit, particularly when we can’t rigorously describe it ourselves.
You seem to be thinking of Big Design Up Front. There is already an ecosystem of devices which are being selected for friendliness, because unfriendly gadgets don’t sell.
Can you explain how existing devices are either Friendly or Unfriendly in a sense relevant to that claim? Existing AIs are not intelligences shaped by interaction with other machines, and no existing machines that I’m aware of represent even attempts to be Friendly in the sense that Eliezer uses, where they actually attempt to model our desires.
As-is, human designers attempt to model the desires of the humans who make up the marketplace (or at least, the drives that motivate their buying habits, which are not necessarily the same thing), but as I already noted, humans aren’t able to rigorously define our own desires, and a good portion of the Sequences goes into explaining how a non-rigorous formulation of our desires, handed down to a powerful AI, could have extremely negative consequences.
Existing gadgets aren’t friendly in the full FAI sense, but the ecosystem is a basis for incremental development... one that sidesteps the issue of solving friendliness by Big Design Up Front.
Can you explain how it sidesteps the issue? That is, how it results in the development of AIs that implement our own values more precisely than we have thus far been able to define them ourselves?
As an aside, I really do not buy that the body of existing machines and the developers working on them form something that is meaningfully analogous to an “ecosystem” for the development of AI.
By variation and selection, as I said.
That doesn’t actually answer the question at all.
This is one of the key ways in which our development of technology differs from an ecosystem. In an ecosystem, mutations are random and are selected entirely on how effectively they propagate themselves in the gene pool. In the development of technology, we do not have random mutations; we have human beings deciding what does or does not seem like a good idea to implement in technology, and then using market forces as feedback. This fails to get us around a) the difficulty of humans actually figuring out strict formalizations of our desires sufficient to make a really powerful AI safe, and b) failure scenarios resulting in “oops, that killed everyone.”
The selection process we actually have does not offer us a single do-over in the event of catastrophic failure, nor does it rigorously select for outputs that, given sufficient power, will not fail catastrophically.
There is no problem of strict formulation, because that is not what I am aiming at, it’s your assumption.
I am aware that the variation isn’t random. I don’t think that is significant.
I don’t think sudden catastrophic failure is likely in incremental/evolutionary progress.
I don’t think mathematical “proof” is going to be as reliable as you think, given the complexity.
One of the key disanalogies between your “ecosystem” formulation and human development of technology is that natural selection isn’t an actor subject to feedback within the system.
If an organism develops a mutation which is sufficiently unfavorable to the Blind Idiot God, the worst case scenario is that it’s stillborn, or under exceptional circumstances, triggers an evolution to extinction. There is no possible failure state where an organism develops such an unfavorable mutation that evolution itself keels over dead.
However, in an ecosystem where multiple species interrelate and impose selection effects on each other, a sudden change in circumstances for one species can result in rapid extinction for others.
We impose selection effects on technology, but a sudden change in technology which kills us all would not be a novel occurrence by the standards of ecosystem operation.
ETA: It seems that your argument all along has boiled down to “We’ll just deliberately not do that” when it comes to cases of catastrophic failure. But the argument of Eliezer and MIRI all along has been that such catastrophic failure is much, much harder to avoid than it intuitively appears.
Gadgets are more analogous to domesticated animals.
We can certainly avoid the Clippy failure mode. I am not arguing that everything else is inherently safe. It is typical of Pascal problems that there are many low-probability risks.
We will almost certainly avoid the literal Clippy failure mode of an AI trying to maximize paperclips, but that doesn’t mean it’s at all easy to avoid the more general failure mode of AIs that try to optimize something other than what we would really, given full knowledge of the consequences, want them to optimize for.
Apart from not solving the value stability problem, and giving them rationality as a goal, not just instrumental rationality.
Can you describe how to give an AI rationality as a goal, and what the consequences would be?
You’ve previously attempted to define “rational” as “humanlike plus instrumentally rational,” but that only packages the Friendliness problem into making an AI rational.
I don’t see why I would have to prove the theoretical possibility of AIs with rationality as a goal, since it is guaranteed by the Orthogonality Thesis. (And it is hardly disputable that minds can have rationality as a goal, since some people do).
I don’t see why I should need to provide a detailed technical explanation of how to do this, since no such explanation has been put forward for Clippy, whose possibility is always argued from the OT.
I don’t see why I should provide a high-level explanation of what rationality is, since there is plenty of such available, not least from CFAR and LW.
In short, an AI with rationality as a goal would behave as human “aspiring rationalists” are enjoined to behave.
Can you give an example of any? So far you haven’t made it clear what having “rationality as a goal” would even mean, but it doesn’t sound like it would be good for much.
The entire point, in any case, is not that building such an AI is theoretically impossible, but that it’s mind bogglingly difficult, and that we should expect that most attempts to do so would fail rather than succeed, and that failure would have potentially dire consequences.
What you mean by “rationality” seems to diverge dramatically from what Less Wrong means by “rationality,” otherwise for an agent to “have rationality as a goal” would be essentially meaningless. That’s why I’m trying to get you to explain precisely what you mean by it.
Me. Most professional philosophers. Anyone who’s got good at aspiring rationalism.
Terminal values aren’t supposed to be “for” some meta- or super-terminal value. (There’s a clue in the name...).
It is difficult in absolute terms, since all AI is.
Explain why it is relatively more difficult than building a Clippy, or mathematically solving and coding in morality.
Failing to correctly code morality into an AI with unupdateable values would have consequences.
Less Wrong means (when talking about AIs) instrumental rationality. I mean what LW, CFAR, etc. mean when they are talking to and about humans: consistency, avoidance of bias, basing beliefs on evidence, etc.
It’s just that those are not merely instrumental, but goals in themselves.
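To make the distinction concrete, here is a toy gloss in code (purely illustrative on my part: the paperclip payoff and the Brier-score measure are assumptions chosen for the sketch, not a design proposal). Instrumental rationality scores an agent by how well its beliefs serve some external payoff; rationality as a goal scores the quality of the beliefs themselves.

```python
# Toy gloss only: two crude "utility functions", one treating rationality as
# merely instrumental, one treating it as a goal in itself.

def instrumental_score(paperclips_produced: int) -> int:
    # Instrumental rationality: accurate beliefs matter only insofar as
    # they increase some external payoff (here, a paperclip count).
    return paperclips_produced

def epistemic_score(predictions, outcomes) -> float:
    # Rationality valued for its own sake: reward well-calibrated beliefs
    # directly (negative Brier score; closer to 0 is better), regardless of
    # whether those beliefs earn the agent anything.
    return -sum((p - o) ** 2 for p, o in zip(predictions, outcomes)) / len(outcomes)

print(epistemic_score([0.9, 0.2, 0.7], [1, 0, 1]))  # roughly -0.047
```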
I think we’ve hit on a serious misunderstanding here. Clippy is relatively easy to make; you or I could probably come up with reasonable specifications for what qualifies as a paperclip, and it wouldn’t be too hard to program maximization of paperclips as an AI’s goal.
Mathematically solving human morality, on the other hand, is mind-bogglingly difficult. The reason MIRI is trying to work out how to program Friendliness is not that it’s easy; it’s that a strong AI which isn’t programmed to be Friendly is extremely dangerous.
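To make the asymmetry concrete, here is a minimal sketch (purely illustrative; the world-state representation and the “kind” tags are assumptions made up for the example, not anyone’s actual design). The paperclip objective takes a few lines, while the corresponding human-values objective is a stub that nobody currently knows how to fill in.

```python
# Purely illustrative: the gap between specifying "maximize paperclips" and
# specifying "human values" as an objective function.

def paperclip_utility(world_state):
    # Crude but easy to write down: count the objects tagged as paperclips.
    return sum(1 for obj in world_state if obj.get("kind") == "paperclip")

def human_values_utility(world_state):
    # This is the part nobody currently knows how to specify rigorously.
    raise NotImplementedError("a mathematical solution to human morality goes here")

print(paperclip_utility([{"kind": "paperclip"}, {"kind": "stapler"}]))  # 1
```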
Again, you’re trying to wrap “humanlike plus epistemically and instrumentally rational” into “rational,” but by bundling in humanlike morality, you’ve essentially wrapped up the Friendliness problem into designing a “rational” AI, and treated this as if it’s a solution. Essentially, what you’re proposing is really, absurdly difficult, but you’re acting like it ought to be easy. This is exactly the danger that Eliezer spent so much time trying to caution against: approaching this specific, extremely difficult task, where failure is likely to result in catastrophe, as if it were easy and one would succeed by default.
As an aside, if you value rationality as a goal in itself, would you want to be highly epistemically and instrumentally rational, but held at the mercy of a nigh-omnipotent tormentor who ensures that you fail at every task you set yourself to, are held in disdain by all your peers, and are only able to live at a subsistence level? Most of the extent to which people ordinarily treat rationality as a goal is instrumental, and the motivations of beings who felt otherwise would probably seem rather absurd to us.
A completely unintelligent clip-making machine isn’t difficult to make. Or threatening. Clippy is supposed to be threatening due to its superintelligence. (You also need to solve goal stability.)
I did not write the quoted phrase, and it is not accurate.
I never said anything of the kind. I think it may be possible for a sufficiently rational agent to deduce morality, but that is in no way equivalent to hardwiring it into the agent, or into the definition of rational!
It’s simple logic that valuing rationality as a goal doesn’t mean valuing only rationality.
We laugh at the talking-snakes crowd and X-factor watchers; they laugh at the nerds and geeks. So it goes.
How, and why would it care?
A number of schemes have been proposed in the literature.
You can’t guess? Rationality-as-a-goal.
That doesn’t answer my question. Please describe at least one which you think would be likely to work, and why you think it would work.
You’ve been consistently treating rationality-as-a-goal as a black box which solves all these problems, but you haven’t given any indication of how it can be programmed into an AI in such a way that makes it a simpler alternative to solving the Friendliness problem; indeed, your descriptions seem to entail solving it.
ETA: When I asked you for examples of entities which have rationality as a goal, you gave examples which, by your admission, have other goals which are at the very least additional to rationality. So suppose that we program an intelligent agent which has only rationality as a goal. What does it do?
I don’t have to, since the default likelihood of ethical objectivism isn’t zero.
There are lots of ways of being biased, but few of being unbiased. Rationality, as described by EY, is lack of bias; Friendliness, as described by EY, is a complex and arbitrary set of biases.
Okay, but I’m prepared to assert that it’s infinitesimally low, and also that the Orthogonality Thesis applies even in the event that our universe has technically objective morality.
What you’re effectively saying here is “I don’t have to offer any argument that I’m right, because it’s not impossible that I’m wrong.”
Friendliness is a complex and arbitrary set of biases in the sense that human morality is a complex and arbitrary set of biases.
It would have been helpful to argue rather than assert.
ETA: I am not arguing that MR is true. I am arguing that it has a certain probability, which subtracts from the overall probability of the MIRI problem/solution, and that MIRI needs to consider it more thoroughly.
The OT is trivially false under some interpretations, and trivially true under others. I didn’t say it was entirely false, and in fact, I have appealed to it. The problem is that the versions that are true are not useful as a stage in the overall MIRI argument. Lack of relevance, in short.
I dare say EY would assert that. I wouldn’t.
I’m prepared to do so, but I’d be rather more amenable to doing so if you would also argue rather than simply asserting your position.
Can you explain how the Orthogonality Thesis is not true in a relevant way with respect to the friendliness of AI?
In which case it should follow that Friendliness is easy, since Friendliness essentially boils down to determining and following what humans think of as “morality.”
If you’re hanging your trust on the objectivity of humanlike morality and its innate relevance to every goal-pursuing optimization force though, you’re placing your trust in something we have virtually no evidence to support the truth of. We may have intuitions to that effect, but there are also understandable reasons for us to hold such intuitions in the absence of their truth, and we have no evidence aside from those intuitions.
I am not saying anything extraordinary: MR is not absurd, and is taken seriously by professional philosophers.
The OT doesn’t exclude, or even render strongly unlikely, the possibility that the AI could figure out morality.
The mere presence of Clippies as theoretical possibilities in mindspace doesn’t imply anything about their probability. The OT mindspace needs to be weighted according to the practical aims, limitations, etc. of real-world research.
Yes: based on my proposal it is no harder than rationality, since it follows from it. But I was explicitly discussing EY’s judgements.
I never said that. I don’t think morality is necessarily human-orientated, and I don’t think an AI needs to have an intrinsically human morality to behave morally towards us, for the same reason that one can behave politely in a foreign country, or behave ethically towards non-human animals.
Never said anything of the kind.
This is more or less exactly what the Orthogonality Thesis argues against. That is, even if we suppose that an objective morality exists (something that, unless we have hard evidence for it, we should assume is not the case), an AI would not care about it by default.
How would you program an AI to determine objective morality and follow that?
Yes, but the presence of humanlike intellects in mindspace doesn’t tell us that they’re an easy target to hit in mindspace by aiming for it either.
If you cannot design a humanlike intellect, or point to any specific model by which one could do so, then you’re not in much of a position to assert that it should be an easy task.
One can behave “politely,” by human standards, towards foreign countries, or “ethically,” by human standards, towards non-human animals. Humans have both evolved drives and game-theoretic concerns which motivate these sorts of behaviors. “For the same reasons” does not seem to apply at all here, because
a) A sufficiently powerful AI does not need to cooperate within a greater community of humans; it could easily crush us all. One of the most reproductively successful humans in history was a conqueror who founded an empire which, within three generations, expanded to include more than a quarter of the total world population at the time. The motivation to gain resources by competition is a drive which exists in opposition to the motivation to minimize risk by cooperation and conflict avoidance. If human intelligence had developed in the absence of the former drive, then we would all be reflexive communists. An AI, on the other hand, is developed in the absence of either drive. To the extent that we want it to behave as if it were an intelligence which had developed in the context of needing to cooperate with others, we’d have to program that in.
b) Our drives to care about other thinking beings are also evolved traits. A machine intelligence does not by default value human beings more than sponges or rocks.
One might program such drives into an AI, but again, this is really complicated to do, and an AI will not simply pull them out of nowhere.
The OT mindspace may consist of 99% of AIs that don’t care. That is completely irrelevant, because it doesn’t translate into a 99% likelihood of accidentally building a Clippy.
Rationality-as-a-goal.
None of this is easy.
I can’t practically design my AI, and you can’t yours. I can theoretically specify my AI, and you can yours.
I am not talking about any given AI.
I am not talking about “default”.
Almost everything in this field is really difficult. And one doesn’t have to programme them. If sociability is needed to live in societies, then pluck AIs from successful societies.
The problem is that the space of minds which are human-friendly is so small that it’s extremely difficult to hit even when we’re trying to hit it.
The broad side of a barn may compose one percent of all possible target space at a hundred paces, while still being easy to hit. A dime on the side of the barn will be much, much harder. Obviously your chances of hitting the dime will be much higher than if you were firing randomly through possible target space, but if you fire at it, you will still probably miss.
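As a rough numerical illustration (the numbers are arbitrary assumptions, chosen only to show the shape of the problem): with a fixed amount of aiming error, a target that is large relative to that error is hit almost every time, while a target that is tiny relative to the same error is almost always missed even when you aim straight at it.

```python
# Arbitrary numbers, purely to illustrate the barn-vs-dime point: aim at the
# centre of a circular target with Gaussian aiming error, and see how often
# the shot lands inside it.
import random

def hit_rate(target_radius, aim_error, trials=100_000):
    hits = 0
    for _ in range(trials):
        # The impact point is the aim point (the centre) plus random error.
        x = random.gauss(0, aim_error)
        y = random.gauss(0, aim_error)
        if x * x + y * y <= target_radius ** 2:
            hits += 1
    return hits / trials

print(hit_rate(target_radius=5.0, aim_error=1.0))   # "barn": ~1.0
print(hit_rate(target_radius=0.01, aim_error=1.0))  # "dime": ~0.00005
```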
Taboo rationality-as-a-goal, it’s obviously an impediment to this discussion.
If by “human-friendly” minds you mean a mind that is wired up to be human-friendly, and only human-friendly (as in EY’s architecture), and if you assume that human friendliness is a rag-bag of ad-hoc behaviours with no hope of rational deducibility (as EY also assumes), then that would be true.
That may be difficult to hit, but it is not what I am aiming at.
What I am talking about is a mind that has a general-purpose rationality (which can be applied to specific problems, like all rationality) and a general-purpose morality (likewise applicable to specific problems). It will not be intrinsically, compulsively and inflexibly human-friendly, like EY’s architecture. If it finds itself among humans, it will be human-friendly because it can be (it’s rational) and because it wants to be (it’s moral). OTOH, if it finds itself amongst Tralfamadorians, it will be Tralfamadorian-friendly.
My using words that mean what I say to say what I mean is not the problem. The problem is that you keep inaccurately paraphrasing what I say, and then attacking the paraphrase.
The words do not convey what you mean. If my interpretation of what you mean is inaccurate, then that’s a sign that you need to make your position clearer.
This is only relevant if AGI evolves out of this existing ecosystem. That is possible. Incremental changes by a large number of tech companies, copied or dropped in response to market pressure, are pretty similar to biological evolution. But just as most species don’t evolve to be more generally intelligent, most devices don’t either. If we develop AGI, it will be by some team that is specifically aiming for it and not worrying about the marketability of intermediary stages.
No: it is also relevant if AGI builders make use of prior art.
But the variation is purposeful.
Like the giraffe reaching for the higher leaves, we (humanity) will stretch our necks out farther with more complex AI systems until we are of no use to our own creation. Our goal is our own destruction. We live to die after all.