An AI smart enough to talk its way out of a box would be able to understand the implicit complexity; an AI too dumb to understand implicit complexity would be boxable. Where is the problem?
You’re forgetting the ‘seed is not the superintelligence’ lesson from ‘The genie knows, but doesn’t care’. If you haven’t read that article, go do so. The seed AI is dumb enough to be boxable, but also too dumb to plausibly solve the entire FAI problem itself.
I am arguing that it would not have to solve FAI itself.
The superintelligent AI is smart enough to solve FAI, but also too smart to be safely boxed;
Huh? If it is moral and friendly, why would you need to box it?
and it doesn’t help us that an unFriendly superintelligent AI has solved FAI, if by that point it’s too powerful for us to control. You can’t safely pass the buck to a superintelligence to tell us how to build a superintelligence safe enough to pass bucks to.
If it’s friendly, why enslave it?
Things are not inherently dangerous just because they are unpredictable. If you have some independent reason for thinking something might turn dangerous, then it becomes desirable to predict it.
Yes. The five theses give us reason to expect superintelligent AI to be dangerous by default. Adding more unpredictability to a system that already seems dangerous will generally make it more dangerous.
The five theses are variously irrelevant and misapplied. Details supplied on request.
They are not assumed to develop mysterious blind spots about falconry or mining engineering, so why assume they will develop a blind spot about morality?
‘The genie knows, but doesn’t care’ means that the genie (i.e., superintelligence) knows how to do human morality (or could easily figure it out, if it felt like trying), but hasn’t been built to care about human morality.
What genie? Who built it that way? If your policy is to build an artificial philosopher, an AI that can solve morality itself, why would you build it to not act on what it knows?
Knowing how to behave the way humans want you to is not sufficient for actually behaving that way; Eliezer makes that point well in No Universally Compelling Arguments.
No, his argument is irrelevant as explained in this comment.
The worry isn’t that the superintelligence will be dumb about morality; it’s that it will be indifferent to morality,
You don’t have to pre-programme the whole of friendliness or morality to fix that. If you have reason to suspect that there are no intrinsically compelling concepts, then you can build an AI that wants to be moral, but needs to figure out what that is.
and that by the time it exists it will be too late to safely change that indifference. The seed AI (which is not a superintelligence, but is smart enough to set off a chain of self-modifications that lead to a superintelligence) is dumb about morality (approximately as dumb as humans are, if not dumber), and is also probably not a particularly amazing falconer or miner. It only needs to be a competent programmer, to qualify as a seed AI.
Which is only a problem if you assume, as I don’t, that it will be pre-programmed with a fixed morality.
The average person manages to solve the problem of being moral themselves, in a good-enough way.
Good enough for going to the grocery store without knifing anyone. Probably not good enough for safely ruling the world. With greater power comes a greater need for moral insight, and a greater risk should that insight be absent.
With greater intelligence comes greater moral insight—unless you create a problem by walling off that part of an AI.
Why isn’t the lack of a formalisation of morality a problem for humans?
It is a problem, and it leads to a huge amount of human suffering. It doesn’t mean we get everything wrong, but we do make moral errors on a routine basis; the consequences are mostly non-catastrophic because we’re slow, weak, and have adopted some ‘good-enough’ heuristics for bounded circumstances.
OK. The consequences are non-catastrophic. An AI with imperfect, good-enough morality would not be an existential threat.
We know how humans incrementally improve as moral reasoners: it’s called the Kohlberg hierarchy.
Just about every contemporary moral psychologist I’ve read or talked to seems to think that Kohlberg’s overall model is false. (Though some may think it’s a useful toy model, and it certainly was hugely influential in its day.) Haidt’s The Emotional Dog and Its Rational Tail gets cited a lot in this context.
And does Haidt’s work mean that everyone is on par, morally? Does it mean that no one can progress in moral insight?
We do have morality tests. Fail them and you get pilloried in the media or sent to jail.
That’s certainly not good enough. Build a superintelligence that optimizes for ‘following the letter of the law’ and you don’t get a superintelligence that cares about humans’ deepest values.
It isn’t good enough for a ceiling: it is good enough for a floor.
If it works like arithmetic, that is, if it is an expansion of some basic principles…
Human values are an evolutionary hack resulting from adaptations to billions of different selective pressures over billions of years, innumerable side-effects of those adaptations, genetic drift, etc.
De facto ones are, yes. Likewise, folk physics is an evolutionary hack. But if we build an AI to do physics, we don’t intend it to do folk physics; we intend it to do physics.
Arithmetic can be formalized in a few sentences. Why think that humanity’s deepest preferences are anything like that simple?
There’s a theory of morality that can be expressed in a few sentences, and leaves preferences as variables to be filled in later. It’s called utilitarianism.
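For illustration only (a rough compression with assumed notation, not a standard axiomatisation): the preference-utilitarian decision rule does fit in roughly one line, with the individual utility functions $U_i$ serving as exactly the variables ‘to be filled in later’:

$$a^* = \underset{a \in A}{\arg\max}\; \mathbb{E}\!\left[\sum_i U_i\big(o(a)\big)\right]$$

where $A$ is the set of available actions, $o(a)$ is the outcome of taking action $a$, and each $U_i$ is the (unspecified) degree to which individual $i$’s preferences are satisfied.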
Our priors should be very low for ‘human value is simple’ just given the etiology of human value, and our failure to converge on any simple predictive or normative theory thus far seems to only confirm this.
So? If value is complex, that doesn’t affect utilitarianism, for instance. You, and other LessWrongian writers, keep behaving as though “values are X” is obviously equivalent to “morality is X”.
“The superintelligent AI is smart enough to solve FAI, but also too smart to be safely boxed;”
Huh? If it is moral and friendly, why would you need to box it?
You’re confusing ‘smart enough to solve FAI’ with ‘actually solved FAI’, and you’re confusing ‘actually solved FAI’ with ‘self-modified to become Friendly’. Most possible artificial superintelligences have no desire to invest much time into figuring out human value, and most possible ones that do figure out human value have no desire to replace their own desires with the desires of humans. If the genie knows how to build a Friendly AI, that doesn’t imply that the genie is Friendly; so superintelligence doesn’t in any way imply Friendliness even if it implies the ability to become Friendly.
No, his argument is irrelevant as explained in this comment.
Why does that comment make his point irrelevant? Are you claiming that it’s easy to program superintelligences to be ‘rational’, where ‘rationality’ doesn’t mean instrumental or epistemic rationality but instead means something that involves being a moral paragon? It just looks to me like black-boxing human morality to make it look simpler or more universal.
If you have reason to suspect that there are no intrinsically compelling concepts, then you can build an AI that wants to be moral, but needs to figure out what that is.
And how do you code that? If the programmers don’t know what ‘be moral’ means, then how do they code the AI to want to ‘be moral’? See Truly Part Of You.
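A minimal sketch of the gap (Python; names like `morality_score` are hypothetical, and this is an illustration rather than anyone’s proposal): the ‘wanting’ part of the agent is trivial to write, and the whole open problem hides inside the one function the programmers were supposed to already understand.

```python
# Illustrative sketch only; `morality_score` and `choose_action` are hypothetical names.

def morality_score(action) -> float:
    """How moral is `action`? This single stub is the entire unsolved problem."""
    raise NotImplementedError("'be moral' was never operationalized")

def choose_action(candidate_actions):
    # "Build an AI that wants to be moral" reduces to maximizing a score
    # that no one has yet written down.
    return max(candidate_actions, key=morality_score)
```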
An AI with imperfect, good-enough morality would not be an existential threat.
A human with superintelligence-level superpowers would be an existential threat. An artificial intelligence with superintelligence-level superpowers would therefore also be an existential threat, if it were merely as ethical as a human. If your bar is set low enough to cause an extinction event, you should probably raise your bar a bit.
And does Haidt’s work mean that everyone is on par, morally? Does it mean that no one can progress in moral insight?
No. Read Haidt’s paper, and beware of goalpost drift.
It isn’t good enough for a ceiling: it is good enough for a floor.
No. Human law isn’t built for superintelligences, so it doesn’t put special effort into blocking loopholes that would be available to an ASI. E.g., there’s no law against disassembling the Sun, because no lawmaker anticipated that anyone would have that capability.
There’s a theory of morality that can be expressed in a few sentences, and leaves preferences as variables to be filled in later. It’s called utilitarianism.
… Which isn’t computable, and provides no particular method for figuring out what the variables are. ‘Preferences’ isn’t operationalized.
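In the same hypothetical style (names such as `world_model` and `utility` are assumptions for illustration): even granting the one-line rule, a literal utilitarian procedure needs an outcome model, a sum over every affected party, and a utility function for each of them, and the theory itself supplies none of these.

```python
# Hypothetical sketch of a literal utilitarian decision procedure.

def utility(person, outcome) -> float:
    # The 'preferences' variable the theory leaves to be filled in later.
    raise NotImplementedError("preferences are not operationalized")

def expected_total_utility(action, world_model, population):
    total = 0.0
    for outcome, probability in world_model.outcomes(action):  # intractable to enumerate in general
        for person in population:                               # everyone affected, now and later
            total += probability * utility(person, outcome)
    return total
```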
You, and other LessWrongian writers, keep behaving as though “values are X” is obviously equivalent to “morality is X”.
Values in general are what matters for Friendly AI, not moral values. Moral values are a proper subset of what’s important and worth protecting in humanity.