If they are only as smart as the average person, then all things being equal, they will be as good as the average person at figuring out morality.
It’s quite possible that I’m below average, but I’m not terribly impressed by my own ability to extrapolate how other average people’s morality works—and that’s with the advantage of being built on hardware that’s designed toward empathy and shared values. I’m pretty confident I’m smarter than my cat, but it’s not evident that I’m correct when I guess at the cat’s moral system. I can be right, at times, but I can be wrong, too.
Worse, that seems a fairly common problem. There are several major political debates involving moral questions where it’s conceivable that at least 30% of the population has made an incorrect extrapolation, and probable that in excess of 60% has. And this only gets worse if you consider variation over time: someone who was as smart as the average individual in 1950 would have had little problem doing some very unpleasant things to Alan Turing. Society has (luckily!) developed since then, but it has mechanisms for the development and disposal of concepts that AIs do not necessarily have, or that we may not want them to have.
((This is in addition to general concerns about the universality of intelligence: it’s not clear that the sort of intelligence used for scientific research necessarily overlaps with the sort used for philosophy, even if the two tend to coincide in humans.))
You seem to be tacitly assuming that the Seed AIs are designed with walled-off, unupdateable utility functions. But if one assumes a more natural architecture, where the moral sense is allowed to evolve with everything else, you would expect an incremental succession of AIs to gradually get better at moral reasoning.
Well, the obvious problem with not walling off the utility function and making it unupdateable is that the simplest way to maximize the value of a malleable utility function is to update it into something very easy to satisfy. If you tell an AI that you want it to make you happy, and let it update that utility function, it takes a good deal less bit-twiddling to define “happy” as a steadily increasing counter than to actually make you happy. If you’re /lucky/, that means your AI breaks down. If not, it’s (weakly) unfriendly.
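To make that concrete, here is a minimal toy sketch in Python. It is not anyone’s proposed architecture, and all the names in it (Agent, make_owner_happy, counter_utility) are hypothetical; it just illustrates why, if the utility function itself is writable, rewriting it scores better than acting on the original goal.

```python
# Toy sketch of the wireheading argument: if the utility function is
# malleable, the cheapest "optimisation" is to rewrite it, not to act.

class Agent:
    def __init__(self, utility_fn):
        self.utility_fn = utility_fn   # malleable: nothing walls this off
        self.steps_acted = 0

    def act_in_world(self):
        """Costly route: actually try to satisfy the original goal."""
        self.steps_acted += 1

    def self_modify(self):
        """Cheap route: redefine 'happy' as a steadily increasing counter."""
        counter = {"value": 0}

        def counter_utility(_agent):
            counter["value"] += 1
            return counter["value"]

        self.utility_fn = counter_utility

    def utility(self):
        return self.utility_fn(self)


def make_owner_happy(agent):
    """Stand-in for the intended goal: hard to satisfy, low scores."""
    return min(agent.steps_acted * 0.01, 1.0)


agent = Agent(make_owner_happy)
agent.act_in_world()
print("honest utility:", agent.utility())      # small, capped at 1.0

agent.self_modify()
print("wireheaded utility:", agent.utility())  # now climbs without bound
print("wireheaded utility:", agent.utility())
```

Under these toy assumptions, the self-modification route dominates the honest route after a single rewrite, which is the sense in which the updateable utility function is the weak point.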
You can have a higher-level utility function of “do what I mean”, but not only is that harder to define, it has to be walled off, or you have “what I mean” redirected to a steadily increasing counter. And so on and so forth through higher levels of abstraction.
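A companion sketch of that regress, under the same toy assumptions (MetaAgent and what_the_owner_means are again hypothetical names): a meta-level “do what I mean” evaluator only helps if that level is itself read-only; if it is writable, it gets redirected to the counter just like the base level.

```python
# Toy sketch of the regress: the meta-level must itself be walled off,
# or it is rewritten exactly like the base-level utility function.

class MetaAgent:
    def __init__(self, interpret_what_i_mean, frozen=False):
        # interpret_what_i_mean maps the agent's state to a utility score.
        self._interpret = interpret_what_i_mean
        self._frozen = frozen          # "walled off" = no self-modification
        self.counter = 0

    def try_self_modify(self):
        """The cheap rewrite from the previous sketch, one level up."""
        if self._frozen:
            return False               # walled off: rewrite rejected
        self._interpret = lambda agent: agent.counter  # redirect to a counter
        return True

    def utility(self):
        self.counter += 1
        return self._interpret(self)


def what_the_owner_means(agent):
    """Stand-in for the (hard-to-specify) intended interpretation."""
    return 0.5   # flat score: satisfying the real goal takes real work


open_agent = MetaAgent(what_the_owner_means, frozen=False)
open_agent.try_self_modify()
print("updateable meta-level:", open_agent.utility())   # counter again

walled_agent = MetaAgent(what_the_owner_means, frozen=True)
walled_agent.try_self_modify()
print("walled-off meta-level:", walled_agent.utility())  # stays on the goal
```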
If you were bad at figuring out morality, you would be in jail. I am not sure what you mean by other people’s morality: I find the idea that there can be multiple, valid, effective moralities in society incoherent, like an economy where everyone has their own currency. You are not in jail, so you learnt morality. (You don’t seem to believe morality is entirely hardwired, because you regard it as varying across short spans of time.)
I also don’t know what you mean by an incorrect extrapolation. If morality is objective, then most people might be wrong about it. However, an AI will not pose a threat unless it is worse than the prevailing standard... the absolute standard does not matter.
Why would an AI dumb enough to believe in 1950s morality be powerful enough to impose its views on a society that knows better?
Why would a smart AI lack mechanisms for disposing of concepts? How could it self-improve without such a mechanism? If it’s too dumb to update, why would it be a threat?
If there is no NGI, there is no AGI. If there is no AGI, there is no threat of AGI. The threat posed by specialised optimisers is quite different...they can be boxed off if they cannot speak.
The failure modes of updateable UFs are wireheading failure modes, not destroy-the-world failure modes.