Yes, let’s be careful here.
The AIs that might actually exist in the future are those that have their origins in human-designed computer programs. Right? If an AI exists in 2050, then it was designed by something designed by … something designed by a human.
Is this really a random sample of all possible minds? I find it conceivable that human-designed AIs are a narrower subset of all the things that could be defined as minds. Maybe, to some extent, any “thinking machine” a human designs will have some features in common with the human mind (because we tend to anthropomorphize our own creations).
The claim that “Any human-made AI is unlikely to share human values” requires evidence that human values are hard to transmit compared to, say, features resembling human faces, which we instinctively build into mechanical objects.
Pretty much everyone I know can doodle a set of features that are recognizably a schematic of a human face; they can even doodle a variety of sets of features that are recognizably different schematics of different expressions.
A great many people can reliably produce two-dimensional representations that are not just schematics but recognizable as individual human faces in specific contexts.
Even I, with relatively little training or talent, can do this tolerably well.
By contrast, I don’t know many people who can reliably capture representations of human values.
That certainly seems to me evidence that human values are harder to transmit than features resembling human faces.
Safety features represent the human value of not getting hurt. Car air bags represent the desire not to die in motor vehicle accidents. Planing down wood represents the desire not to get splinters. Fixing the floorboards represents the desire not to fall through them. It seems as though artefacts that encode human values are commonplace and fairly easy to create.
Isolated instrumental values, certainly… agreed. (I could quibble about your examples, but that’s beside the point.)
I had understood SarahC to mean “human values” in a more comprehensive/coherent sense, but perhaps I misunderstood.
Most artefacts today don’t need many human values to function reasonably safely. The liquidiser needs to know to stop spinning when you take the lid off—but that’s about it.
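To make that concrete, here is a minimal sketch, with entirely hypothetical class and method names, of the one “value” such an artefact has to encode: a single interlock, nothing like a full account of human wellbeing.

```python
# A minimal sketch (hypothetical names) of the only "value" a liquidiser
# needs: one safety interlock, not a model of human wellbeing.
class Liquidiser:
    def __init__(self) -> None:
        self.lid_closed = True
        self.motor_on = False

    def set_lid(self, closed: bool) -> None:
        self.lid_closed = closed
        if not closed:
            # The single encoded value: an open lid plus spinning blades
            # is bad, so cut the motor immediately.
            self.motor_on = False

    def start(self) -> None:
        # Refuse to start unless the interlock condition holds.
        if self.lid_closed:
            self.motor_on = True


blender = Liquidiser()
blender.start()
blender.set_lid(closed=False)
assert not blender.motor_on  # taking the lid off stops the spinning
```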
Cars are a good example of a failure to program in human values: they respect the values of both drivers and pedestrians poorly because they are too stupid to know how to behave.
As cars get smarter, their understanding of driver and pedestrian values seems likely to improve dramatically. In this case, one of the primary functions of the machine brains will be to better respect human values. Drivers do not like to crash into other vehicles or pedestrians any more than they like being lost. It is also a case of the programming being relatively complex and difficult.
There’s a reasonable case to be made that this kind of thing is the rule rather than the exception, in which case any diversion of funds away from rapidly developing machine intelligence would fairly directly cause harm.
Absolutely agreed that whatever instrumental human values we think about explicitly enough to encode into our machines (like not killing passengers or pedestrians while driving from point A to point B), or that are implicit enough in the task itself that optimizing for that task will necessarily implement them as well (like not crashing and exploding between A and B), will most likely be instantiated in machine intelligence as we develop it.
Agreed that if that’s the rule rather than the exception—that is, if all or almost all of the things we care about are either things we understand explicitly or things that are implicit in the tasks we attempt to optimize—then building systems that attempt to optimize those things, with explicit safety features, is likely to alleviate more suffering than it causes.
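For what it’s worth, that explicit-versus-implicit distinction can be made concrete with a toy cost function; everything in the sketch below (function name, thresholds, weights) is invented for illustration and not drawn from any real driving system.

```python
# Toy illustration of the distinction above (all names and numbers are
# hypothetical): one value is encoded explicitly as a penalty term, the
# other is implicit because violating it also ruins task performance.
def route_cost(travel_time_s: float,
               min_pedestrian_distance_m: float,
               crashed: bool) -> float:
    cost = travel_time_s  # the task itself: get from A to B quickly

    # Explicitly encoded value: passing too close to pedestrians is
    # penalised even when it would not slow the trip down at all.
    if min_pedestrian_distance_m < 2.0:  # arbitrary safety threshold
        cost += 1000.0 * (2.0 - min_pedestrian_distance_m)

    # Implicit value: crashing is not separately "taught"; it simply makes
    # the trip fail, so anything optimizing this cost avoids it anyway.
    if crashed:
        return float("inf")

    return cost


# An optimizer over this cost prefers the slower, safer route:
print(route_cost(300.0, 3.0, crashed=False))  # 300.0
print(route_cost(280.0, 0.5, crashed=False))  # 280 + 1000 * 1.5 = 1780.0
```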
I did mean a more comprehensive/coherent sense. Here’s my thinking.
Fallacy: “If it’s a super-intelligent machine, the very nature of intelligence means it must be wise and good and therefore it won’t kill us all.”
Rejoinder: “Wait, that’s totally not true. Lots of minds could be very powerful at thinking without valuing anything that we value. And that could kill us all. A paperclip maximizer would be a disaster—but it would still be an intelligence.”
Rejoinder to the rejoinder: “Sure, Clippy is a mind, and Clippy is deadly, but are humans likely to produce a Clippy? When we build AIs, we’ll probably build them as models of how we think, and so they’ll probably resemble us in some ways. If we built AIs, and the alien race of Vogons also built AIs, I’d bet our AIs would probably be a little bit more like us, relatively speaking, and the Vogon AIs would probably be a little more like Vogons. We’re not drawing at random from mindspace; we’re drawing from the space of minds that humans are likely to build (on purpose or by accident). That doesn’t mean our AIs won’t be dangerous, but they’re not necessarily going to be as alien as the rejoinder suggests.”
Sure, it seems plausible that an AI developed by humans will on average end up in an at-least-marginally different region of mindspace than an AI developed by nonhumans.
And an AI designed to develop new pharmaceuticals will on average end up in an at-least-marginally different region of mindspace than one designed to predict stock market behavior. Sure.
None of that implies safety, as far as I can tell.