It feels to me like you are straying off the technical issues by looking at a huge picture.
In this case, a picture so huge it’s unsolvable. So here’s an assertion which might be interesting: it’s better to focus on clusters of small, manageable machine-ethics problems and gradually build up to a Grand Scheme, or more likely in my estimation a Grand Messy But Workable System, rather than teasing out a Bible of global ethical abstraction. There’s no working consensus on ethical rules anyway, outside the Three Laws.
An example, maybe already solved:
Autonomous cars are coming quite soon, much sooner than most of us thought. Several people have wondered about the machine ethics of a car in a crash situation, assuming you accept Google’s position that humans will never react fast enough to resume control. Various trolley-problem-like scenarios for minimizing irrevocable harm to humans have been kicked around. But I think I already read a solution to the decision problem in the discussion:
a) Ethical decisions during a crash are going to be a very rare occurrence.
b) The overall reduction in accidents is much more significant than the small subset of accidents theoretically made worse by the robot cars.
c) Humans can’t agree on complex algorithms for the hypothetical scenarios proposed anyway.
d) Machines always need a default mode for when the planned-for reactions conflict.
So if you accept a-d above, then you’ll probably agree that simply having the car slow to a stop and pull over to the side as best it can is the default that will produce the least damage. This is the same routine to follow if the car comes upon debris in the road, a wreck, confusing safety beacons, some catastrophe with the road itself, and so forth. It’s pretty much what you’d tell your teenager to do.
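As a purely illustrative sketch of that kind of default, here is what the fallback logic might look like. Everything in it is hypothetical: the maneuver names, the feasibility checks, and the made-up harm numbers; it is not any vendor’s actual planner, just the shape of “prune the options, keep the default always available.”

```python
# Minimal sketch of a default-mode emergency planner (all names and numbers
# are invented for illustration): candidate maneuvers are pruned as the
# situation degrades, and "slow to a stop and pull over" is always available.

from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Situation:
    clear_shoulder: bool
    obstacle_ahead: bool
    speed_mps: float


@dataclass
class Maneuver:
    name: str
    is_feasible: Callable[[Situation], bool]     # can it still be executed?
    expected_harm: Callable[[Situation], float]  # rough, made-up harm estimate


# The always-available default: slow to a stop and pull over.
DEFAULT = Maneuver(
    name="slow_to_stop_and_pull_over",
    is_feasible=lambda s: True,
    expected_harm=lambda s: s.speed_mps * 0.1,
)

# Hypothetical alternatives that may get pruned away as conditions worsen.
CANDIDATES: List[Maneuver] = [
    Maneuver("swerve_onto_shoulder", lambda s: s.clear_shoulder, lambda s: 5.0),
    Maneuver("hard_brake_in_lane", lambda s: not s.obstacle_ahead, lambda s: 3.0),
]


def choose_maneuver(situation: Situation) -> Maneuver:
    """Pick the lowest-harm feasible maneuver; the default is always in the running."""
    feasible = [m for m in CANDIDATES if m.is_feasible(situation)]
    return min(feasible + [DEFAULT], key=lambda m: m.expected_harm(situation))


if __name__ == "__main__":
    crash = Situation(clear_shoulder=False, obstacle_ahead=True, speed_mps=25.0)
    print(choose_maneuver(crash).name)  # -> slow_to_stop_and_pull_over
```

The only point of the sketch is the structure: when every clever option gets pruned, what remains is the boring default, which is exactly the behavior argued for above.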
But I think there are lessons to draw from the robot cars:
1) The robot, though fully autonomous in everyday situations, will face an ever-narrowing range of options in its decision tree during an accident, until only the default option is left. In contrast, a human will panic and take action that often adds options to an already overloaded decision tree, options which can’t be evaluated in real time and whose outcomes are probably worse than just stopping as fast as possible anyway.
2) Robots don’t have to be perfect; they just have to be better than humans in the aggregate and (see #1) default to avoiding action when disaster strikes.
3) Once you get to #2, you are already better than humans and therefore saving lives and property. At that point the engineers can further tune the robot to improve it gradually.
So what about the paper-clip monster, the AGI that wants to run the world and, most importantly, writes its own code? I agree it could be done in theory, just as we’ll surely have computers running artificial-evolution scenarios with DNA, and data-mining/surveillance on a scale so huge it makes the Stasi look like kindergarten. But as everyone has noted, writing your own code is utterly uncharted territory. A lot of LW commentators treat the prospect with myth: they propose an AGI that is better described as an alien overlord than a machine. Myth may be the only way humans can wrap their brains around an idea so big. Engineers won’t even try. They’ll break the problem up into bits, do a lot of error-checking at a level of action they do understand, and run it in the lab to see what happens. For instance, if there is still a layered approach to software, the OS might have the safety mechanisms built in and might not be self-upgradable, while the self-written code runs in apps that rely on the OS; after a hundred similar steps of divide-and-conquer, the system will be useful and controllable. But truly, I too am just hand-waving in a vacuum. Please continue...
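To make the layering idea a little more concrete, here is a toy sketch in which every name and check is invented for illustration: a fixed safety layer that the self-modifying layer cannot rewrite and must route its actions through. A real system would be enormously more involved; this only shows the division of responsibility the comment gestures at.

```python
# Toy sketch of the layered idea (all names and checks are hypothetical):
# a fixed "safety kernel" that the app layer cannot modify, through which any
# self-written code must route its actions. The kernel vetoes actions that
# fail its checks; the app layer may rewrite itself, the kernel may not.

from typing import Callable, List


class SafetyKernel:
    """Fixed lower layer: not rewritable by the code running above it."""

    def __init__(self, checks: List[Callable[[str], bool]]):
        self._checks = tuple(checks)  # frozen once constructed

    def execute(self, action: str) -> str:
        if all(check(action) for check in self._checks):
            return f"executed: {action}"
        return f"vetoed: {action}"


class SelfModifyingApp:
    """Upper layer: may rewrite its own policy, but can only act via the kernel."""

    def __init__(self, kernel: SafetyKernel):
        self._kernel = kernel
        self.policy: Callable[[str], str] = lambda goal: f"plan for {goal}"

    def rewrite_policy(self, new_policy: Callable[[str], str]) -> None:
        self.policy = new_policy  # self-modification stays confined up here

    def act(self, goal: str) -> str:
        return self._kernel.execute(self.policy(goal))


if __name__ == "__main__":
    kernel = SafetyKernel(checks=[lambda a: "harm" not in a])
    app = SelfModifyingApp(kernel)
    print(app.act("fetch paperclips"))   # executed: plan for fetch paperclips
    app.rewrite_policy(lambda goal: f"harm everything to reach {goal}")
    print(app.act("fetch paperclips"))   # vetoed: harm everything to reach ...
```

The design choice being illustrated is simply that self-modification is confined to the upper layer, while the veto logic lives in a layer the code above it cannot reach; whether that separation can actually be maintained against a strong optimizer is exactly the open question.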
I think the huge picture is pretty important to look at. If we know the goal is far away, then we know that current projects are not going to get their usefulness from solving the whole problem. But that’s fine; there are plenty of other uses for projects. Among others:
Early attempts can serve as landmarks for following ones, to help understand the problem.
Projects might work on implementing pieces that seem potentially useful given the big picture, like scaling with the skill of an externally-given world-model (and then this list can be applied to those sub-problems).
Blue-sky research on things that seem to have potential, without an eye toward immediate application. Learning that the goal is far away means we want more blue-sky research.
Projects might be somewhere in-between, trying to integrate a novel development into current well-performing systems.