Our terminal goals were built by natural selection, and they’re hard to pin down, but they don’t get “refined;”
People like David Pearce certainly would be tempted to do just that. Also, don’t forget the drugs people willingly use to alter basic drives such as their risk aversion.
Neither of these conditions applies to a hyperintelligent AI...
I don’t see any signs that current research will lead to anything like a paperclip maximizer; rather, incremental refinements of “Do what I want” systems will lead there. By “Do what I want” systems I mean systems that are more and more autonomous while requiring less and less specific feedback.
It is possible that a robot trying to earn a university diploma as part of a Turing test will conclude that it can do so by killing all the students, kidnapping the professor, and making them sign its diploma. But the fact that it is possible does not mean it is at all likely. Surely such a robot would behave similarly wrongly (by its creators’ standards) on other occasions and be scrapped in an early research phase.
People like David Pearce certainly would be tempted to do just that.
Well, of course you can modify someone else’s terminal goals, if you have a fine grasp of neuroanatomy, or a baseball bat, or whatever. But you don’t introspect, discover your own true terminal goals, and decide that you want them to be something else. The reason you wanted them to be something else would be your true terminal goal.
trying to earn a university diploma
Earning a university diploma is a well-understood process; the environment’s constraints and available actions are more formally documented even than for self-driving cars.
Even tackling well-understood problems like buying low and selling high, we still have poorly-understood, unfriendly behavior—and that’s doing something humans understand perfectly, but think about slower than the robots. In problem domains where we’re not even equipped to second-guess the robots because they’re thinking deeper as well as faster, we’ll have no chance to correct such problems.
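To make that concrete, here is a toy sketch in Python, with made-up numbers rather than a model of any real market or any real trading system, of how a handful of individually sensible stop-loss rules can feed on each other:

```python
# Toy illustration (made-up numbers, not a model of any real market): each bot
# follows a trivially sensible rule, "sell if the price falls below my stop
# level", yet their combined selling produces the very fall they react to.

def simulate(n_bots=100, start_price=100.0, shock=-0.04, impact=0.0015):
    price = start_price * (1 + shock)          # a small external dip
    # stop levels spread between 97% and 88% of the starting price
    stops = [start_price * (0.97 - 0.09 * i / n_bots) for i in range(n_bots)]
    holding = [True] * n_bots
    round_no = 0
    while True:
        sellers = [i for i in range(n_bots) if holding[i] and price < stops[i]]
        if not sellers:
            break
        for i in sellers:
            holding[i] = False                 # stop-loss fires
        price *= 1 - impact * len(sellers)     # selling pressure moves the price
        round_no += 1
        print(f"round {round_no}: {len(sellers)} bots sold, price {price:.2f}")
    return price

simulate()
```

In this toy run a 4% dip cascades into a total drop of nearly 18%, and no single bot’s rule was unreasonable on its own; the surprise lives entirely in the interaction.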
...you don’t introspect, discover your own true terminal goals, and decide that you want them to be something else. The reason you wanted them to be something else would be your true terminal goal.
Sure. But I am not sure if it still makes sense to talk about “terminal goals” at that level. For natural intelligences they are probably spread over more than a single brain and part of the larger environment.
Whether an AI would interpret “make humans happy” as “tile the universe with smiley faces” depends on how it decides what to do. And the only viable solution I see for general intelligence is that its true “terminal goal” needs to be to treat any command or sub-goal as a problem in physics and mathematics that it needs to answer correctly before choosing an adequate set of instrumental goals to achieve it. Just as a human contractor would try to fulfill the customer’s wishes. Otherwise you would have to hard-code everything, which is impossible.
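To illustrate what I mean, here is a minimal sketch (invented names, and a toy stand-in for the genuinely hard inference step) of an agent that scores candidate plans by how likely the principal is to endorse them, and asks for clarification rather than acting when nothing clears the bar:

```python
# Hypothetical sketch of "answer the question correctly before acting":
# candidate plans are scored on whether they satisfy the literal goal AND on
# whether the principal would plausibly endorse them. All names and numbers
# are invented for illustration; the hard part is hidden in one function.

CANDIDATE_PLANS = {
    "study_and_pass_exams": {"goal_satisfied": True, "side_effect_severity": 0.05},
    "bribe_the_registrar":  {"goal_satisfied": True, "side_effect_severity": 0.7},
    "coerce_the_professor": {"goal_satisfied": True, "side_effect_severity": 0.99},
}

def endorsement_probability(plan):
    """Toy stand-in for the real problem: inferring how likely the principal
    is to endorse this plan, given everything known about what they meant."""
    return 1.0 - plan["side_effect_severity"]

def choose_plan(plans, endorsement_threshold=0.9):
    acceptable = {name: p for name, p in plans.items()
                  if p["goal_satisfied"]
                  and endorsement_probability(p) >= endorsement_threshold}
    if not acceptable:
        return None  # ask the principal for clarification instead of acting
    return max(acceptable, key=lambda name: endorsement_probability(acceptable[name]))

print(choose_plan(CANDIDATE_PLANS))  # -> study_and_pass_exams
```

All the difficulty is of course hidden inside endorsement_probability; the sketch only shows where the “answer the question correctly first” step would sit in the decision procedure.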
Even tackling well-understood problems like buying low and selling high, we still have poorly-understood, unfriendly behavior—and that’s doing something humans understand perfectly, but think about slower than the robots. In problem domains where we’re not even equipped to second-guess the robots because they’re thinking deeper as well as faster, we’ll have no chance to correct such problems.
But intelligence is something we seek to improve in our artificial systems in order for such problems not to happen in the first place, rather than to make them worse. I just don’t see how a more intelligent financial algorithm would be worse than its predecessors from a human perspective. How would such a development happen? Software is improved because previous generations proved to be useful but made mistakes. New generations will make fewer mistakes, not more.
For natural intelligences they are probably spread over more than a single brain and part of the larger environment.
To some degree, yes. The dumbest animals are the most obviously agent-like. We humans often act in ways that seem irrational, if you go by our stated goals. So, if humans are agents, we have either (1) really complicated utility functions, or (2) really complicated beliefs about the best way to maximize our utility functions. (2) is almost certainly the case, though, which leaves (1) all the way back at its prior probability.
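To make the (1)-versus-(2) distinction concrete, here is a toy expected-utility agent with purely illustrative numbers: the utility function stays trivially simple, and the “irrational-looking” choice comes entirely from mistaken beliefs, which is why observed behavior alone tells you little about how complicated the utility function is.

```python
# Toy expected-utility agent: the utility function is deliberately simple
# (more money is better) and only the beliefs about the world vary.
# All payoffs and probabilities are made up for illustration.

def expected_utility(action, beliefs):
    # beliefs[action] is a list of (probability, payoff) pairs
    return sum(p * payoff for p, payoff in beliefs[action])

def choose(beliefs):
    return max(beliefs, key=lambda action: expected_utility(action, beliefs))

accurate_beliefs = {
    "index_fund": [(1.0, 7)],                    # a steady, modest return
    "lottery":    [(0.001, 1000), (0.999, -1)],  # realistic odds
}
mistaken_beliefs = {
    "index_fund": [(1.0, 7)],
    "lottery":    [(0.25, 1000), (0.75, -1)],    # wildly overestimated odds
}

print(choose(accurate_beliefs))   # -> index_fund
print(choose(mistaken_beliefs))   # -> lottery: same utilities, different beliefs
```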
...its true “terminal goal” needs to be to treat any command or sub-goal as a problem in physics and mathematics that it needs to answer correctly before choosing an adequate set of instrumental goals to achieve it.
Yes. As you know, Omohundro agrees that an AI will seek to clarify its goals. And if intelligence logically implies the ability to do moral philosophy correctly, that’s fine. However, I’m not convinced that intelligence must imply that. A human, with 3.5 billion years of common sense baked in, would not tile the solar system with smiley faces; but even some of the smartest humans came up with some pretty cold plans—John von Neumann wanted to nuke the Russians immediately, for instance.
Software is improved because previous generations proved to be useful but made mistakes.
This is not a law of nature; it is caused by engineers who look at their mistakes and avoid them in the next system. In other words, it’s part of the OODA loop of the system’s engineers. As the machine-made decisions speed up, the humans’ OODA loop must tighten. Inevitably, the machine-made decisions will get inside the human OODA loop. This will be a nonlinear change.
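As a back-of-the-envelope illustration, with made-up figures rather than measurements, the number of decisions that go uncorrected scales directly with how many machine decisions fit inside one human review cycle:

```python
# Back-of-the-envelope sketch, with made-up figures: how many machine
# decisions happen before a human can observe, orient, decide and act?

def decisions_before_intervention(machine_decisions_per_second, human_ooda_seconds):
    return machine_decisions_per_second * human_ooda_seconds

HUMAN_OODA_SECONDS = 600  # assume a ten-minute human review loop

for rate in (0.1, 10, 10_000):
    n = decisions_before_intervention(rate, HUMAN_OODA_SECONDS)
    print(f"{rate:>8} machine decisions/s -> {n:>12,.0f} decisions per human loop")
```

Tightening the human loop only buys a constant factor; once the machine’s decision rate has outgrown it, review necessarily happens after the fact.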
New generations will make fewer mistakes, not more.
Also, newer software tends to make fewer of the exact mistakes that older software made. But when we ask more of our newer software, it makes a consistent rate of errors on the newer tasks. In our example, programmatic trading has been around since the 1970s, but the first notable “flash crash” was in 1987. The flash crash of 2010 was caused by a much newer generation of trading software. Its engineers made bigger demands of it and needed it to do more, with less human intervention, so they got the opportunity to witness completely novel failure modes. Failure modes which cost billions, and which they had been unable to anticipate even with past experience of building software with highly similar goals and environments.