For natural intelligences, they are probably spread across more than a single brain, and are partly embedded in the larger environment.
To some degree, yes. The dumbest animals are the most obviously agent-like. We humans often act in ways that seem irrational, if you go by our stated goals. So, if humans are agents, either (1) we have really complicated utility functions, or (2) we have really complicated beliefs about the best way to maximize our utility functions. (2) is almost certainly the case, though, which leaves (1) all the way back at its prior probability.
...its true “terminal goal” needs to be to treat any command or sub-goal as a problem in physics and mathematics that it needs to answer correctly before choosing an adequate set of instrumental goals to achieve it.
Yes. As you know, Omohundro agrees that an AI will seek to clarify its goals. And if intelligence logically implies the ability to do moral philosophy correctly, that’s fine. However, I’m not convinced that intelligence must imply that. A human, with 3.5 billion years of common sense baked in, would not tile the solar system with smiley faces; but even some of the smartest humans have come up with some pretty cold plans: John von Neumann wanted to nuke the Russians immediately, for instance.
Software is improved because previous generations proved to be useful but made mistakes.
This is not a law of nature; it is caused by engineers who look at their mistakes and avoid them in the next system. In other words, it’s part of the OODA loop of the system’s engineers. As machine-made decisions speed up, the humans’ OODA loop must tighten. Inevitably, the machine-made decisions will get inside the human OODA loop. This will be a nonlinear change.
New generations will make [fewer] mistakes, not more.
Also, newer software tends to make fewer of the exact mistakes that older software made. But when we ask more of our newer software, it makes errors at a roughly constant rate on the newer tasks. In our example, programmatic trading has been around since the 1970s, but the first notable “flash crash” was in 1987. The flash crash of 2010 was caused by a much newer generation of trading software. Its engineers made bigger demands of it, needing it to do more with less human intervention, and so they got the opportunity to witness completely novel failure modes: failure modes which cost billions, and which they had been unable to anticipate even with their past experience of building software with highly similar goals and environments.