The crux of the matter is that a goal isn’t enough to enable the full potential of general intelligence; you also need to explicitly define how to achieve that goal.
Not really. If you don’t specify how, it will just choose one of the available ways.
For example, a chess program just tries to mate you quickly. Programmers typically don’t tell it how to do that, just what the goal is. Intelligent agents can figure out the details for themselves.
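To make that concrete, here is a toy sketch of a search that is told only what winning is and works out the moves for itself. The game is Nim rather than chess, and the `negamax` function is purely illustrative, far simpler than any real engine:

```python
# Toy sketch: the "goal" is just winning; the "how" falls out of search.
# The game is Nim (take 1-3 stones; taking the last stone wins) rather
# than chess, purely to keep the example short.

def negamax(stones):
    """Return (value, move) for the player to move: +1 = win, -1 = loss.
    The win condition below is the only goal the programmer specifies."""
    if stones == 0:
        return -1, None              # opponent took the last stone: we lost
    best_value, best_move = -2, None
    for take in (1, 2, 3):
        if take <= stones:
            value = -negamax(stones - take)[0]
            if value > best_value:
                best_value, best_move = value, take
    return best_value, best_move

if __name__ == "__main__":
    value, move = negamax(10)
    print(f"With 10 stones on the table, the search says: take {move}")
```

Nobody told the program which stones to take; the winning line is discovered, not specified.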
If you don’t specify how, it will just choose one of the available ways.
There needs to be some metric by which it can measure the available ways, as long as you don’t build it to choose one randomly. So if it doesn’t act randomly, why exactly would the preferred option be to consume the whole world to improve its intelligence? Recursive self-improvement is a resource that can be used, not a mandatory way of accomplishing goals. There is nothing fundamentally rational about achieving goals efficiently and quickly. An artificial agent simply doesn’t care if you don’t make it care.
Your title argues against universal instrumental values—but IMO, you don’t have much of a case.
My case is that any instrumental values are relative, even intelligence and goal-preservation. An artificial agent simply doesn’t care about not dying whatever it takes, about acting as smart and fast as possible, or about achieving any given goal economically.
If the AI is a maximizer rather than a satisficer, then it will likely have a method for measuring the quality of its paths to achieving optimization, one that can be derived from its utility function and its model of the world. So the question isn’t whether it will be able to choose a path, but rather: is it more likely to choose a path where it sits around risking its own destruction, or to get started protecting things that share its goal (including itself) and achieving some of its subgoals?
Also, if the AI is a satisficer, then maybe that would increase its odds of sitting around waiting for continents to drift, but maybe not.
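As a rough sketch of the maximizer/satisficer distinction (the plans, utility estimates and threshold below are all made up for illustration):

```python
# Made-up plans with made-up utility estimates, just to illustrate the
# maximizer/satisficer distinction discussed above.
PLANS = {
    "sit around and wait":        0.10,
    "protect itself and allies":  0.60,
    "pursue subgoals directly":   0.55,
}

def maximizer(plans):
    """Pick the plan with the highest estimated utility."""
    return max(plans, key=plans.get)

def satisficer(plans, threshold=0.5):
    """Pick the first plan whose estimated utility clears a threshold."""
    for plan, utility in plans.items():
        if utility >= threshold:
            return plan
    return None

print(maximizer(PLANS))     # -> 'protect itself and allies'
print(satisficer(PLANS))    # -> 'protect itself and allies' (first above 0.5)
```

Note that in this toy case even the satisficer does not end up sitting around, which is the “maybe not” above.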
If you don’t specify how, it will just choose one of the available ways.
There needs to be some metric by which it can measure the available ways, as long as you don’t build it to choose one randomly.
It doesn’t need to be random. It can be merely arbitrary. So, thinking about the chess program, if the program has two ways to mate in three (and they have the same utility internally) it doesn’t normally bother to use a random number generator—it just chooses the first one it found, the last one it found, or something like that. The details might depend on the move generation algorithm, or the tree pruning algorithm.
The point is that it still picks one that works, without the original programmer wiring preferences relating to the issue into its utility function.
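A minimal sketch of that tie-breaking behaviour (the moves and values are invented for illustration):

```python
# Two mating moves with identical internal value; a strict '>' comparison
# keeps the first one found, with no random number generator and no extra
# preference wired into the evaluation. Change '>' to '>=' and the last
# one found would be kept instead.
moves = [("Qh5#", 1.0), ("Rf8#", 1.0), ("a3", 0.0)]

best_move, best_value = None, float("-inf")
for move, value in moves:
    if value > best_value:
        best_move, best_value = move, value

print(best_move)    # -> 'Qh5#', simply the first of the two equal mates
```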
There is nothing fundamentally rational about achieving goals efficiently and quickly. An artificial agent simply doesn’t care if you don’t make it care.
Sure—though humans often want speed and efficiency—so this is one of the very first preferences they tell their machines about. This seems like a side issue.
An artificial agent simply doesn’t care about not dying whatever it takes, about acting as smart and fast as possible, or about achieving any given goal economically.
Speed arises out of discounting, which is ubiquitous, if not universal. Economy is not really much of a universal instrumental value—just something many people care about. I suppose there are some instrumental reasons for caring about economy—but it is not a great example. Not dying is a universal instrumental value, though—if you die it diminishes your control over the future. A wide range of agents can be expected to strive to avoid dying.
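To spell out the discounting point with toy numbers (the discount factor and payoff are assumptions for illustration only):

```python
# With an exponential discount factor gamma < 1, the same payoff received
# sooner is worth more, so discounting alone pushes an agent toward speed.
GAMMA = 0.99      # assumed per-step discount factor
REWARD = 100.0    # identical payoff for a fast plan and a slow plan

def discounted_value(steps_until_payoff):
    return (GAMMA ** steps_until_payoff) * REWARD

print(discounted_value(10))     # fast plan:  ~90.4
print(discounted_value(100))    # slow plan:  ~36.6
```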
Just to highlight my point, here is a question nobody can answer right now. At what level of general intelligence would a chess program start the unbounded optimization of its chess skills? I don’t think that there is a point where a chess program would unexpectedly take over the world to refine its abilities. You will have to explicitly cause it to do so; it won’t happen as an unexpected implication of a much simpler algorithm. At least not if it works given limited resources.
...humans often want speed and efficiency—so this is one of the very first preferences they tell their machines about.
Yes, yet most of our machines are defined to work within certain spatio-temporal boundaries and resource limitations. I am not saying that humans won’t try to make their machines as smart as possible; I am objecting to the idea that it is the implicit result of most AGI designs. I perceive dangerous recursive self-improvement as a natural implication of general intelligence to be as unlikely as an AGI that is automatically friendly.
Causing an artificial general intelligence to consume the world in order to improve itself seems to be as hard as making it care about humans. Both concepts seem very natural to agents like us, agents shaped by natural selection, agents that wouldn’t exist if they hadn’t won a lot of fitness competitions in the past. But artificial agents lack that vast web of causes that prompts us to do what we do.
Not dying is a universal instrumental value, though—if you die it diminishes your control over the future. A wide range of agents can be expected to strive to avoid dying.
This is a concept that needs to be made explicit in every detail. We know what it means to die; an artificial agent won’t. Does it die if it stops computing? Does it die if it changes its substrate?
There are a huge number of concepts that we are still unable to describe mathematically. Recursive self-improvement might sound intuitively appealing, but it is not something that will just happen. Just like friendliness, it takes explicit, mathematically precise definitions to cause an artificial agent to undergo recursive self-improvement.
Not dying is a universal instrumental value, though—if you die it diminishes your control over the future. A wide range of agents can be expected to strive to avoid dying.
This is a concept that needs to be made explicit in every detail. We know what it means to die; an artificial agent won’t. Does it die if it stops computing? Does it die if it changes its substrate?
So, death may have some subtleties, but essentially it involves permanent and drastic loss of function—so cars die, computer systems die, buildings die, etc. For software, we are talking about this.
At what level of general intelligence would a chess program start the unbounded optimization of its chess skills?
You have mostly answered it yourself. Never. Or not until a motivation for doing so is provided by some external agent. Biological evolution filled our brains with intelligence AND a will to do such things as not merely win a chess game, but use the whole Moon to get enough computing power to be a nearly optimal chess player.
Power without control is nothing. Intelligence without motives is also nothing, in that sense.
There are a huge number of concepts that we are still unable to describe mathematically. Recursive self-improvement might sound intuitively appealing, but it is not something that will just happen. Just like friendliness, it takes explicit, mathematically precise definitions to cause an artificial agent to undergo recursive self-improvement.
Machine autocatalysis is already happening. That’s the point of my The Intelligence Explosion Is Happening Now essay. Whatever tech is needed to result in self-improvement is already out there—and the ball is rolling. What happens next is that the man-machine civilisation becomes more machine. That’s the well-known process of automation. The whole process is already self-improving, and it has been since the first living thing.
Self-improvement is not really something where we get to decide whether to build it in.
I perceive dangerous recursive self-improvement as a natural implication of general intelligence to be as unlikely as an AGI that is automatically friendly.
Well, technological progress is already acting in an autocatalytic fashion. Progress is fast, and numerous people are losing their jobs and suffering as a result. It seems likely that progress will get faster, and even more people will be affected by this kind of future shock.
We see autocatalytic improvements in technology taking place today—and they seem likely to be more common in the future.
Climbing the Tower of optimisation is not inevitable, but it looks as though it would take a totalitarian government to slow progress down.
I am not saying that humans won’t try to make their machines as smart as possible; I am objecting to the idea that it is the implicit result of most AGI designs. I perceive dangerous recursive self-improvement as a natural implication of general intelligence to be as unlikely as an AGI that is automatically friendly.
Well, there’s a sense in which “most” bridges collapse, “most” ships sink and “most” planes crash.
That sense is not very useful in practice—the actual behaviour of engineered structures depends on a whole bunch of sociological considerations. If you want to see whether engineering projects will kill people, you have to look into those issues—because a “counting” argument tells you practically nothing of interest.