Thanks for the interesting post. I think there are two types of self-modification. In the first, an agent works on lower-level parts of itself, for example by adding hardware or connecting new modules. This produces evolutionary development with small returns and is relatively safe.
The other type is high-level self-modification, where a second agent is created, as you describe. Its performance must either be mathematically proven (which is difficult) or tested in many simulated environments (which is also risky, as a superior agent may be able to break out of them). We could call this a revolutionary way of self-improvement. Such self-modification provides higher returns if it succeeds.
Knowing all this, most agents will prefer evolutionary development, that is, gaining the same power through lower-level changes. But risk-hungry agents will still prefer revolutionary methods, especially if they are time-constrained.
An early-stage AI will be time-constrained by an arms race with other (possible) AIs, so it will prefer risky revolutionary paths of development, even if their probability of failure is very high.
(This is the TL;DR of my text “Levels of self-improvement”.)
Thanks, that’s an interesting perspective. I think even high-level self-modification can be relatively safe with sufficient asymmetry in resources—simulated environments give a large advantage to the original, especially if the successor can be started with no memories of anything outside the simulation. Only an extreme difference in intelligence between the two would overcome that.
Of course, the problem of transmitting values to a successor without giving it any information about the world is a tricky one, since most of the values we care about are linked to reality. But maybe some values are basic enough to be grounded purely in math that applies in any circumstances.
I also wrote a (draft) text, “Catching treacherous turn”, in which I attempt to design the best possible AI box and identify the conditions under which it would fail.
Obviously, we can’t box a superintelligence, but we could box an AI of around human level and prevent its self-improvement through many independent mechanisms. One of them is cleaning its memory before each new task (see the toy sketch below).
In the first text I built a model of the self-improvement process, and in the second I explore how self-improvement could be prevented based on that model.
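To make the memory-cleaning mechanism concrete, here is a rough toy sketch (my own illustration, not code from either text): a task loop that wipes the agent's episodic state before every task, so no long-horizon plan can persist across tasks. The `BoxedAgent` class and its methods are hypothetical.

```python
# Toy sketch of "clean memory before each new task" as one boxing mechanism.
# The agent interface here is hypothetical, purely for illustration.

class BoxedAgent:
    """A roughly human-level agent whose episodic memory is wiped before every task."""

    def __init__(self, fixed_weights):
        self.fixed_weights = fixed_weights  # frozen competence; never updated by the agent
        self.episodic_memory = []           # everything accumulated while solving a task

    def run_task(self, task_input):
        # The agent may build up arbitrary state while working on the task...
        self.episodic_memory.append(task_input)
        return f"solution({task_input})"

    def wipe_memory(self):
        # ...but nothing persists across tasks, so a staged self-improvement
        # plan cannot be carried over from one task to the next.
        self.episodic_memory = []


def boxed_task_loop(agent, tasks):
    results = []
    for task in tasks:
        agent.wipe_memory()            # clean memory BEFORE each new task
        results.append(agent.run_task(task))
    agent.wipe_memory()                # and once more after the final task
    return results


if __name__ == "__main__":
    agent = BoxedAgent(fixed_weights="frozen")
    print(boxed_task_loop(agent, ["prove lemma 3", "design a gearbox"]))
```

The point of the sketch is only that memory cleaning is one independent mechanism among many; it would be combined with the others described in the draft.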