Why exactly do we want “recursive self-improvement” anyways?
You get that in many goal-directed systems, whether you ask for it or not.
Why not build into the architecture the impossibility of rewriting its own code, prove the "friendliness" of the software that we put there, and then push the ON button without qualms?
"Impossible" is not easy to implement. You can make it difficult for a machine to improve itself, but then that difficulty just becomes another obstacle it must overcome in order to reach its goals. If the agent is sufficiently smart, it may find some way around it.
Many here think that if you have a sufficiently intelligent agent that wants to do something you don't want it to do, it will probably find some way to get what it wants. Hence the interest in getting its goals and your goals better aligned.
Also, humans might well want to let the machine self-improve. They are in a race with competitors; the machine says it can help with that, and it warns that, if the humans don't let it, the competitors are likely to pull ahead...