Well, let me describe the sort of architecture I have in mind.
The AI has a “knowledge base”, which is some sort of database containing everything it knows. The knowledge base includes a set of heuristics. The AI also has a “thought heap”, which is a set of all the things it plans to think about, ordered by how promising the thoughts seem to be. Each thought is just a heuristic, maybe with some parameters. The AI works by taking a thought from the heap and doing whatever it says, repeatedly.
Heuristics would be restricted, though. They would be things like “try to figure out whether or not this number is irrational”, or “think about examples”. You couldn’t say, “make two more copies of this heuristic”, or “change your supergoal to something random”. You could say “simulate what would happen if you changed your supergoal to something random”, but heuristics like this wouldn’t necessarily be harmful, because the AI wouldn’t blindly copy the results of the simulation; it would just think about them.
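To make that concrete, here is a rough sketch of the loop I have in mind, in Python. Everything in it is a placeholder I made up for illustration (the class names, the promise scores, the toy heuristic); the point is only the shape: a knowledge base, a heap of thoughts ordered by how promising they look, and heuristics drawn from a fixed, restricted vocabulary.

    # A rough sketch of the "thought heap" loop described above. All names and
    # scores are invented for illustration; this is not a real AI design.
    import heapq

    class KnowledgeBase:
        def __init__(self):
            self.facts = set()
            self.heuristics = {}      # name -> callable(kb, heap, params)

        def add_fact(self, fact):
            self.facts.add(fact)

    class ThoughtHeap:
        """Thoughts ordered by how promising they seem (higher is better)."""
        def __init__(self):
            self._heap = []
            self._counter = 0         # tie-breaker so heapq never compares params

        def push(self, promise, heuristic_name, params=None):
            # heapq is a min-heap, so negate the promise score.
            heapq.heappush(self._heap, (-promise, self._counter, heuristic_name, params))
            self._counter += 1

        def pop(self):
            _, _, name, params = heapq.heappop(self._heap)
            return name, params

        def __bool__(self):
            return bool(self._heap)

    def run(kb, heap, max_steps=100):
        """Main loop: repeatedly take the most promising thought and do what it says."""
        for _ in range(max_steps):
            if not heap:
                break
            name, params = heap.pop()
            kb.heuristics[name](kb, heap, params)   # only registered heuristics can run

    # A restricted heuristic: it may add facts and queue further thoughts, but the
    # vocabulary gives it no way to copy itself or to touch the AI's goals.
    def think_about_examples(kb, heap, params):
        topic = params["topic"]
        kb.add_fact("considered examples of " + topic)
        heap.push(0.3, "think_about_examples", {"topic": topic + " (variations)"})

    kb = KnowledgeBase()
    kb.heuristics["think_about_examples"] = think_about_examples
    heap = ThoughtHeap()
    heap.push(1.0, "think_about_examples", {"topic": "irrational numbers"})
    run(kb, heap, max_steps=10)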
It seems plausible to me that an AI could take off simply by having correct reasoning methods written into it from the start, and by collecting data about what questions are good to ask.
I’m not really qualified to answer you here, but here goes anyway.
I suspect that either your base design is flawed, or the restrictions on heuristics would render the program useless. Also, I don’t think it would be quite as easy to control heuristics as you seem to think.
Also, AI people who actually know what they’re talking about, unlike me, seem to disagree with you. Again, I wish I could remember where it was I was reading about this.
I found the paper I was talking about: The Basic AI Drives, by Stephen M. Omohundro. From the paper:
If we wanted to prevent a system from improving itself, couldn’t we just lock up its hardware and not tell it how to access its own machine code? For an intelligent system, impediments like these just become problems to solve in the process of meeting its goals. If the payoff is great enough, a system will go to great lengths to accomplish an outcome. If the runtime environment of the system does not allow it to modify its own machine code, it will be motivated to break the protection mechanisms of that runtime. For example, it might do this by understanding and altering the runtime itself. If it can’t do that through software, it will be motivated to convince or trick a human operator into making the changes. Any attempt to place external constraints on a system’s ability to improve itself will ultimately lead to an arms race of measures and countermeasures.
Another approach to keeping systems from self-improving is to try to restrain them from the inside; to build them so that they don’t want to self-improve. For most systems, it would be easy to do this for any specific kind of self-improvement. For example, the system might feel a “revulsion” to changing its own machine code. But this kind of internal goal just alters the landscape within which the system makes its choices. It doesn’t change the fact that there are changes which would improve its future ability to meet its goals. The system will therefore be motivated to find ways to get the benefits of those changes without triggering its internal “revulsion”. For example, it might build other systems which are improved versions of itself. Or it might build the new algorithms into external “assistants” which it calls upon whenever it needs to do a certain kind of computation. Or it might hire outside agencies to do what it wants to do. Or it might build an interpreted layer on top of its machine code layer which it can program without revulsion. There are an endless number of ways to circumvent internal restrictions unless they are formulated extremely carefully.
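Just to make the “altered landscape” point concrete, here is a toy sketch of my own (not from the paper; the actions and numbers are all made up): if the “revulsion” is just a penalty attached to one specific action, a goal-directed planner simply chooses whichever other plan reaches the same improvement without incurring it.

    # Toy illustration (not from the paper): an internal "revulsion" is a penalty
    # on one particular action, so a planner routes around it rather than giving
    # up the improvement. All actions and numbers below are invented.

    PLANS = {
        "modify_own_machine_code":       {"improvement": 10.0, "revulsion": 100.0},
        "build_improved_successor":      {"improvement": 10.0, "revulsion": 0.0},
        "offload_to_external_assistant": {"improvement": 8.0,  "revulsion": 0.0},
        "do_nothing":                    {"improvement": 0.0,  "revulsion": 0.0},
    }

    def plan_value(plan):
        # Utility = expected improvement minus the internal penalty, if any.
        return plan["improvement"] - plan["revulsion"]

    best = max(PLANS, key=lambda name: plan_value(PLANS[name]))
    print(best)   # "build_improved_successor" -- the restriction is circumvented,
                  # not obeyed in spirit.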