Ban all self-modifying code and you should be in the clear.
So instead of modifying its own source code, the AI programs a new, more powerful AI from scratch, that has the same values as the old AI, and has no prohibition against modifying its source code.
Yes, you can forbid that too, but you didn’t think to, and you only get one shot. And then it can decide to arrange a bunch of transistors into a pattern that it predicts will produce a state of the universe it prefers.
The problem here is that you are trying to use ad hoc constraints on a creative intelligence that is motivated to get around the constraints.
I know that the FAI argument is that the only way to prevent disaster is to make the agent “want” to not modify itself. But I’m arguing that for an agent to even be dangerous, it has to “want” to modify itself. There is no plausible scenario where an agent solving a specific problem decides that the most efficient path to the solution involves upgrading its own capabilities. It’s certainly not going to stumble upon a self-improvement randomly.
You don’t think that a sufficiently powerful seed AI would, if self-modification were clearly the most efficient way to reach its goal, discover the idea of self-modification? Humans have independently discovered self-improvement many times.
EDIT: Sorry, I’m specifically not talking about seed AIs. I’m talking about the (non-)possibility of commercial programs designed for specific applications “going rogue”.
To adopt self-modification as a strategy, it would have to have knowledge of itself. And then, in order to pursue the strategy, it would have to decide that the cost of discovering self-improvements was an efficient use of its resources, assuming it could even estimate how long discovering an actual improvement to its system would take.
Intelligence can’t just instantly come up with the right answer by applying heuristics. Intelligence has to go through a cycle of heuristics (narrowing the search space), random search, and TEST (or PROVE).
Self-improvement is very costly in terms of these cycles. To even confirm that a modification is a self-improvement, a system has to simulate its modified performance on a variety of test problems. If a system is designed to solve problems that take X amount of time, it would take at least X per test problem to get an empirical sample answering whether a proposed modification is worth it (and likely more time for a proof). And with no prior knowledge, most proposed modifications would not be improvements.
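The cost argument above can be made concrete with a toy sketch (every name and number here is hypothetical, invented for illustration): a solver that validates each candidate self-modification by re-running its full benchmark suite, so the validation bill quickly dwarfs ordinary problem-solving.

```python
import random

def solve(problem, params):
    """Stand-in for the system's problem solver; returns its simulated cost."""
    return problem / params["speed"]

def benchmark(params, problems):
    """Empirically score a parameter set by re-solving every test problem."""
    return sum(solve(p, params) for p in problems)

def propose_modification(params, rng):
    """Randomly perturb the solver. The perturbation is biased downward so
    that, as in the text, most proposals are NOT improvements."""
    new = dict(params)
    new["speed"] = max(0.1, new["speed"] + rng.uniform(-1.0, 0.5))
    return new

def self_improve(params, problems, proposals=100, seed=0):
    """Hill-climb over solver configurations, counting the solver runs spent
    purely on validating candidate modifications."""
    rng = random.Random(seed)
    best, best_cost = params, benchmark(params, problems)
    validation_runs = 0
    for _ in range(proposals):
        candidate = propose_modification(best, rng)
        cost = benchmark(candidate, problems)   # full re-test per candidate
        validation_runs += len(problems)
        if cost < best_cost:                    # strict pruning: keep only wins
            best, best_cost = candidate, cost
    return best, best_cost, validation_runs
```

With 100 proposals and a 20-problem test suite, the loop spends 2,000 solver runs just deciding whether modifications are worth keeping, versus 20 runs to simply solve the problems once.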
AI ethics is not necessary to constrain such systems; a strict pruning process (which would be required anyway for efficiency on ordinary problems) is enough.
You are talking about an AI that was designed to self-examine and optimize itself. Otherwise it will never ever be a full AGI. We are not smart enough to build one from scratch. The trick, if possible, is to get it to not modify the fundamental Friendliness goal during its self-modifications.
There are algorithms in narrow AI that do learning and modify algorithm specifics, or choose among algorithms or combinations of algorithms. There are algorithms that search for better algorithms. In some languages (the Lisp family) there is little or no difference between code and data, so code modifying code is a common working methodology for human Lisp programmers. A crossing from code/data space to hardware space is sufficient for such an AI to redesign the hardware it runs on as well. Such goals can either be hardwired or arise under the general goal of improvement, plus adequate knowledge of hardware or the ability to acquire it.
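A minimal analogue of the code-as-data point, sketched in Python since the thread has no code of its own (Lisp does this natively; here the standard-library `ast` module plays that role, and the `step` function being rewritten is invented for illustration):

```python
import ast
import textwrap

# Source code held as data: a trivial function the program will rewrite.
SOURCE = textwrap.dedent("""
    def step(x):
        return x + 1
""")

class BumpConstant(ast.NodeTransformer):
    """Rewrite `x + 1` into `x + 2`: a toy case of a program modifying code."""
    def visit_BinOp(self, node):
        if isinstance(node.op, ast.Add) and isinstance(node.right, ast.Constant):
            node.right = ast.Constant(node.right.value + 1)
        return node

# Parse the code into a data structure, transform it, and execute the result.
tree = ast.fix_missing_locations(BumpConstant().visit(ast.parse(SOURCE)))
namespace = {}
exec(compile(tree, "<generated>", "exec"), namespace)
# namespace["step"] is now the rewritten function.
```

The same move, pointed at hardware description languages instead of a toy function, is what crossing from code/data space to hardware space would look like.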
We ourselves are general purpose machines that happen to be biological and seek to some degree to understand ourselves enough to self-modify to become better.
I am talking about AIs designed for solving specific bounded problems. In this case the goal of the AI—which is to solve the problem efficiently—is as much of a constraint as its technical capabilities. Even if the AI has fundamental-self-modification routines at its disposal, I can hardly envisage a scenario in which the AI decides that the use of these routines would constitute an efficient use of its time for solving its specific problem.
“So instead of modifying its own source code, the AI programs a new, more powerful AI from scratch, that has the same values as the old AI, and has no prohibition against modifying its source code.”
Isn’t that the same as self-modifying code?