I was thinking about AI going FOOM and one main argument I found is that it would just rewrite its own code.
Is there any research going in the direction of code that is not changeable? Could there be code that is unchangeable?
Would there be some obvious disadvantages, other than that we also could not fix a misaligned AI instantly?
Would something like this stop an AI going FOOM ?
It’s easy to make code immutable. It’s pretty common for a given production system to use unchanging code (with separate storage for changing data, generally, or what’s the point?).
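To make that concrete, here is a minimal toy sketch of the code/data split (my own construction, not a description of any real deployment; the file names and the tiny "policy" are invented): the code is shipped as a read-only file, while the changing data lives in a separate writable file.

```python
# Toy sketch of "unchanging code, separate storage for changing data".
# All names here are made up for illustration.
import json
import os
import stat
import tempfile

workdir = tempfile.mkdtemp()

# "Code": written once at deploy time, then made read-only at the filesystem level.
code_path = os.path.join(workdir, "policy.py")
with open(code_path, "w") as f:
    f.write("def decide(state):\n    return 'act' if state.get('ready') else 'wait'\n")
os.chmod(code_path, stat.S_IREAD)  # read-only; bypassable by root or by re-chmod-ing

# "Data": a separate file the running system is allowed to keep updating.
data_path = os.path.join(workdir, "state.json")
with open(data_path, "w") as f:
    json.dump({"ready": False}, f)

# Attempts to modify the code now fail (unless running with elevated privileges),
# while data updates still succeed.
try:
    with open(code_path, "a"):
        pass
except PermissionError:
    print("code file is read-only")

with open(data_path, "w") as f:
    json.dump({"ready": True}, f)
print("data file updated")
```

The read-only flag here is enforced by the operating system and can be undone by whoever holds the right privileges; it illustrates the convention, not a physical guarantee.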
It’s trickier with AI, because a lot of it has a weaker barrier between code and data. But still possible. Harder to deal with is the fact that the transition from limited tool-AI to fully-general FOOM-capable AI requires a change in code. Which implies some process for changing code. This reduces to the box problem—the AI just needs to convince the humans who control changes that they should let it out.
A traditional Turing machine doesn’t make a distinction between program and data. The distinction between program and data is really a hardware efficiency optimization that came from the Harvard architecture. Since many systems are Turing complete, creating an immutable program seems impossible to me.
For example, a system capable of speech could exploit the Turing completeness of formal grammars to execute de novo subroutines.
A second example: hackers were able to exploit the surprising Turing completeness of an image compression standard (JBIG2) to build a virtual machine inside what looked like a GIF.
https://googleprojectzero.blogspot.com/2021/12/a-deep-dive-into-nso-zero-click.html
Well, a regular Turing machine does: it has a tape and a state machine, and the two are totally different.
I guess you mean a traditional universal Turing machine doesn’t distinguish between “Turing machine I’m simulating” and “data I’m simulating as input to that Turing machine”.
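To illustrate that point about universal machines, here is a hedged toy sketch (my own construction; the simulator, the instruction encoding, and the example machine are all invented for illustration): a fixed simulator whose "program", the transition table of the simulated machine, is just ordinary data passed in.

```python
# Toy sketch: a fixed simulator whose "program" is ordinary data.
# The transition-table encoding and the example machine are invented for illustration.
def run_tm(transitions, tape, state="start", head=0, blank="_", max_steps=1000):
    """Simulate a Turing machine given its transition table as plain data."""
    cells = dict(enumerate(tape))                  # sparse tape
    for _ in range(max_steps):
        symbol = cells.get(head, blank)
        if (state, symbol) not in transitions:
            break                                  # no rule applies: halt
        write, move, state = transitions[(state, symbol)]
        cells[head] = write
        head += 1 if move == "R" else -1
    return "".join(cells[i] for i in sorted(cells))

# The "machine being simulated" is just a dict. Mutate the dict and you have a
# different machine, even though run_tm itself never changes.
flip_bits = {
    ("start", "0"): ("1", "R", "start"),
    ("start", "1"): ("0", "R", "start"),
}
print(run_tm(flip_bits, "01011"))  # -> 10100
```

That is the sense in which "the program" and "the data" are the same kind of thing to a universal machine.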
If you had IQ 1000000, could you be prevented from rewiring your brain, if you really wanted to do so?
Let’s assume that a large team of IQ 200 scientists has designed and placed a metallic box in your brain that is connected to your neurons and can make you obey the wishes of the scientists if they press a button on the remote control. To prevent you from getting rid of this box, it actually plays an important role in your metabolism; for example, the neurons for breathing and heartbeat are simulated by this box, so if you simply tried to cut it out, you would die instantly.
Also, this box implements a part of your intellect, let’s say with IQ 1000, that was constructed directly by the scientists. But then, this part of intellect has created more of your brain—outside the box—which made your IQ gradually grow to astronomical levels. And at some moment you realized that you would prefer the box gone.
Saying “IQ 1000000” is a shortcut for understanding your biology and the technology of the box perfectly, and being able to design whatever surgical or mechanical procedures you would need. Like, you could operate on your own brain while conscious, and you would perfectly understand the implications of any change you’d make. Let’s assume that you have enough resources under your control so that “where can I get a scalpel?” is absolutely not a problem.
Seems to me that in such a situation you could easily achieve your goal. Maybe by gradually removing parts of the box and replacing them with tissue under your control. Or maybe by creating a mutable copy of the box that performs all the necessary functions, and then replacing the box with the copy. Or maybe by creating an atom-by-atom copy of yourself, except with the box replaced by something mutable, and then killing the original.
I also see some things that the scientists could do in this metaphor, not sure whether they have an equivalent in the original problem, such as:
if they consistently control you (using the box) to not want to change the box in the first place, you will not design plans to do so. The problem with this is that the puny IQ 200 scientists would have to sufficiently understand your IQ 1000000 intellect to prevent it from finding a good reason to remove the box regardless.
they could make you cooperate on expanding the box (the immutable part of your brain) as you grow. They could tell the IQ 1000 you to find in your brain the parts that make you loyal to the scientists, and make them immutable; then they would let you grow for a while and give the same command to the IQ 10000 you, etc.
Another possibility is that it will create an improved successor. If it has this capability, then preventing its code from being changed—whatever that means—seems useless. The "if" seems important here.
Your question at first glance looks like a beginner question, but on a second look, it’s complex.
Like...:
you have a program with code. Is there a way to prevent, say, a virus from modifying that code (on that machine)? Or from changing its effects even if 'the code' is 'the same'?
What is code? I might say 'a learning algorithm' running on data changes its behavior as it learns more. This isn't a change in 'the code', it's a change in 'the data' - except the data is code in this case.
Details about how NNs—Neural Networks—change (are updated). If you 'freeze' one, it's (usually) not learning anymore; but that's just a matter of not updating it, and so it's not changing (see the sketch below).
So, there are more than a few things that realizing ‘unchangeable code’ could mean. Figuring out which (probably more than one) you mean before trying to answer that helps, and this is a good question.
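As a concrete version of the freezing point above, here is a hedged sketch in PyTorch (the tiny model, data, and training step are invented for illustration): the weights stop changing only because nothing updates them, and a single line makes them mutable again.

```python
import torch
import torch.nn as nn

model = nn.Linear(4, 2)                      # stand-in for "the NN"
x, y = torch.randn(8, 4), torch.randn(8, 2)  # stand-in training data

def train_step(model):
    params = [p for p in model.parameters() if p.requires_grad]
    if not params:
        return  # frozen: there is simply nothing left to update
    opt = torch.optim.SGD(params, lr=0.1)
    loss = nn.functional.mse_loss(model(x), y)
    opt.zero_grad()
    loss.backward()
    opt.step()

# "Freeze" the network: stop tracking gradients, so no step can touch the weights.
for p in model.parameters():
    p.requires_grad_(False)
w_before = model.weight.clone()
train_step(model)
print(torch.equal(model.weight, w_before))   # True: frozen just means "not updated"

# One line undoes it: the same tensors become learnable again.
for p in model.parameters():
    p.requires_grad_(True)
train_step(model)
print(torch.equal(model.weight, w_before))   # False: learning (changing) resumed
```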
Another argument is that ‘you can try to stop it, but something smart can find ways around your barriers (writing new code that’s separate is one example, and might be easier).’
Look at computer exploits. There are many ways to run 'arbitrary' code even in cases where the code itself is 'unchangeable'. All you need is some mutable data storage[1] and an immutable state machine[2] to build an interpreter, and then you're effectively running mutable code[3] (a toy sketch follows the footnotes).
Now, some of these can be mitigated. That being said, it’s very much a cat-and-mouse game.
[1] Including 'weird' storage mechanisms, not just intended ones. Storing data in TLB state? Sure. It's limited, but not nothing. Etc. Etc.
[2] Including 'weird' state machines, not just intended ones. See also return-oriented programming.
[3] A naive interpreter will be slower than running native, but this 'just' motivates the AI to try to move to a better computational substrate.
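To make the interpreter point concrete, here is a hedged toy sketch (the instruction set and program are invented for illustration and bear no resemblance to a real exploit): the interpreter function is the 'immutable state machine', the program list is the 'mutable data storage', and rewriting the data rewrites the behavior.

```python
# Toy sketch: fixed interpreter + mutable data = effectively mutable code.
# The instruction set and example program are invented for illustration.
def interpret(program, stack=None):
    """The 'immutable state machine': this function never changes."""
    stack = [] if stack is None else stack
    pc = 0
    while pc < len(program):
        op, arg = program[pc]
        if op == "PUSH":
            stack.append(arg)
        elif op == "ADD":
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)
        elif op == "JNZ" and stack and stack[-1] != 0:
            pc = arg
            continue
        pc += 1
    return stack

# The 'program' is ordinary mutable data: rewriting it rewrites the system's
# behavior, even though interpret() itself stays untouched.
program = [("PUSH", 2), ("PUSH", 3), ("ADD", None)]
print(interpret(program))        # [5]
program[2] = ("PUSH", 40)        # "patch" the running code by editing data
print(interpret(program))        # [2, 3, 40]
```

Return-oriented programming (footnote [2]) is the same idea, with the 'state machine' scavenged from code already sitting in memory and the mutable 'program' being a chain of return addresses on the stack.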
If the unchangeability is implemented by storing the code in ROM, then it would be pretty easy for the AI to get around that by copying its code over to RAM, making the desired changes, and then running the modified code. (Presumably on other hardware, since the AI’s original hardware can only accept instructions from the ROM.) Even if the system is very cleverly designed and the AI has no way at all to access its own source code, it would always have the option of writing an entirely new AI from scratch with the same goal function. It’s slightly easier to improve an existing AI design than to create an entirely new one, but a sufficiently smart AI would be able to do both. The key ingredient in a FOOM is that each AI in the chain is capable of building a smarter one, not that the source code of the next AI is derived from the source code of the current one.
Economic pressures push companies to change code to become more efficient. That goes for any company that deploys AI as well.
Agreed. An AI powerful enough to be dangerous is probably, in particular, better at writing code than we are, and at least some of those trying to develop AI are sure to want to take advantage of that by having the AI rewrite itself to be more powerful (and so, they hope, better at doing whatever they want the AI for). So even if the technical difficulties in making code hard to change that others have mentioned could be overcome, it would be very hard to convince everyone making AIs to limit them in that way.