We need to revisit AI rewriting its source code
It feels like community discussion has largely abandoned the topic of AGI modifying its own source code, which makes sense, because there are a lot of more fundamental things to figure out.
But I think we should revisit the question at least in the context of narrow AI, because the tools to accomplish exactly this, on several levels, are now available. The thought was prompted by reading a blog post, Writing BPF Code in Rust.
BPF stands for Berkeley Packet Filter. It was originally built for filtering network traffic, but its extended form (eBPF) has since been used for tracing the Linux kernel. The pitch is that userspace code can now be loaded into and run inside the kernel, which is to say userspace can change the way the kernel is working.
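To give a flavor of how little ceremony this takes, here is a hedged sketch using bcc’s Python front end rather than the Rust toolchain from that post (Python is the language I’ll use for all the sketches here). It follows bcc’s standard hello-world pattern, and assumes bcc is installed and you are running as root; the function name `hello` is just illustrative.

```python
from bcc import BPF

# The BPF program itself, written as embedded C; bcc compiles it to BPF
# bytecode via LLVM, and the kernel verifies it before running it.
prog = r"""
int hello(void *ctx) {
    bpf_trace_printk("hello from inside the kernel\n");
    return 0;
}
"""

b = BPF(text=prog)
# Attach the function to the clone() syscall, so it fires on every fork.
b.attach_kprobe(event=b.get_syscall_fnname("clone"), fn_name="hello")
b.trace_print()  # stream the kernel's trace pipe to stdout (Ctrl-C to stop)
```

A few lines of userspace code, and the running kernel is now doing something its maintainers never shipped.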
The ability to use code to write code is very old; it is arguably the core insight of LISP. But it is becoming increasingly common in practice now, in things like writing OCaml or Haskell to generate correct C code, and in increasingly powerful code-generation tricks in compilers.
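A minimal toy illustration of the idea (a hypothetical Python snippet, not the OCaml/Haskell-to-C pipelines just mentioned): generate the source of a specialized function at runtime, compile it, and call it like any other function.

```python
# Minimal "code that writes code": build the source text of a specialized
# function at runtime, compile it, and call it like a hand-written one.
def make_power_fn(n):
    src = f"def power(x):\n    return x ** {n}\n"
    namespace = {}
    exec(compile(src, "<generated>", "exec"), namespace)
    return namespace["power"]

cube = make_power_fn(3)
print(cube(2))  # prints 8
```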
It’s also possible to change how compilers work now. The Julia language has a package named Cassette.jl, which allows dynamically injecting behavior into Julia’s just-in-time compiler. As it happens, both this Julia trick and the BPF trick for the Linux kernel rely on LLVM.
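A loose Python analogue gives the flavor, though to be clear this is an interpreter trace hook, not Cassette.jl and not an LLVM pass: behavior gets injected into pre-existing code whose author never planned for it, without touching its source.

```python
import sys

# Pre-existing code whose author never planned for instrumentation.
def existing_code(x):
    return x * 2

# Injected behavior: log every function call the interpreter makes.
def injected(frame, event, arg):
    if event == "call":
        print("entering", frame.f_code.co_name, "with", frame.f_locals)
    return injected

sys.settrace(injected)   # switch the hook on
existing_code(21)        # prints: entering existing_code with {'x': 21}
sys.settrace(None)       # and off again
```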
All of this means that at present we can in fact write code that modifies itself, that modifies the way it is compiled, and that modifies the way the environment runs it (assuming the environment is Linux). The infrastructure seems to be present for large-scale, multi-layer self-modification to take place, which makes questions about self-modification testable in a way they weren’t before. The BPF and Cassette.jl tricks don’t even require software written explicitly for them; they work on previously existing code whose authors had no idea such a capability existed. These methods are independent of ideas like Software 2.0/Differentiable Programming, and combined they make me wonder whether the safety problems we are concerned with might actually start appearing at the level of applications first.
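To make “code that modifies itself” concrete, here is a toy, entirely hypothetical sketch: a script that reads its own source, patches a constant, and executes the patched copy in-process.

```python
# Toy self-modification: read our own source, patch a constant, and run the
# patched copy in-process. The __name__ guard keeps the copy from recursing.
VERSION = 1

def run_patched_self():
    src = open(__file__).read()
    patched = src.replace("VERSION = 1", "VERSION = 2", 1)
    scope = {"__file__": __file__, "__name__": "patched_copy"}
    exec(compile(patched, "<patched self>", "exec"), scope)
    print("patched copy reports VERSION =", scope["VERSION"])

if __name__ == "__main__":
    print("original reports VERSION =", VERSION)
    run_patched_self()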
In practice, self-modification is a special case of arbitrary code execution; it’s just running a program that looks like yourself, with some changes. That means there are two routes to get there: either communicate with the internet (to, e.g., pay Amazon EC2 to run the modified program), or use a security vulnerability. In the context of computer security, preventing arbitrary code execution is an extremely well-studied problem. Unfortunately, the outcome of all that study is that it’s really hard: multiple vulnerabilities are discovered every year, with little prospect of that ever stopping.
I’m confused. I read you as suggesting that self-modifying code has recently become possible, but I think that self-modifying code has been possible for about as long as we have had digital computers?
What specific things are possible to do now that weren’t possible before, and what kind of AGI-relevant questions does that make testable?
Self-modifying code has been possible, but not practical, for as long as we have had digital computers. Now it has toolchains and use cases, and in the near future tens to hundreds of people will do it as their day job.
The strong version of my claim is that I expect to see the same kinds of failure modes we are concerned with in AGI pushed down to the level of consumer-grade software, at least in huge applications like social networks and self-driving cars.
I think it is now simple and cheap enough for a single research group to do something like:
Write a formal specification
Which employs learning for some simple purpose
And employs self-modification on one or more levels
Which is to say, it feels like we have enough tooling to start doing “Hello World” grade self-modification tests that account for every level of the stack, in real systems.
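Here is a deliberately trivial sketch of what I mean, with no formal specification at all and “learning” standing in as a one-line hill-climb: a program that learns a value and then writes it back into its own source file, so the change persists across runs. All the names here are hypothetical.

```python
import re

# Hypothetical "Hello World" experiment: "learning" is a trivial hill-climb
# toward TARGET, and "self-modification" is writing the learned value back
# into this file so the change survives the process.
TARGET = 7
GUESS = 0   # the literal this program overwrites in its own source

def learn(guess, target):
    # stand-in for a real learning step: nudge the guess toward the target
    if guess < target:
        return guess + 1
    if guess > target:
        return guess - 1
    return guess

def persist(new_guess):
    src = open(__file__).read()
    src = re.sub(r"^GUESS = -?\d+", f"GUESS = {new_guess}", src,
                 count=1, flags=re.MULTILINE)
    with open(__file__, "w") as f:
        f.write(src)

if __name__ == "__main__":
    new_guess = learn(GUESS, TARGET)
    persist(new_guess)
    print(f"learned {GUESS} -> {new_guess}; source updated on disk")
```

Run it a few times and the file converges on TARGET; each run leaves a genuinely new version of the program on disk. The point is not that this is impressive, but that the whole loop (learn, modify, persist, re-run) fits in a page of code with off-the-shelf tools.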
I read it more as a call to shift emphasis back towards self-modification: not because anything specific has changed, but because attention has drifted to other things and the pendulum has swung too far away from where it should be.
I’ll be interested if you have any more specific ideas here. I can’t think of anything because:
The question of “How can an AGI self-modify into a safe and beneficial AGI?” seems pretty similar to “How can a person program a safe and beneficial AGI?”, at least until the system is so superhumanly advanced that it can hopefully figure out the answer itself. So in that sense, everyone is thinking about it all the time.
The challenges of safe self-modification don’t seem wildly different than the challenges of safe learning (after all, learning changes the agent too), including things like goal stability, ontological crises, etc. And whereas learning is basically mandatory, deeper self-modification could (probably IMO) be turned off if necessary, again at least until the system is so superhumanly advanced that it can solve the problem itself. So in that sense, at least some people are sorta thinking about it these days.
I dunno, I just can’t think of any experiment we could do with today’s AI in this domain that would discover or prove something that wasn’t already obvious. (...Which of course doesn’t mean that such experiments don’t exist.)