It’s imaginable to do this work but not remember any of it, i.e. to avoid having that work leave traces that can accumulate; but that seems like a delicate, probably unnatural carving.
Is the implication here that modern NNs don’t do this? My own tendency would be to think that they are doing a lot of this—doing a bunch of reasoning which gets thrown away rather than saved. So it seems like modern NNs have simply managed to hit this delicate unnatural carving. (Which in turn suggests that it is not so delicate, and even, not so unnatural.)
Yes, I think there’s stuff that humans do that’s crucial for what makes us smart, that we have to do in order to perform some language tasks, and that the LLM doesn’t do when you ask it to do those tasks, even when it performs well in the local-behavior sense.
Probably no current AI system qualifies as a “strong mind”, for the purposes of this post? Adding various kinds of long-term memory is a very natural and probably instrumentally convergent improvement to make to LLM-based systems, though.
I expect that as LLM-based systems get smarter and more agentic, they’ll naturally start hitting on this strategy for self-improvement on their own. If you ask GPT-4 for improvements one could make to LLMs, it will come up with the idea of adding various kinds of memory. AutoGPT and similar systems are not yet good enough to actually implement these improvements autonomously, but I expect that will change in the near future, and that it will be pretty difficult to get comparable performance out of a memoryless system. As you go even further up the capabilities ladder, it probably gets hard to avoid developing memory, intentionally or accidentally or as a side effect.
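To make “adding various kinds of memory” concrete, here’s a rough sketch of the sort of wrapper I mean. Everything in it is invented for illustration: the `call_llm` stub and the keyword-overlap retrieval are placeholders, not any particular system’s API.

```python
from dataclasses import dataclass, field

@dataclass
class MemoryStore:
    """Naive long-term memory: append notes, retrieve by keyword overlap."""
    notes: list[str] = field(default_factory=list)

    def add(self, note: str) -> None:
        self.notes.append(note)

    def retrieve(self, query: str, k: int = 3) -> list[str]:
        words = set(query.lower().split())
        ranked = sorted(self.notes,
                        key=lambda n: -len(words & set(n.lower().split())))
        return ranked[:k]

def call_llm(prompt: str) -> str:
    """Placeholder for a real model call; returns a canned reply here."""
    return f"(model reply to: {prompt[:40]}...)"

def agent_step(task: str, memory: MemoryStore) -> str:
    notes = memory.retrieve(task)
    prompt = "Relevant notes:\n" + "\n".join(notes) + "\n\nTask: " + task
    reply = call_llm(prompt)
    # The step that matters for this discussion: work done now leaves a
    # trace that future episodes can retrieve and accumulate on.
    memory.add(f"{task} -> {reply}")
    return reply

memory = MemoryStore()
agent_step("draft a plan for improving the retrieval scheme", memory)
agent_step("refine yesterday's plan for the retrieval scheme", memory)
print(memory.notes)
```

The load-bearing line is the write-back into `memory`: each episode leaves traces that later episodes can retrieve and build on, which is exactly the accumulation at issue.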
Adding long-term memory is risky in the sense that it can accumulate weirdness—like how Bing cut off conversation length to reduce weirdness, even though the AI technology could maintain some kind of coherence over longer conversations.
So I guess that there are competing forces here, as opposed to simple convergent incentives.
Probably no current AI system qualifies as a “strong mind”, for the purposes of this post?
I am reading this post as an argument that current AI technology won’t produce “strong minds”, and I’m pushing back against this argument. EG:
An AI can simply be shut down, until it’s able to and wants to stop you from shutting it down. But can an AI’s improvement be shut down, without shutting down the AI? This can be done for all current AI systems in the framework of finding a fairly limited system by a series of tweaks. Just stop tweaking the system, and it will now behave as a fixed (perhaps stochastic) function that doesn’t provide earth-shaking capabilities.
I suspect that the ex quo that puts a mind on a trajectory to being very strong, is hard to separate from the operation of the mind. Some gestures at why:
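(Concretely, I read “just stop tweaking the system” as something like the following toy PyTorch sketch of my own, not code from the post: freeze the parameters and stop running the optimizer, and what’s left is a fixed function of its inputs.)

```python
import torch

# Toy stand-in for "a fairly limited system found by a series of tweaks".
model = torch.nn.Sequential(
    torch.nn.Linear(8, 8), torch.nn.ReLU(), torch.nn.Linear(8, 1)
)

# "Just stop tweaking": no more optimizer steps, parameters frozen.
for p in model.parameters():
    p.requires_grad_(False)
model.eval()

# The system still runs, but as a fixed (here deterministic) function:
# nothing about it changes from one call to the next.
x = torch.randn(4, 8)
print(model(x))
```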
Tsvi appears to take the fact that you can stop gradient-descent without stopping the main operation of the NN to be evidence that the whole setup isn’t on a path to produce strong minds.
To me this seems similar to pointing out that we could freeze genetic evolution, and humans would remain about as smart; and then extrapolating from this, to conclude that humans (including genetic evolution) are not on a path to become much smarter.
Although I’ll admit that’s not a great analogy for Tsvi’s argument.
I think it’s a good comparison, though I do think they’re importantly different. Evolution figured out how to make things that figure out how to figure stuff out. So you turn off evolution, and you still have an influx of new ability to figure stuff out, because you have a figure-stuff-out figure-outer. It’s harder to get the human to just figure stuff out without also figuring out more about how to figure stuff out, which is my point.
Tsvi appears to take the fact that you can stop gradient-descent without stopping the main operation of the NN to be evidence that the whole setup isn’t on a path to produce strong minds.
(I don’t see why it appears that I’m thinking that.) Specialized to NNs, what I’m saying is more like: If/when NNs make strong minds, it will be because the training—the explicit-for-us, distal ex quo—found an NN that has its own internal figure-stuff-out figure-outer, and then the figure-stuff-out figure-outer did a lot of figuring out how to figure stuff out, so the NN ended up with a lot of ability to figure stuff out; but a big chunk of the leading edge of that ability to figure stuff out came from the NN’s internal figure-stuff-out figure-outer, not “from the training”; so you can’t turn off the NN’s figure-stuff-out figure-outer just by pausing training. I’m not saying that the setup can’t find an NN-internal figure-stuff-out figure-outer (though I would be surprised if that happens with the exact architectures I’m aware of currently existing).
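As a toy illustration of that distinction (not a claim about any real architecture): in the sketch below nothing analogous to training ever runs, the “model” is a fixed rule, and yet performance still improves within an episode, because the fixed rule itself does the figuring-out over whatever accumulates in its context.

```python
import random

def frozen_forward(context: list[tuple[float, float]], x: float) -> float:
    """A fixed, never-updated rule: predict y as the mean y of the context examples."""
    if not context:
        return 0.0
    return sum(y for _, y in context) / len(context)

random.seed(0)
true_y = 3.7                           # the quantity this episode's data is about
context: list[tuple[float, float]] = []

for step in range(1, 6):
    x = random.random()
    prediction = frozen_forward(context, x)
    observation = true_y + random.gauss(0, 0.5)
    context.append((x, observation))   # accumulates only in the context, never in weights
    print(f"step {step}: prediction={prediction:.2f}, "
          f"error={abs(prediction - true_y):.2f}")
```

Pausing weight updates does nothing here, because there aren’t any; all the adaptation lives in the frozen rule operating over its accumulated context. In the NN case, the training is what would have found such a rule in the first place; the point is just that, once found, pausing the training doesn’t pause the rule.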