Adding long-term memory is risky in the sense that it can accumulate weirdness. For example, Bing capped conversation length to reduce that weirdness, even though the underlying AI technology could maintain some kind of coherence over longer conversations.
So I guess that there are competing forces here, as opposed to simple convergent incentives.
Probably no current AI system qualifies as a “strong mind”, for the purposes of this post?
I am reading this post as an argument that current AI technology won’t produce “strong minds”, and I’m pushing back against this argument. EG:
An AI can simply be shut down, until it’s able to and wants to stop you from shutting it down. But can an AI’s improvement be shut down, without shutting down the AI? This can be done for all current AI systems in the framework of finding a fairly limited system by a series of tweaks. Just stop tweaking the system, and it will now behave as a fixed (perhaps stochastic) function that doesn’t provide earth-shaking capabilities.
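To make the "just stop tweaking" picture concrete, here is a minimal sketch in PyTorch (the framework, toy model, and data below are illustrative assumptions of mine, not anything from the post): stop calling the optimizer and the network is thereafter a fixed, at most sampling-stochastic, function of its inputs.

```python
import torch
import torch.nn as nn

# Hypothetical toy model standing in for "the AI system".
model = nn.Sequential(nn.Linear(16, 64), nn.ReLU(), nn.Linear(64, 4))
optimizer = torch.optim.SGD(model.parameters(), lr=1e-2)

def tweak(batch_x, batch_y):
    """One 'tweak': a single gradient-descent step on the weights."""
    loss = nn.functional.cross_entropy(model(batch_x), batch_y)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

# "Shutting down the improvement without shutting down the AI":
# simply stop calling tweak(). With no further weight updates the
# system behaves as a fixed (perhaps stochastic) function.
model.eval()
with torch.no_grad():
    logits = model(torch.randn(1, 16))
    # Any remaining randomness is just sampling from a fixed distribution.
    action = torch.distributions.Categorical(logits=logits).sample()
    print(action.item())
```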
I suspect that the ex quo that puts a mind on a trajectory to being very strong is hard to separate from the operation of the mind. Some gestures at why:
Tsvi appears to take the fact that you can stop gradient descent without stopping the main operation of the NN to be evidence that the whole setup isn’t on a path to produce strong minds.
To me this seems similar to pointing out that we could freeze genetic evolution, and humans would remain about as smart; and then extrapolating from this, to conclude that humans (including genetic evolution) are not on a path to become much smarter.
Although I’ll admit that’s not a great analogy for Tsvi’s argument.
I think it’s a good comparison, though I do think they’re importantly different. Evolution figured out how to make things that figure out how to figure stuff out. So you turn off evolution, and you still have an influx of new ability to figure stuff out, because you have a figure-stuff-out figure-outer. It’s harder to get the human to just figure stuff out without also figuring out more about how to figure stuff out, which is my point.
Tsvi appears to take the fact that you can stop gradient descent without stopping the main operation of the NN to be evidence that the whole setup isn’t on a path to produce strong minds.
(I don’t see why it appears that I’m thinking that.) Specialized to NNs, what I’m saying is more like: If/when NNs make strong minds, it will be because the training—the explicit-for-us, distal ex quo—found an NN that has its own internal figure-stuff-out figure-outer, and then the figure-stuff-out figure-outer did a lot of figuring out how to figure stuff out, so the NN ended up with a lot of ability to figure stuff out; but a big chunk of the leading edge of that ability to figure stuff out came from the NN’s internal figure-stuff-out figure-outer, not “from the training”; so you can’t turn off the NN’s figure-stuff-out figure-outer just by pausing training. I’m not saying that the setup can’t find an NN-internal figure-stuff-out figure-outer (though I would be surprised if that happens with the exact architectures I’m aware of currently existing).
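Here is a toy sketch of the kind of thing being pointed at (the fast-weight mechanism, the delta-rule update, and all names below are my own illustrative assumptions, not anything from the post): the "outer" weights are frozen, i.e. training is paused, yet the system still gets better at the task in front of it, because its run-time state does the adapting.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# "Outer" network: the thing gradient descent would tweak. Freezing it
# corresponds to pausing training.
encoder = nn.Linear(8, 8)
for p in encoder.parameters():
    p.requires_grad_(False)

# Run-time state that the system itself updates while operating; a crude
# stand-in for an internal figure-stuff-out figure-outer.
fast_w = torch.zeros(8, 8)

def adapt_step(x, y, fast_w, lr=0.5):
    """One step of within-episode adaptation with the outer weights frozen."""
    feats = encoder(x)            # frozen computation, no training happening
    pred = feats @ fast_w         # prediction from the run-time state
    err = y - pred
    # Delta-rule update of the run-time state (not a weight update on encoder).
    fast_w = fast_w + lr * feats.t() @ err / x.shape[0]
    return fast_w, err.pow(2).mean().item()

# A toy task the frozen system adapts to on the fly.
x = torch.randn(32, 8)
y = encoder(x) @ torch.randn(8, 8)

for t in range(5):
    fast_w, mse = adapt_step(x, y, fast_w)
    print(f"step {t}: within-episode error = {mse:.3f}")
# The error keeps falling even though no gradient step on the encoder ever
# happens: pausing training does not pause this source of improvement.
```

This is only a cartoon of the distinction: the point it illustrates is that once some of the "ability to figure stuff out" lives in the system's own run-time dynamics rather than in the training loop, pausing the training loop no longer pauses that ability's growth.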