One point I’ve seen raised by people in the latter group is along the lines of: “It’s very unlikely that we’ll be in a situation where we’re forced to build AI systems vastly more capable than their supervisors. Even if we have a very fast takeoff—say, going from being unable to create human-level AI systems to being able to create very superhuman systems ~overnight—there will probably still be some way to create systems that are only slightly more powerful than our current trusted systems and/or humans; to use these to supervise and align systems slightly more powerful than them; etc. (For example, we could take a very powerful, general algorithm and simply run it on a relatively low amount of compute in order to get a system that isn’t too powerful.)” This seems like a plausible argument that we’re unlikely to be stuck with a large gap between AI systems’ capabilities and their supervisors’ capabilities; I’m not currently clear on what the counter-argument is.
I agree that this is a very promising advantage for Team Safety. I do think, though, that to make good use of this potential advantage, AI creators need to go into the process cautiously.
One way that I’ve come up with to ‘turn down’ the power of an AI system is to simply inject small amounts of noise into its activations.
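To make this concrete, here's a rough sketch of what I mean, assuming a PyTorch-style network: use forward hooks to add zero-mean Gaussian noise to each linear layer's output. The `noise_scale` knob and the choice of which layers to perturb are purely illustrative, not a claim about how this would actually be done for a frontier model.

```python
import torch
import torch.nn as nn


def add_activation_noise(model: nn.Module, noise_scale: float = 0.05):
    """Register hooks that perturb each Linear layer's output with
    zero-mean Gaussian noise, scaled by the activation's own std."""
    handles = []

    def hook(_module, _inputs, output):
        # Only perturb tensor outputs; leave anything else untouched.
        if isinstance(output, torch.Tensor):
            noise = torch.randn_like(output) * output.std() * noise_scale
            return output + noise
        return output

    for module in model.modules():
        if isinstance(module, nn.Linear):
            handles.append(module.register_forward_hook(hook))
    return handles  # call .remove() on each handle to restore the model


# Toy usage: a larger noise_scale should degrade capability more,
# giving a crude "power dial" for the system.
model = nn.Sequential(nn.Linear(16, 64), nn.ReLU(), nn.Linear(64, 4))
handles = add_activation_noise(model, noise_scale=0.1)
with torch.no_grad():
    out = model(torch.randn(2, 16))
for h in handles:
    h.remove()
```

Scaling the noise by each activation's standard deviation keeps the perturbation roughly proportional across layers, so a single scalar can act as the dial; whether this degrades capability gracefully (rather than just making the system erratic) is an open empirical question.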