Huh, seems everyone including me wants to talk about this bit:
This seems like a plausible argument that we’re unlikely to be stuck with a large gap between AI systems’ capabilities and their supervisors’ capabilities; I’m not currently clear on what the counter-argument is.
My own favorite counterargument is simply that I don’t expect all of the relevant labs to be this cautious. The leading lab will have a 0-6 month lead over various other labs, who will have a 0-6 month lead over many others. The leading lab may not even be convinced that Now is the Time to be Cautious. If they are convinced, and slow down to implement the bucket of safety measures you’ve described in this post, then we go to the runner-up lab and see what they do. Repeat. How many labs have to agree to be cautious and go slow before the first few labs can make significant alignment progress and stabilize the situation? Idk, probably somewhere between 4 and 20 would be my guess. Which means the probability that at least one of them will disagree and race ahead is pretty high.
A crux for me would be whether implementing this bucket of techniques avoids slowing you down or costing you lots of money.
(Chiming in late here, sorry!) I think this is a totally valid concern, but I think it’s generally helpful to discuss technical and political challenges separately. I think pessimistic folks often say things like “We have no idea how to align an AI,” and I see this post as a partial counterpoint to that.
In addition to a small alignment tax (as you mention), a couple other ways I could see the political side going well would be (a) an AI project using a few-month lead to do huge amounts of further helpful work (https://www.lesswrong.com/posts/jwhcXmigv2LTrbBiB/success-without-dignity-a-nearcasting-story-of-avoiding#The_deployment_problem); (b) a standards-and-monitoring regime blocking less cautious training and deployment.