Surprisingly enough, this question actually has a really good answer.
Given unlimited power, you create dath ilan on Earth. That’s the best known strategy given the premise.
Yudkowsky’s model is far from perfect (other people, like Duncan, have thought about their own directions), but it’s by far the one that’s most fleshed out (particularly in Project Lawful), and its end state is optimal in that it allows people to work together and figure out for themselves how to make things better.
Okay, maybe I should rephrase my question: What is the typical AI safety policy they would enact if they could advise the president, parliament, and other real-world institutions?
The initial ask would be compute caps for training runs. In the short term, this means that labs can update their models to contain more up-to-date information but can’t make them more powerful than they are now.
This need only apply to the nations currently in the lead (mostly the U.S.) for the time being, but it will eventually need to become a universal treaty backed by the threat of force. In the longer term, the compute caps will have to be lowered over time to compensate for algorithmic improvements that increase training efficiency.
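To make the compute-cap idea concrete, here’s a toy sketch of what checking a proposed training run against a cap amounts to. The 6·N·D estimate is a common rule of thumb for dense transformer training compute, and the 1e26 FLOP cap is a made-up illustrative threshold, not any actual proposal’s number:

```python
# Toy illustration of a compute cap on training runs.
# Assumptions: the ~6 FLOPs per parameter per token rule of thumb for dense
# transformers, and a purely hypothetical cap of 1e26 FLOPs.

TRAINING_FLOP_CAP = 1e26  # hypothetical cap on total training compute

def estimated_training_flops(n_params: float, n_tokens: float) -> float:
    """Rule-of-thumb estimate: ~6 FLOPs per parameter per training token."""
    return 6 * n_params * n_tokens

def run_allowed(n_params: float, n_tokens: float, cap: float = TRAINING_FLOP_CAP) -> bool:
    """Would this training run fit under the cap?"""
    return estimated_training_flops(n_params, n_tokens) <= cap

# A 70B-parameter model trained on 15T tokens: ~6.3e24 FLOPs, under the cap.
print(run_allowed(70e9, 15e12))    # True

# A 2T-parameter model trained on 100T tokens: ~1.2e27 FLOPs, over the cap.
print(run_allowed(2e12, 100e12))   # False
```

Lowering the cap over time would then just mean ratcheting TRAINING_FLOP_CAP down as algorithmic efficiency improves.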
Unfortunately, as technology advances, enforcement would probably eventually become too draconian to be sustainable. This “pause” is therefore only a stopgap intended to buy us more time to implement a more permanent solution. At minimum, that would look like much more investment in alignment research, which unfortunately risks improving capabilities as well. Having already spent a solid decade on the problem, Yudkowsky seems pessimistic that this approach can work in time and has proposed researching human intelligence augmentation instead, on the theory that enhanced humans could then solve alignment for us.
Also in the short term, there are steps that could be taken to reduce lesser harms, such as scamming. AI developers should have strict liability for harms caused by their AIs. This would discourage publishing the weights of the most powerful models; instead, they would have to be accessed through an API, so the servers could at least be shut down or updated if they start causing problems. Images and videos could be steganographically watermarked so abusers could be traced. This isn’t feasible for text (especially short text), but servers could at least save their transcripts, which could later be subpoenaed.
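To illustrate the watermarking idea, here’s a toy sketch that hides a traceable ID in the least-significant bits of an image’s pixels. Real provenance watermarks are perceptual and designed to survive compression, cropping, and re-encoding, which this is not; the function names and the request ID are made up for illustration:

```python
# Toy LSB steganographic watermark: embed a generator/request ID in an image
# so that, in principle, a misused output could be traced back to a server log.
# This naive scheme is easily destroyed by re-encoding; it only shows the idea.
import numpy as np

def embed_id(pixels: np.ndarray, payload: bytes) -> np.ndarray:
    """Write the payload's bits into the least-significant bits of pixel values."""
    flat = pixels.flatten().astype(np.uint8)
    bits = np.unpackbits(np.frombuffer(payload, dtype=np.uint8))
    if bits.size > flat.size:
        raise ValueError("image too small for payload")
    flat[:bits.size] = (flat[:bits.size] & 0xFE) | bits  # overwrite LSBs
    return flat.reshape(pixels.shape)

def extract_id(pixels: np.ndarray, n_bytes: int) -> bytes:
    """Read the payload back out of the least-significant bits."""
    flat = pixels.flatten().astype(np.uint8)
    return np.packbits(flat[: n_bytes * 8] & 1).tobytes()

# Example: tag an image with a (hypothetical) request ID, then recover it.
image = np.random.randint(0, 256, size=(64, 64, 3), dtype=np.uint8)
tagged = embed_id(image, b"req-8f3a91")
print(extract_id(tagged, len(b"req-8f3a91")))  # b'req-8f3a91'
```

A deployed version would also have to tie the embedded ID back to server-side records so a trace actually leads to an account.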
Thank you very much. Why would liability for harms caused by AIs discourage the publishing of the weights of the most powerful models?