I don’t get how you can arrive at 0.1% for future AI systems even if NNs are biased against scheming. Humans scheme, and future AI systems trained to be capable of long if-then chains may also learn to scheme, perhaps because explicitly changing those biases is good for performance. Or do you really put <0.1% on future AI systems not using NNs at all?
Also, I’m not saying “but it doesn’t matter”; suppose everyone agrees that a spectrally biased NN with a classifier (or whatever) is a promising model of a safe system. Do you then propose that we should not worry and just build the most advanced AI we can as fast as possible? Or would it be better to first reduce the remaining uncertainty about the behavior of future systems?
I’m saying <0.1% chance on “world is ended by spontaneous scheming.” I’m not saying no AI will ever do anything that might be well-described as scheming, for any reason.
The exact language you use in the post is:
We therefore conclude that we should assign very low credence to the spontaneous emergence of scheming in future AI systems— perhaps 0.1% or less.
I personally think there is a moderate gap (perhaps a factor of 3) between “world is ended by serious[1] spontaneous scheming” and “serious spontaneous scheming”. And I could imagine updating to a factor of 10 if the world seemed better prepared, etc. So it might be good to clarify this in the post. (Or to clarify your comment.)
(I think spontaneous scheming (prior to human obsolescence) is ~25% likely, and x-risk due to this scheming, conditional on being in one of those worlds, is about 30% likely, for an overall ~8% on “world is ended by serious spontaneous scheming” (prior to human obsolescence).)
serious = somewhat persistent, thoughtful, etc.
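For reference, a quick check of the multiplication in the parenthetical above (25% times 30%, rounding to the stated ~8%):

```python
# Estimates stated in the parent comment (not my own numbers):
# P(spontaneous scheming before human obsolescence) ~ 25%
# P(x-risk | a world with that scheming)            ~ 30%
p_scheming = 0.25
p_doom_given_scheming = 0.30

p_doom = p_scheming * p_doom_given_scheming
print(p_doom)  # ~0.075, i.e. roughly the ~8% headline figure
```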
EDIT: This is wrong. See descendant comments.
I spent a bunch of time wondering how you could put 99.9% on no AI ever doing anything that might be well-described as scheming for any reason. I was going to challenge you to list a handful of other claims that you had similar credence in, until I searched the comments for “0.1%” and found this one.
I’m annoyed at this, and I request that you prominently edit the OP.
The post says “we should assign very low credence to the spontaneous emergence of scheming in future AI systems— perhaps 0.1% or less.”
I.e., not “no AI will ever do anything that might be well-described as scheming, for any reason.”
It should be obvious that, if you train an AI to scheme, you can get an AI that schemes.
Damn, woops.
My comment was false (and strident; worst combo). I accept the strong downvote and I will try to now make a correction.
I said:
I spent a bunch of time wondering how you could put 99.9% on no AI ever doing anything that might be well-described as scheming for any reason.
What I meant to say was:
I spent a bunch of time wondering how you could put 99.9% on no AI ever doing anything that might be well-described as scheming for any reason, even if you stipulate that it must happen spontaneously.
And now you have also commented:
Well, I have <0.1% on spontaneous scheming, period. I suspect Nora is similar and just misspoke in that comment.
So… I challenge you to list a handful of other claims that you have similar credence in. Special Relativity? P!=NP? Major changes in our understanding of morality or intelligence or mammal psychology? China pulls ahead in AI development? Scaling runs out of steam and gives way to other approaches like mind uploading? Major betrayal against you by a beloved family member?
The OP simply says “future AI systems” without specifying anything about these systems, their paradigm, or what offworld colony they may or may not be developed on. Just...all AI systems henceforth forever. Meaning that no AI creators will ever accidentally recapitulate the scheming that is already observed in nature...? That’s such a grand, sweeping claim. If you really think it’s true, I just don’t understand your worldview. If you’ve already explained why somewhere, I hope someone will link me to it.
Agree with this hugely (though I could make a partial defense of the confidence given), but yes, I’d like this post to be hugely edited.
What do you mean “hugely edited”? What other things would you like us to change? If I were starting from scratch I would of course write the post differently, but I don’t think it would be worth my time to make major post hoc edits; I would like to focus on follow-up posts.
Specifically, I wanted the edit to be a clarification that you only have a <0.1% probability on spontaneous scheming ending the world.
Well, I have <0.1% on spontaneous scheming, period. I suspect Nora is similar and just misspoke in that comment.
If it’s spontaneous then yeah, I don’t expect it to happen ~ever really. I was mainly thinking about cases where people intentionally train models to scheme.