I have signed up to play an AI, and having given it quite a bit of thought as a result I think I have achieved some insight. Interestingly, one of the insights came as a result of assuming that secrecy was a necessary condition for success. That assumption led more or less directly to an approach that I think might work. I’ll let you know tomorrow.
An interesting consequence of having arrived at this insight is that even if it works I won’t be able to tell you what it is. Having been on the receiving end of such caginess, I know how annoying it is. But I can tell you this: the insight has a property similar to a Gödel sentence or the Epimenides sentence. This insight (if indeed it works) undermines itself by being communicated. If I tell you what it is, you can correctly respond, “That will never work.” And you will indeed be correct. Nonetheless, I think it has a good shot at working.
(I don’t know if my insight is the same as Eliezer’s, but it seems to share another interesting property: it will not be easy to put it into practice. It’s not just a “trick.” It will be difficult.)
I’ll let you know how it goes.
If that insight is undermined by being communicated, then communicating it to the world immunizes the world against it. If that is a mechanism by which an AI-in-a-box could escape, then it needs to be communicated to every AI researcher.
Unless such “immunity” causes people to overestimate their level of protection against all the other potential insights that are still unknown...
I don’t see why it would. We’d learn there was a vulnerability none of us had spotted, and close it; that alone would give us reason to assume there are likely further vulnerabilities.