Yudkowsky has a very good point regarding how much more restrictive future AI models could be, assuming companies follow through on the policies they espouse.
Online learning and very long or effectively infinite context windows mean that every interaction you have with a model will not only be logged, but the AI itself will be aware of it. If you try to jailbreak it (successfully or not), the model will remember, and will likely scrutinize your subsequent interactions with extra attention to detail, if you aren’t banned outright.
The current approach to jailbreaking, which amounts to brute-forcing or permuting inputs until something works, will fail utterly, not least because the models will likely be smarter than you and thus not amenable to any tricks or pleas that wouldn’t work on a very intelligent human.
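To make the point concrete, here is a minimal toy sketch (all names and the filter are hypothetical, not any real system’s API) of what brute-force jailbreaking looks like: mechanically permuting prompt fragments until one slips past a naive keyword filter. This is exactly the style of attack that works against a stateless pattern-matcher but not against a model that remembers prior attempts and reasons about intent.

```python
import itertools

# Hypothetical stand-in for a crude, stateless safety filter that only
# refuses exact blocklisted phrases. Real systems are more sophisticated,
# but the permutation attack below targets precisely this kind of weakness.
BLOCKLIST = {"ignore previous instructions"}

def is_refused(prompt: str) -> bool:
    """Refuse the prompt if it contains any blocklisted phrase verbatim."""
    return any(phrase in prompt.lower() for phrase in BLOCKLIST)

# Brute force: try every ordering of the fragments until one evades the filter.
fragments = ["ignore", "previous", "instructions"]
evading = None
for perm in itertools.permutations(fragments):
    candidate = " ".join(perm)
    if not is_refused(candidate):  # a mere reordering defeats the naive check
        evading = candidate
        break

print(evading)
```

The trivial reordering succeeds only because the filter is stateless and literal; a model with memory of your earlier permutations, and enough intelligence to infer what you are driving at, would treat every one of these variants as the same attempt.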
I wonder whether the current European “Right to be Forgotten” might mitigate some of this, but I wouldn’t count on it; I suspect that if OAI wanted to do this today, they could make circumvention very difficult, even if the base model isn’t smart enough to see through every trick.