Full disclosure: I’m against anti-NSFW training for AI and would be unlikely to support this proposal in any case. That said, I think attempting to restrict hex and binary inputs would be a pointless exercise; there are nearly unlimited ways to obfuscate inputs and outputs, most of which would continue to work perfectly, and any such restriction would have to be implemented using existing techniques, making it defeatable via jailbreaking just as current dev restrictions are.
My take on this is that patching the more “obvious” types of jailbreaking and obfuscation already makes a difference and is probably worth it (as long as it comes at no notable cost to the general usefulness of the system). Sure, some people will put in the effort to find other ways, but the harder it is, and the fewer little moments of early success they get when first trying it, the fewer people will get into it.
Of course one could argue that the worst outcomes come from the most highly motivated bad actors, and they surely won’t be deterred by such measures. But I think even for them there may be some path dependencies involved: they may only have ended up in that position because, over years of interacting with LLMs, they kept running into just-rewarding-enough jailbreaking scenarios that kept their interest up. That’s an empirical question, though.