None of those seem all that practical to me, except for SAE clamping from mechanistic interpretability, and I do actually expect that to be used for corporate censorship once all the kinks have been worked out of it.
If the current crop of model organisms research has any practical applications, I expect them to be used to reduce jailbreaks, as in adversarial robustness work, which is definitely highly correlated with both safety and corporate censorship.
Debate is less clear, but I also don’t really expect practical results from that line of work.