Some mitigation strategies are more thorough than others. For example, if they hit a High on bioweapons and responded by filtering a lot of the relevant biological literature out of the pretraining set until the rating dropped to Medium, I wouldn't be particularly concerned about that information somehow sneaking back in as the model later got stronger. Though if you're trying to censor specific knowledge, that could become more challenging with a much more capable model, which might have some ability to recreate the missing bits by logical inference from nearby material that wasn't censored out. This is probably more of a concern for theory than for practical knowledge.
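To make the kind of filtering I have in mind concrete, here's a minimal sketch of a first-pass corpus filter. Everything in it is hypothetical: the term list, the `keep_document` helper, and the toy corpus are all illustrative, and a real pipeline would presumably rely on trained classifiers and expert-curated criteria rather than a crude keyword match.

```python
import re

# Hypothetical illustration only: a crude keyword-based pass of the sort a lab
# might use as one stage of removing hazardous biology content from a
# pretraining corpus. The term list here is a placeholder, not a real one.
BLOCKED_TERMS = re.compile(
    r"\b(pathogen enhancement|gain[- ]of[- ]function|aerosolization)\b",
    re.IGNORECASE,
)

def keep_document(text: str) -> bool:
    """Return True if the document contains no blocked terms."""
    return BLOCKED_TERMS.search(text) is None

# Toy corpus: one benign document, one that trips the filter.
corpus = [
    "A review of crop genetics and plant breeding methods.",
    "Protocols for aerosolization of viral particles.",
]
filtered = [doc for doc in corpus if keep_document(doc)]
print(filtered)  # only the benign document survives
```

The point of the sketch is just that filtering operates on what's literally present in the training documents, which is why it can't guard against a stronger model later re-deriving the removed material from adjacent, uncensored text.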