Yep, as the edit says, I don’t think we disagree on the first point: there are versions that are oppressive, but also versions that are not oppressive and still have large positive effects.
On the second point, I believe this is because it is much harder to introduce safeguards than to remove them. Removing them is a very blunt target, whereas good safeguards have to be detailed enough to avoid false positives (which Llama-2 did not do a good job of avoiding, though they did try). This is the key asymmetry: the amount Meta (or anyone else) spends on tuning does not help here.