didn’t run red-teaming and persuasion evals on the actually-final-version
Asking for this is a bit pointless: even after the actually-final version, there will be a next update for which the non-automated evals won’t be redone. So running non-automated evals only on some earlier version is about as reasonable as running them on the actually-final one.