simon comments on ELI5 Why isn’t alignment easier as models get stronger?

simon 28 Oct 2023 15:02 UTC
4 points
1
I actually agree that it's easier to achieve any given % alignment with a stronger model, but on the other hand, the potential bad consequences of any given % misalignment also increase as the model gets stronger (potentially dramatically at certain points, such as when it becomes able to take over).
