Under this model, training the model to do things you don’t want and then “jailbreaking” it afterward would be a way to prevent entire classes of behavior.