My guess is that the initial step of getting models to lie reliably is orders of magnitude harder than just asking a human to lie to you. But once you know how to prompt models into lying, generating lots of lie data from them is orders of magnitude easier than collecting it from humans.