you can anneal whatever combination of the different losses you are using to eventually become exclusively imitative amplification, exclusively debate, or anything else in between
How necessary is annealing for this? Could you choose other optimisation procedures? Or do you refer to annealing in a more general sense?
“Annealing” here simply means decaying over time (as in learning rate annealing), in this case decaying the influence of one of the losses to zero.
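As a rough illustration of what that could look like in practice, here is a minimal sketch assuming a linear decay schedule and two scalar losses. The names `annealed_weight`, `combined_loss`, `debate_loss`, and `amplification_loss` are hypothetical, and the linear schedule is just one choice; any schedule that decays the weight to zero would fit the description.

```python
def annealed_weight(step: int, total_steps: int) -> float:
    """Weight that decays linearly from 1.0 at step 0 to 0.0 at total_steps.

    Assumption: linear decay. Cosine or exponential schedules would work too.
    """
    if total_steps <= 0:
        raise ValueError("total_steps must be positive")
    # Clamp at 0.0 so the weight stays at zero once annealing has finished.
    return max(0.0, 1.0 - step / total_steps)


def combined_loss(debate_loss: float, amplification_loss: float,
                  step: int, total_steps: int) -> float:
    """Mix two losses, annealing the influence of the first one to zero."""
    w = annealed_weight(step, total_steps)
    return w * debate_loss + (1.0 - w) * amplification_loss


# Hypothetical usage: early on the first loss dominates, by the end
# training is driven exclusively by the second loss.
for step in (0, 50, 100):
    print(step, combined_loss(2.0, 4.0, step, total_steps=100))
```

Swapping which loss gets the decaying weight, or stopping the decay at some nonzero floor, gives the other end points and the "in between" mixtures.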