True if you don’t count the training process as part of the optimizer (which is a choice that sometimes makes sense and sometimes doesn’t). If you count the training process as part of the optimizer, then you can of course just flip your loss function or RL signal most of the time.
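To make the second case concrete, here is a toy sketch, assuming the training loop is counted as part of the optimizer. The names `reward`, `train`, and `reward_sign`, the quadratic reward, and the clipping bounds are all made up for illustration and do not come from any particular setup. Flipping the sign of the training signal is the only change between the two runs.

```python
def reward(theta, target=3.0):
    """Hypothetical reward: highest when theta equals target."""
    return -(theta - target) ** 2


def train(reward_sign=1.0, steps=200, lr=0.1, target=3.0):
    """Toy training process: gradient ascent on reward_sign * reward(theta).

    reward_sign=+1.0 optimizes for the reward; reward_sign=-1.0 is the
    "flipped" signal and optimizes against it.
    """
    theta = 0.0
    for _ in range(steps):
        # d reward / d theta for the quadratic reward above
        grad = -2.0 * (theta - target)
        theta += lr * reward_sign * grad
        # Keep theta in a bounded range so the flipped run does not diverge
        theta = max(-10.0, min(10.0, theta))
    return theta


maximizer = train(reward_sign=1.0)   # ends up near the target
minimizer = train(reward_sign=-1.0)  # pushed to the bound farthest from the target
print(reward(maximizer), reward(minimizer))
```

This only covers the "most of the time" case where the signal is a scalar you can negate. It says nothing about the first case, where the trained artifact is treated as the optimizer and no such sign is exposed.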