Is there a reason you used the term “grader” instead of the AFAICT-more-traditional term “critic”? No big deal, I’m just curious.
My critique is not of actor/critic training processes, but of actor/grader motivational designs. I worried that “critic” would make people think I object to using an evaluative model to provide gradients to the actor. That kind of training setup seems non-doomed to me.
Thank you! I’ve been using the terms “inference algorithm” versus “learning algorithm” to talk about that kind of thing. What you said seems fine too, AFAIK.