We-the-devs choose when to dole out these reinforcement events, either handwriting a simple reinforcement algorithm to do it for us or having human overseers give out reinforcement.
That sounds like you are actually writing such reward functions. Is there another project that tries to experimentally verify your predictions?
That is what we are currently trying to do, mostly focusing on pretrained LMs in text-based games.