hmm, idea, how well’d this work: you have a machine that drops the reward with a certain low probability every second, but you have to put it back rather than eat it if you weren’t doing the task?