Mainly for brevity, but also because it seems to involve quite a drastic change in how the reward function/model as a whole works, so it doesn’t seem particularly likely to be implemented.