So the general problem is that large changes in Q_R(s_{t+1}, ∅) are not penalized?
It’s the difference between that and Q_R(s_{t+1}, a_{t+1}) that is penalised, not large changes in Q_R(s_{t+1}, ∅) by themselves.
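
To make that concrete, here is a rough sketch with made-up numbers. The function name `delta_penalty` and the use of an absolute difference are my own assumptions for illustration, not something stated above. It only shows that if both Q-values jump by the same large amount, a penalty based on their gap does not move.

```python
def delta_penalty(q_noop: float, q_action: float) -> float:
    """Gap between Q_R(s_{t+1}, a_{t+1}) and Q_R(s_{t+1}, no-op).

    Assumption: the penalty is the absolute difference of the two values.
    """
    return abs(q_action - q_noop)


# Hypothetical values: both Q-values rise by 100 between the two cases,
# so the gap between them is unchanged.
before = delta_penalty(q_noop=1.0, q_action=1.5)
after = delta_penalty(q_noop=101.0, q_action=101.5)

print(before, after)
```

So a large shift in the no-op value only shows up in the penalty if the value for the chosen action does not shift with it.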