“If our ideal reward functions have diminishing returns, this fact is explicitly included in the learning process.”
It seems like the exact shape of the diminishing returns might be quite hard to infer, while wrong “rates” of diminishing returns can lead to (slightly less severe versions of) the same problems as not modelling diminishing returns at all.
We probably at least need to incorporate our uncertainty about how returns diminish in some way. I am a bit unsure how to do this: if we just take an expectation over all candidate curves, won’t the slowly diminishing ones dominate?
Renormalise the candidates, so that slowly diminishing returns don’t dominate.
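As a rough sketch of what that renormalisation could look like (everything here is illustrative and hypothetical: the candidate curves log(1+x), √x and x^0.9, the reference point x_max, and the uniform weighting over candidates), rescaling each candidate to assign the same utility to a fixed reference outcome means the mixture averages the *shapes* of the curves rather than their raw magnitudes:

```python
import numpy as np

# Hypothetical candidate curves for how returns might diminish in raw
# reward x. log saturates quickly; x**0.9 diminishes only slowly.
candidates = [
    lambda x: np.log1p(x),  # fast saturation
    lambda x: np.sqrt(x),   # moderate
    lambda x: x ** 0.9,     # slow saturation
]

x_max = 100.0
x = np.linspace(0.0, x_max, 501)

# Naive expectation with uniform weights: at large x the slowly
# diminishing candidate is an order of magnitude bigger than the log
# candidate, so it dominates the mixture.
naive = np.mean([f(x) for f in candidates], axis=0)

# Renormalised expectation: rescale every candidate to utility 1 at
# x_max (an arbitrary reference choice), so the mixture averages the
# shapes of the curves instead of their raw magnitudes.
renorm = np.mean([f(x) / f(x_max) for f in candidates], axis=0)

mid = len(x) // 2  # x = 50
print(f"naive:  u(50)/u(100) = {naive[mid] / naive[-1]:.2f}")    # ~0.58, pulled toward the slow curve
print(f"renorm: u(50)/u(100) = {renorm[mid] / renorm[-1]:.2f}")  # ~0.70, close to the average of the individual shapes
```

This particular choice (matching all candidates at x_max) is just one possible normalisation; matching at some other reference outcome, or normalising total area over the range of interest, would express the same idea with somewhat different trade-offs.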