I think that the intuition for this argument comes from something like gradient ascent under an approximate utility function. The agent will spend most of its time near what it perceives to be a local(ish) maximum.
So I suspect the argument here is that Optimistic Errors have a better chance of locking into a single local maximum or strategy, which gets reinforced enough (or not punished enough) even though it is bad in total.
Pessimistic Errors are ones in which the agent strategically avoids locking into maxima, perhaps by Hedonic Adaptation as Dagon suggested. This may miss big opportunities if there are big maxima actually out there in the territory, but that may not be as bad (from a satisficer's point of view, at least).
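To make that intuition concrete, here is a minimal toy sketch of my own (not anything from the original argument): a 1-D utility landscape with a small local maximum and a larger one, where the agent ascends a noisy "perceived" gradient. All names and parameters here (`true_utility`, `perceived_gradient`, the noise level) are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def true_utility(x):
    # Toy landscape: a small local maximum near x = 0 and a larger one near x = 4.
    return np.exp(-x**2) + 2.0 * np.exp(-(x - 4.0) ** 2)

def true_gradient(x):
    return -2.0 * x * np.exp(-x**2) - 4.0 * (x - 4.0) * np.exp(-(x - 4.0) ** 2)

def perceived_gradient(x, noise=0.1):
    # The agent only ever sees the gradient of its *approximate* utility:
    # the true gradient plus estimation error.
    return true_gradient(x) + rng.normal(0.0, noise)

x = 0.5  # start in the basin of the small (bad-in-total) maximum
for _ in range(5000):
    x += 0.01 * perceived_gradient(x)

print(f"settled near x = {x:.2f}, true utility = {true_utility(x):.2f}")
# The agent hovers around x ≈ 0, the local maximum it perceives; the modest
# gradient noise is nowhere near enough to push it over to x ≈ 4.
```

Run it a few times and the agent essentially never escapes: once the strategy is locked in, every nearby perturbation gets pushed back toward the same spot.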
If this is the case, then the Optimistic/Pessimistic distinction seems more like a difference in exploration/exploitation strategies.
We do have positively valenced heuristics for exploration, such as curiosity and excitement.
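Continuing the toy model above, a count-based novelty bonus is one cheap stand-in for such a heuristic (the specific bonus form is my assumption; any optimism-in-the-face-of-uncertainty bonus behaves similarly), and it is enough to pull the same kind of agent out of the small basin:

```python
import numpy as np

rng = np.random.default_rng(1)

def true_utility(x):
    return np.exp(-x**2) + 2.0 * np.exp(-(x - 4.0) ** 2)

# Discretize the same landscape into "arms" and act greedily on estimated
# value plus a curiosity bonus that decays as a region gets explored.
arms = np.linspace(-2.0, 6.0, 17)
counts = np.zeros_like(arms)
value_est = np.zeros_like(arms)

for t in range(500):
    bonus = 1.0 / np.sqrt(counts + 1.0)   # novelty "feels good" until it wears off
    i = int(np.argmax(value_est + bonus))
    reward = true_utility(arms[i]) + rng.normal(0.0, 0.1)
    counts[i] += 1
    value_est[i] += (reward - value_est[i]) / counts[i]  # running mean of rewards

best = int(np.argmax(value_est))
print(f"preferred region: x = {arms[best]:.1f}")  # typically x ≈ 4, the big maximum
```

The point being that a positively valenced drive toward novelty can do the same work as strategically avoiding lock-in, without the agent having to be pessimistic about its current strategy.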