Yeah, to clarify, I’m also not familiar enough with RL to assess how plausible it is that we’ll see this compensatory convexity with today’s techniques. For investigating, “reward shaping” would be a relevant keyword. I hear they do some messy things over there.
But I mention it because there are abstract reasons to expect it to become a relevant idea in the development of general optimizers, which have to come up with their own reward functions. It also seems relevant in evolutionary learning, where a very small advantage over the previous state of the art equates to complete victory. So if there are diminishing returns at the top, competition amplifies the stakes, and if an adaptation to that amplification trickles back into a utility function, you could get a convex agent.
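A quick numeric sketch of that mechanism, with made-up toy numbers: suppose raw performance is concave in effort (diminishing returns), but payoff is winner-take-all against a rival whose realized performance is noisy. The marginal value of effort then *grows* as you approach the frontier, i.e. the effective payoff curve has a convex region, even though performance itself is concave. The `performance` function, noise level, and effort values here are all illustrative assumptions, not anything from a real RL setup:

```python
import math
import random

def performance(effort):
    # Illustrative assumption: concave performance, diminishing returns.
    return math.sqrt(effort)

def win_probability(effort, rival_effort=1.0, noise=0.05, trials=20000, seed=0):
    """Monte Carlo estimate of beating one rival whose realized
    performance is perturbed by Gaussian noise (winner-take-all payoff)."""
    rng = random.Random(seed)
    p = performance(effort)
    wins = sum(p > performance(rival_effort) + rng.gauss(0.0, noise)
               for _ in range(trials))
    return wins / trials

# Marginal value of extra effort, well below vs. near the frontier:
low = win_probability(0.5)
mid = win_probability(0.8)
high = win_probability(1.0)

# The second increment dwarfs the first: increasing marginal returns
# (a convex region) despite concave underlying performance.
print(mid - low, high - mid)
```

The step from 0.8 to 1.0 effort buys far more win probability than the step from 0.5 to 0.8, which is the "competition amplifies the stakes" point in miniature.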
Though the ideas in this MLST episode about optimization processes crushing out all serendipity and creativity suggest to me that that sort of strict life-or-death evolutionary process will never be very effective. There was an assertion that it often isn’t that way in nature. They recommend “Minimal Criterion Coevolution”.