“Trade” between exponential agents could look like flipping a coin (biased to reflect relative power) and having the loser give all of their resources to the winner. It could also just look like ordinary trade, where each agent specializes in their comparative advantage, to gather resources/power to prepare for “the final trade”.
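A quick numerical illustration (the utility function and stakes are made up; I'm reading "biased to reflect relative power" as win probability proportional to resource share, which makes the gamble mean-preserving for both sides):

```python
# Illustrative convex utility: u(r) = 2**r (an "exponential agent").
def u(r):
    return 2.0 ** r

# Agents A and B stake everything on a coin biased by resource share,
# so the gamble preserves each agent's expected resources.
r_a, r_b = 3.0, 1.0
pot = r_a + r_b
p_a = r_a / pot                           # A wins with probability 3/4

eu_a = p_a * u(pot) + (1 - p_a) * u(0.0)  # 0.75*16 + 0.25*1 = 12.25
eu_b = (1 - p_a) * u(pot) + p_a * u(0.0)  # 0.25*16 + 0.75*1 = 4.75

print(eu_a, u(r_a))  # 12.25 > 8.0: A prefers flipping to keeping r_a
print(eu_b, u(r_b))  # 4.75  > 2.0: so does B, by Jensen's inequality
```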
“Trade” between exponential and less convex agents could look like making a bet on the size (or rather, potential resources) of the universe, such that the exponential agent gets a bigger share of big universes in exchange for giving up their share of small universes (similar to my proposed trade between a linear agent and a concave agent).
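A toy version of that bet with made-up numbers (two equally likely universe sizes; both agents strictly gain in expectation):

```python
# Toy model: two equally likely universes, small (S resources) or big (B).
# An exponential agent (u = 2**r) and a linear agent (u = r) each start
# with a 50% share of whichever universe obtains. Numbers are illustrative.
S, B = 2.0, 10.0

def eu(share_small, share_big, u):
    return 0.5 * u(share_small * S) + 0.5 * u(share_big * B)

def exp_u(r): return 2.0 ** r
def lin_u(r): return r

# Before the bet:
print(eu(0.5, 0.5, exp_u))   # 0.5*2^1 + 0.5*2^5 = 17.0
print(eu(0.5, 0.5, lin_u))   # 0.5*1   + 0.5*5   = 3.0

# The bet: the exponential agent gives up its entire small-universe share
# in exchange for 55% (rather than 50%) of the big universe.
print(eu(0.0, 0.55, exp_u))  # 0.5*2^0 + 0.5*2^5.5 ≈ 23.1 > 17.0
print(eu(1.0, 0.45, lin_u))  # 0.5*2   + 0.5*4.5   = 3.25 > 3.0
```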
Maybe the real problem with convex agents is that their expected utilities do not converge, i.e., the probabilities of big universes can’t possibly decrease enough with size that their expected utilities sum to finite numbers. (This is also a problem with linear agents, but you can perhaps patch the concept by saying they’re linear in UD-weighted resources, similar to UDASSA. Is it also possible/sensible to patch convex agents in this way?)
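To sketch why the divergence is unavoidable under a universal-prior-style weighting (constants schematic): the universal distribution gives a universe containing $n$ units of resources probability at least $2^{-K(n)}$, and $K(n) \le \log_2 n + 2\log_2\log_2 n + c$ for all $n$, so an exponential agent with $u(n) = 2^n$ has

$$\mathbb{E}[u] \;\ge\; \sum_n 2^{-K(n)}\,2^n \;\ge\; 2^{-c}\sum_n \frac{2^n}{n\,(\log_2 n)^2} \;=\; \infty.$$

The individual terms blow up, so no tail behavior rescues convergence; a UD-style patch seemingly has to apply the utility to already-discounted resources rather than merely reweighting universes.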
However, convexity more closely resembles the reward-intensity deltas needed to push a reinforcement learning agent to take notice of small advances beyond the low-hanging fruit of its earliest findings, counteracting the naturally concave, diminishing returns that optimization problems tend to have.
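Concretely, the kind of compensatory shaping this gestures at might look like the following toy sketch (the log/exp forms are purely illustrative, not a claim about any particular RL setup):

```python
import math

# Raw performance with diminishing returns: concave in effort.
def raw_return(effort):
    return math.log(1.0 + effort)

# Convex shaping: exponentiating the raw return undoes the concavity,
# so late, hard-won improvements yield deltas comparable to early ones.
def shaped_reward(effort):
    return math.exp(raw_return(effort))  # = 1 + effort

for e in [1, 2, 4, 8, 16]:
    d_raw = raw_return(e) - raw_return(e - 1)
    d_shaped = shaped_reward(e) - shaped_reward(e - 1)
    print(f"effort {e:2d}: raw delta {d_raw:.3f}, shaped delta {d_shaped:.3f}")
# Raw deltas shrink (0.693, 0.405, ..., ~0.061) while shaped deltas stay
# at 1.0: the shaping is convex as a function of raw performance.
```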
I’m not familiar enough with RL to know how plausible this is. Can you expand on this, or does anyone else want to weigh in?
Yeah, to clarify, I’m also not familiar enough with RL to assess how plausible it is that we’ll see this compensatory convexity in today’s techniques. “Reward shaping” would be a relevant keyword for investigating; I hear they do some messy things over there.
But I mention it because there are abstract reasons to expect it to become a relevant idea in the development of general optimizers, which have to come up with their own reward functions. It also seems relevant in evolutionary learning, where a very small advantage over the previous state of the art equates to a complete victory: if there are diminishing returns at the top, competition amplifies the stakes, and if an adaptation to that amplification trickles back into a utility function, you could get a convex agent.
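A toy illustration of that amplification (performance numbers and noise model made up for illustration):

```python
import random

# Winner-take-all competition: whichever strategy scores higher in a
# head-to-head comparison captures the whole niche.
def share_of_wins(perf_a, perf_b, noise, trials=100_000):
    wins = sum(
        perf_a + random.gauss(0, noise) > perf_b + random.gauss(0, noise)
        for _ in range(trials)
    )
    return wins / trials

# A 1% performance edge:
print(share_of_wins(1.01, 1.00, noise=0.10))  # ~0.53: noisy evaluation
print(share_of_wins(1.01, 1.00, noise=0.00))  # 1.0: deterministic contest
# As evaluation noise vanishes, payoff as a function of performance
# approaches a step function: the most convex "utility" there is.
```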
Though the ideas in this MLST episode, about optimization processes crushing out all serendipity and creativity, suggest to me that that sort of strict life-or-death evolutionary process will never be very effective. There was an assertion that it often isn’t that way in nature. They recommend “Minimal Criterion Coevolution”.