Or if you buy a shard-theory-esque picture of RL locking in heuristics, what heuristics can get locked in depends on what’s “natural” to learn first, even when training from scratch.
Both of these hypotheses should probably come with caveats, though (about expected reliability, training time, model-free-ness, etc.).