The reward landscape of deep reinforcement learning models is probably pretty insane. Perhaps not so insane as to resist a singular-learning-theory-style analysis, though: under a stochastic policy there is always some probability of taking any given sequence of actions, those probabilities change smoothly as you change the weights, so the chance of executing any particular plan changes smoothly, and so the expected reward changes smoothly. So maybe there are analogies to be made to the loss landscape.
Relevant: “Cliff Diving: Exploring Reward Surfaces in Reinforcement Learning Environments”, Sullivan et al 2022.
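
A minimal sketch of that smoothness argument, using an entirely hypothetical toy setup (the horizon-3, 2-action environment, the arbitrary per-plan returns, and the timestep-linear softmax policy are all made up for illustration): even if the return assigned to each individual plan is arbitrary and spiky, the expected return is a probability-weighted sum over all plans, and each plan's probability is a smooth function of the weights, so the whole surface is smooth in the weights.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy setup: 2 actions, horizon-3 episodes, so 8 possible action sequences ("plans").
# Each plan gets an arbitrary, spiky return -- the per-plan rewards can be as
# "insane" as we like. (Hypothetical numbers, for illustration only.)
HORIZON = 3
N_ACTIONS = 2
plan_returns = rng.normal(scale=10.0, size=(N_ACTIONS,) * HORIZON)

def action_probs(theta, t):
    """Softmax policy; logits are a (hypothetical) linear function of the timestep."""
    logits = theta[:, 0] + theta[:, 1] * t
    z = np.exp(logits - logits.max())
    return z / z.sum()

def expected_return(theta):
    """Sum over all plans of P(plan | theta) * return(plan).

    Every plan has nonzero probability, and each probability is smooth in theta,
    so the expectation is smooth in theta even though plan_returns is arbitrary.
    """
    total = 0.0
    for plan in np.ndindex(*plan_returns.shape):
        p = 1.0
        for t, a in enumerate(plan):
            p *= action_probs(theta, t)[a]
        total += p * plan_returns[plan]
    return total

theta = rng.normal(size=(N_ACTIONS, 2))
eps = 1e-4
direction = rng.normal(size=theta.shape)

# Finite-difference check: shrinking the weight perturbation shrinks the change
# in expected return proportionally, i.e. the reward surface is locally smooth.
for scale in (1.0, 0.5, 0.25):
    delta = expected_return(theta + scale * eps * direction) - expected_return(theta)
    print(f"perturbation scale {scale:>4}: change in expected return = {delta:+.8f}")
```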