The reward landscape of deep reinforcement learning models is probably pretty insane. Perhaps not so insane as to resist a singular-learning-theory-style analysis, though: under a stochastic policy there is always some probability of taking any given sequence of actions, those probabilities change smoothly as you change the weights, so the chance of executing any particular plan changes smoothly, and so the expected reward changes smoothly. So maybe there are analogies to be made to the loss landscape.
Relevant: “Cliff Diving: Exploring Reward Surfaces in Reinforcement Learning Environments”, Sullivan et al 2022.
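
A minimal sketch of that smoothness argument, using an entirely hypothetical toy setup (the horizon-3, 2-action environment, the arbitrary per-plan returns, and the timestep-linear softmax policy are all made up for illustration): even if the return assigned to each individual plan is arbitrary and spiky, the expected return is a probability-weighted sum over all plans, and each plan's probability is a smooth function of the weights, so the whole surface is smooth in the weights.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy setup: 2 actions, horizon-3 episodes, so 8 possible action sequences ("plans").
# Each plan gets an arbitrary, spiky return -- the per-plan rewards can be as
# "insane" as we like. (Hypothetical numbers, for illustration only.)
HORIZON = 3
N_ACTIONS = 2
plan_returns = rng.normal(scale=10.0, size=(N_ACTIONS,) * HORIZON)

def action_probs(theta, t):
    """Softmax policy; logits are a (hypothetical) linear function of the timestep."""
    logits = theta[:, 0] + theta[:, 1] * t
    z = np.exp(logits - logits.max())
    return z / z.sum()

def expected_return(theta):
    """Sum over all plans of P(plan | theta) * return(plan).

    Every plan has nonzero probability, and each probability is smooth in theta,
    so the expectation is smooth in theta even though plan_returns is arbitrary.
    """
    total = 0.0
    for plan in np.ndindex(*plan_returns.shape):
        p = 1.0
        for t, a in enumerate(plan):
            p *= action_probs(theta, t)[a]
        total += p * plan_returns[plan]
    return total

theta = rng.normal(size=(N_ACTIONS, 2))
eps = 1e-4
direction = rng.normal(size=theta.shape)

# Finite-difference check: shrinking the weight perturbation shrinks the change
# in expected return proportionally, i.e. the reward surface is locally smooth.
for scale in (1.0, 0.5, 0.25):
    delta = expected_return(theta + scale * eps * direction) - expected_return(theta)
    print(f"perturbation scale {scale:>4}: change in expected return = {delta:+.8f}")
```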