A similar point is (briefly) made in K. E. Drexler (2019). Reframing Superintelligence: Comprehensive AI Services as General Intelligence, §18 “Reinforcement learning systems are not equivalent to reward-seeking agents”:

Reward-seeking reinforcement-learning agents can in some instances serve as models of utility-maximizing, self-modifying agents, but in current practice, RL systems are typically distinct from the agents they produce … In multi-task RL systems, for example, RL “rewards” serve not as sources of value to agents, but as signals that guide training[.]
And an additional point which calls into question the view of RL-produced agents as the product of one big training run (whose reward specification we’d better get right on the first try), as opposed to the product of an R&D feedback loop with reward as one non-static component:
RL systems per se are not reward-seekers (instead, they provide rewards), but are instead running instances of algorithms that can be seen as evolving in competition with others, with implementations subject to variation and selection by developers. Thus, in current RL practice, developers, RL systems, and agents have distinct purposes and roles.
…
RL algorithms have improved over time, not in response to RL rewards, but through research and development. If we adopt an agent-like perspective, RL algorithms can be viewed as competing in an evolutionary process where success or failure (being retained, modified, discarded, or published) depends on developers’ approval (not “reward”), which will consider not only current performance, but also assessed novelty and promise.
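To make the system/agent distinction concrete, here is a minimal, purely illustrative sketch (a toy corridor task with tabular Q-learning; every name and number in it is invented for the example): the reward function is an input to the training harness, what the harness returns is a policy that never sees a reward at deployment, and the reward specification is just one component that developers vary across runs and judge by results.

```python
import numpy as np

# Toy corridor: states 0..4, the episode ends on reaching state 4.
N_STATES, GOAL = 5, 4
ACTIONS = (-1, +1)  # step left / step right

def env_step(state, action):
    nxt = min(max(state + action, 0), GOAL)
    return nxt, nxt == GOAL

def train(reward_fn, episodes=200, alpha=0.5, gamma=0.9, eps=0.1, seed=0):
    """The RL *system*: consumes rewards as a training-time signal and
    returns a policy. reward_fn belongs to this harness, not to the agent."""
    rng = np.random.default_rng(seed)
    q = np.zeros((N_STATES, len(ACTIONS)))
    for _ in range(episodes):
        s = 0
        for _ in range(100):  # cap episode length
            greedy = int(rng.choice(np.flatnonzero(q[s] == q[s].max())))
            a = int(rng.integers(len(ACTIONS))) if rng.random() < eps else greedy
            s2, done = env_step(s, ACTIONS[a])
            r = reward_fn(s, s2, done)  # the only place "reward" appears
            target = r + (0.0 if done else gamma * q[s2].max())
            q[s, a] += alpha * (target - q[s, a])
            s = s2
            if done:
                break
    return q.argmax(axis=1)  # the produced "agent": a lookup-table policy

def deploy(policy, max_steps=50):
    """The deployed agent acts from its learned policy; no reward signal
    exists here. Reward was an input to training, not to the agent."""
    s, done, steps = 0, False, 0
    while not done and steps < max_steps:
        s, done = env_step(s, ACTIONS[policy[s]])
        steps += 1
    return steps

# Developer-side R&D loop: the reward specification is one non-static
# component, varied across runs and kept or discarded by developer judgment
# (crudely proxied here by how quickly the resulting policy reaches the goal).
candidate_rewards = {
    "sparse": lambda s, s2, done: 1.0 if done else 0.0,
    "shaped": lambda s, s2, done: 0.1 * (s2 - s) + (1.0 if done else 0.0),
}
for name, reward_fn in candidate_rewards.items():
    policy = train(reward_fn)
    print(name, "-> steps to goal:", deploy(policy))
```

The outer loop is of course a caricature of real R&D, but it locates “reward” where current practice puts it: inside the training system, not inside the deployed agent.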
Thanks so much for these references. Additional quotes:
Current AI safety discussions sometimes treat RL systems as agents that seek to maximize reward, and regard RL “reward” as analogous to a utility function. Current RL practice, however, diverges sharply from this model: RL systems comprise often-complex training mechanisms that are fundamentally distinct from the agents they produce, and RL rewards are not equivalent to utility functions.
...
RL rewards are sources of information and direction for RL systems, but are not sources of value for agents. Researchers often employ “reward shaping” to direct RL agents toward a goal, but the rewards used to shape the agent’s behavior are conceptually distinct from the value of achieving the goal.
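To put a concrete face on the “reward shaping” mentioned in that quote: one standard form is potential-based shaping (Ng, Harada & Russell, 1999), which adds γΦ(s′) - Φ(s) to the task reward to speed up learning while provably leaving the optimal policy unchanged. The sketch below is purely illustrative (the potential function is an arbitrary stand-in): the shaping signal directs training, but it is not itself the value of achieving the goal.

```python
GAMMA = 0.9

def potential(state):
    """A heuristic guess at progress toward the goal (hypothetical; choosing
    a potential is a design decision, not a fact about what the goal is worth)."""
    return float(state)

def shaped_reward(base_reward, state, next_state, gamma=GAMMA):
    """Potential-based shaping (Ng, Harada & Russell, 1999): adding
    gamma * potential(next_state) - potential(state) steers learning toward
    the goal yet leaves the optimal policy unchanged. It directs training;
    it is not extra value the agent receives for achieving the goal."""
    return base_reward + gamma * potential(next_state) - potential(state)

# One step toward the goal, before any task reward has been earned:
print(shaped_reward(0.0, state=2, next_state=3))  # 0.9 * 3 - 2 = 0.7
```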
Probably I should get around to reading CAIS, given that it made these points well before I did.
I found it’s a pretty quick read, because the hierarchical/summary/bullet point layout allows one to skip a lot of the bits that are obvious or don’t require further elaboration (which is how he endorsed reading it in this lecture).