A similar point is (briefly) made in K. E. Drexler (2019). Reframing Superintelligence: Comprehensive AI Services as General Intelligence, §18 “Reinforcement learning systems are not equivalent to reward-seeking agents”:

Reward-seeking reinforcement-learning agents can in some instances serve as models of utility-maximizing, self-modifying agents, but in current practice, RL systems are typically distinct from the agents they produce … In multi-task RL systems, for example, RL “rewards” serve not as sources of value to agents, but as signals that guide training[.]
And an additional point which calls into question the view of RL-produced agents as the product of one big training run (whose reward specification we’d better get right on the first try), as opposed to the product of an R&D feedback loop with reward as one non-static component:
RL systems per se are not reward-seekers (instead, they provide rewards), but are instead running instances of algorithms that can be seen as evolving in competition with others, with implementations subject to variation and selection by developers. Thus, in current RL practice, developers, RL systems, and agents have distinct purposes and roles.
…
RL algorithms have improved over time, not in response to RL rewards, but through research and development. If we adopt an agent-like perspective, RL algorithms can be viewed as competing in an evolutionary process where success or failure (being retained, modified, discarded, or published) depends on developers’ approval (not “reward”), which will consider not only current performance, but also assessed novelty and promise.
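To make the system/agent distinction concrete, here is a minimal, purely illustrative sketch (a toy corridor task with tabular Q-learning; every name and number in it is invented for the example): the reward function is an input to the training harness, what the harness returns is a policy that never sees a reward at deployment, and the reward specification is just one component that developers vary across runs and judge by results.

```python
import numpy as np

# Toy corridor: states 0..4, the episode ends on reaching state 4.
N_STATES, GOAL = 5, 4
ACTIONS = (-1, +1)  # step left / step right

def env_step(state, action):
    nxt = min(max(state + action, 0), GOAL)
    return nxt, nxt == GOAL

def train(reward_fn, episodes=200, alpha=0.5, gamma=0.9, eps=0.1, seed=0):
    """The RL *system*: consumes rewards as a training-time signal and
    returns a policy. reward_fn belongs to this harness, not to the agent."""
    rng = np.random.default_rng(seed)
    q = np.zeros((N_STATES, len(ACTIONS)))
    for _ in range(episodes):
        s = 0
        for _ in range(100):  # cap episode length
            greedy = int(rng.choice(np.flatnonzero(q[s] == q[s].max())))
            a = int(rng.integers(len(ACTIONS))) if rng.random() < eps else greedy
            s2, done = env_step(s, ACTIONS[a])
            r = reward_fn(s, s2, done)  # the only place "reward" appears
            target = r + (0.0 if done else gamma * q[s2].max())
            q[s, a] += alpha * (target - q[s, a])
            s = s2
            if done:
                break
    return q.argmax(axis=1)  # the produced "agent": a lookup-table policy

def deploy(policy, max_steps=50):
    """The deployed agent acts from its learned policy; no reward signal
    exists here. Reward was an input to training, not to the agent."""
    s, done, steps = 0, False, 0
    while not done and steps < max_steps:
        s, done = env_step(s, ACTIONS[policy[s]])
        steps += 1
    return steps

# Developer-side R&D loop: the reward specification is one non-static
# component, varied across runs and kept or discarded by developer judgment
# (crudely proxied here by how quickly the resulting policy reaches the goal).
candidate_rewards = {
    "sparse": lambda s, s2, done: 1.0 if done else 0.0,
    "shaped": lambda s, s2, done: 0.1 * (s2 - s) + (1.0 if done else 0.0),
}
for name, reward_fn in candidate_rewards.items():
    policy = train(reward_fn)
    print(name, "-> steps to goal:", deploy(policy))
```

The outer loop is of course a caricature of real R&D, but it locates “reward” where current practice puts it: inside the training system, not inside the deployed agent.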
Thanks so much for these references. Additional quotes:
Current AI safety discussions sometimes treat RL systems as agents that seek to maximize reward, and regard RL “reward” as analogous to a utility function. Current RL practice, however, diverges sharply from this model: RL systems comprise often-complex training mechanisms that are fundamentally distinct from the agents they produce, and RL rewards are not equivalent to utility functions.
...
RL rewards are sources of information and direction for RL systems, but are not sources of value for agents. Researchers often employ “reward shaping” to direct RL agents toward a goal, but the rewards used to shape the agent’s behavior are conceptually distinct from the value of achieving the goal.
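To put a concrete face on the “reward shaping” mentioned in that quote: one standard form is potential-based shaping (Ng, Harada & Russell, 1999), which adds γΦ(s′) - Φ(s) to the task reward to speed up learning while provably leaving the optimal policy unchanged. The sketch below is purely illustrative (the potential function is an arbitrary stand-in): the shaping signal directs training, but it is not itself the value of achieving the goal.

```python
GAMMA = 0.9

def potential(state):
    """A heuristic guess at progress toward the goal (hypothetical; choosing
    a potential is a design decision, not a fact about what the goal is worth)."""
    return float(state)

def shaped_reward(base_reward, state, next_state, gamma=GAMMA):
    """Potential-based shaping (Ng, Harada & Russell, 1999): adding
    gamma * potential(next_state) - potential(state) steers learning toward
    the goal yet leaves the optimal policy unchanged. It directs training;
    it is not extra value the agent receives for achieving the goal."""
    return base_reward + gamma * potential(next_state) - potential(state)

# One step toward the goal, before any task reward has been earned:
print(shaped_reward(0.0, state=2, next_state=3))  # 0.9 * 3 - 2 = 0.7
```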
Probably I should get around to reading CAIS, given that it made these points well before I did.
I found it’s a pretty quick read, because the hierarchical/summary/bullet point layout allows one to skip a lot of the bits that are obvious or don’t require further elaboration (which is how he endorsed reading it in this lecture).