Thanks so much for these references. Additional quotes:
Current AI safety discussions sometimes treat RL systems as agents that seek to maximize reward, and regard RL “reward” as analogous to a utility function. Current RL practice, however, diverges sharply from this model: RL systems comprise often-complex training mechanisms that are fundamentally distinct from the agents they produce, and RL rewards are not equivalent to utility functions.
...
RL rewards are sources of information and direction for RL systems, but are not sources of value for agents. Researchers often employ “reward shaping” to direct RL agents toward a goal, but the rewards used to shape the agent’s behavior are conceptually distinct from the value of achieving the goal.
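To make the quoted distinction concrete, here is a minimal sketch (my own toy example, not from the text): tabular Q-learning on a tiny one-dimensional gridworld, with a hypothetical shaping bonus folded into the reward. The shaped reward exists only inside the training mechanism; what training produces is a plain state-to-action table that never references reward at all.

```python
import numpy as np

# Toy gridworld (assumed for illustration): start at state 0, goal at N - 1.
N = 10
ACTIONS = [-1, +1]  # step left or step right

def shaped_reward(state, next_state):
    """Reward used only during training (hypothetical shaping scheme):
    +1 for reaching the goal, plus a small bonus for moving toward it."""
    bonus = 0.1 * (next_state - state)  # shaping term that directs learning
    return (1.0 if next_state == N - 1 else 0.0) + bonus

def train_q_learning(episodes=500, alpha=0.1, gamma=0.9, eps=0.1):
    """Training mechanism: tabular Q-learning driven by the shaped reward."""
    rng = np.random.default_rng(0)
    q = np.zeros((N, len(ACTIONS)))
    for _ in range(episodes):
        s = 0
        while s != N - 1:
            # epsilon-greedy action selection during training
            a = int(rng.integers(len(ACTIONS))) if rng.random() < eps else int(q[s].argmax())
            s_next = int(np.clip(s + ACTIONS[a], 0, N - 1))
            r = shaped_reward(s, s_next)
            q[s, a] += alpha * (r + gamma * q[s_next].max() - q[s, a])
            s = s_next
    return q

# The product of training is just a lookup table from states to actions;
# the reward function is not part of it, and the deployed "agent" never
# evaluates rewards or maximizes anything at run time.
q_table = train_q_learning()
policy = {s: ACTIONS[int(q_table[s].argmax())] for s in range(N)}
print(policy)  # learned behavior: step right toward the goal from every state
```

Changing the shaping bonus changes which table comes out, but the table itself encodes behavior, not the “value” of reaching the goal.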
Probably I should get around to reading CAIS, given that it made these points well before I did.
I found it’s a pretty quick read, because the hierarchical summary/bullet-point layout allows one to skip a lot of the bits that are obvious or don’t require further elaboration (which is how he endorsed reading it in this lecture).