One particularly interesting recent work in this domain was Leike et al.’s “Learning human objectives by evaluating hypothetical behaviours,” which used human feedback on hypothetical trajectories to learn how to avoid environmental traps. In the context of the capability exploration/objective exploration dichotomy, I think a lot of this work can be viewed as putting a damper on instrumental capability exploration.
Isn’t this work also linked to objective exploration? One of the four “hypothetical behaviours” used is the selection of trajectories which maximize reward uncertainty, and those trajectories are then evaluated by humans.