Choosing actions which exploit known biases and blind spots in humans (as the Cicero Diplomacy agent may be doing [Bakhtin et al., 2022]) or in learned reward models.
I’ve spent several hours reading dialogue involving Cicero, and it’s not at all evident to me that it’s “exploiting known biases and blind spots in humans”. It is, however, good at proposing and negotiating plans, as well as accumulating power within the context of the game.