we have decided to slightly update the terminology: in the latest version of our paper (accepted to AAAI, just released on arXiv) we prefer the term instrumental control incentive (ICI), to emphasize the distinction to “control as a side effect”.
On recent terminology innovation:
For exactly the same reason, in my own recent paper, Counterfactual Planning, I introduced the terms direct incentive and indirect incentive. I frame the removal of a path to value in a planning world diagram as an action that will eliminate a direct incentive, but that may leave other indirect incentives (via other paths to value) intact. In section 6 of the paper and in this post of the sequence, I develop and apply this terminology in the case of an agent emergency stop button.
In high-level descriptions of what the technique of creating indifference via path removal (or balancing terms) does, I have settled on saying that it ‘suppresses the incentive’ rather than that it ‘removes the incentive’.
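As a rough illustration of the direct versus indirect distinction, here is a toy sketch. It is not taken from the paper: the node names (stop_button, environment, utility) and the helper functions are hypothetical, and the graph is a plain directed graph rather than a full planning world diagram. It only shows the structural point that removing one path to value can leave other paths to value in place.

```python
from typing import Dict, List

Graph = Dict[str, List[str]]


def paths_to_value(graph: Graph, start: str, goal: str) -> List[List[str]]:
    """Enumerate all simple directed paths from start to goal (depth-first)."""
    found: List[List[str]] = []

    def walk(node: str, path: List[str]) -> None:
        if node == goal:
            found.append(path)
            return
        for nxt in graph.get(node, []):
            if nxt not in path:  # skip nodes already on the path (no cycles)
                walk(nxt, path + [nxt])

    walk(start, [start])
    return found


def remove_edge(graph: Graph, src: str, dst: str) -> Graph:
    """Return a copy of the graph without the edge src -> dst."""
    return {
        node: [c for c in children if not (node == src and c == dst)]
        for node, children in graph.items()
    }


# Hypothetical toy diagram: the stop button state influences utility
# directly, and also indirectly through the rest of the environment.
diagram: Graph = {
    "action": ["stop_button", "environment"],
    "stop_button": ["utility", "environment"],
    "environment": ["utility"],
}

before = paths_to_value(diagram, "stop_button", "utility")

# Path removal: cut the direct arrow from the stop button to utility.
edited = remove_edge(diagram, "stop_button", "utility")
after = paths_to_value(edited, "stop_button", "utility")

print("paths before removal:", before)
print("paths after removal: ", after)
```

In this toy graph, the direct path from stop_button to utility is gone after the edit, but the path via environment survives. That surviving path is the kind of thing I mean by an indirect incentive, and it is why ‘suppresses’ reads as more accurate to me than ‘removes’.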
I must admit that I have not read many control theory papers, so any insights from Rebecca about standard terminology from control theory would be welcome.
Do they have some standard phrasing where they can say things like ‘no value to control’ while subtly reminding the reader that ‘this does not imply there will be no side effects’?