This doesn’t feel like a good summary of what Rohin says in his sequence.
I was not trying to summarize the entire sequence, only summarizing my impressions of some things he said in the first post of the sequence. Those impressions are that Rohin was developing his intuitive notion of goal-directedness in a very different direction from the one you have been taking, given the examples he provides.
Which would be fine, but it does raise questions about how much your approach differs. My gut feeling is that the difference in directions might be much larger than can be expressed by the mere adjective ‘behavioral’.
On a more technical note, if your goal is to search for metrics related to “less probability that the AI steals all my money to buy hardware and goons to ensure that it can never be shut down”, then the metric that has been most productive in my opinion is ‘indifference’, in the meaning where it is synonymous with ‘not having a control incentive’. Other very relevant metrics are ‘myopia’ or ‘short planning horizons’ (see for example here) and ‘power’ (see my discussion in the post Creating AGI Safety Interlocks).
(My paper counterfactual planning has a definition of ‘indifference’ which I designed to be more accessible than the ‘not having a control incentive’ definition, i.e. more accessible for people not familiar with Pearl’s math.)
None of the above metrics look very much like ‘non-goal-directedness’ to me, with the possible exception of myopia.