Not all such goals have to be instrumental to terminal goals, and in humans the line between instrumental and non-instrumental is not clear. At one extreme, the instrumental goal is explicitly created by thinking about what would increase money/status; at the other, the “instrumental” goal is a shard that was reinforced by a money/status drive and would not change even if that drive changed.
Also, even if the goal of selling apple pies is entirely instrumental, it’s still interesting that the goal can be dissolved once it’s no longer compatible with the terminal goal of, say, gaining money. This means that not all goals are dangerously self-preserving.
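As a toy illustration of the two kinds of goal described above (not from the thread; all names and payoffs are invented), here is a sketch contrasting an explicit instrumental goal, which is re-derived from the money drive and dissolves when it stops paying, with a shard-style goal, which was merely reinforced by that drive in the past and persists when circumstances change:

```python
# Toy contrast between an explicit instrumental goal and a shard-style goal.
# All option names and numbers are invented for illustration.

def expected_money(option, world):
    """Hypothetical world model: how much money an option is expected to yield."""
    return world.get(option, 0.0)

def explicit_subgoals(world):
    # Recomputed from the terminal drive each step: keep an option only if it
    # currently beats doing nothing at making money.
    candidates = ["sell_apple_pies", "sell_umbrellas"]
    return [g for g in candidates
            if expected_money(g, world) > expected_money("do_nothing", world)]

# A shard: a cached behavioural weight, updated only by slow reinforcement,
# not by re-deriving it from the money drive.
shard_weights = {"sell_apple_pies": 0.9}

world = {"sell_apple_pies": 10.0, "sell_umbrellas": 1.0, "do_nothing": 0.0}
print(explicit_subgoals(world))   # ['sell_apple_pies', 'sell_umbrellas']

# The pie market collapses: the explicit goal dissolves immediately, while
# the shard keeps pushing for pies until it is (slowly) un-reinforced.
world = {"sell_apple_pies": -5.0, "sell_umbrellas": 1.0, "do_nothing": 0.0}
print(explicit_subgoals(world))   # ['sell_umbrellas']
print(shard_weights)              # {'sell_apple_pies': 0.9} -- unchanged
```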
Yes, exactly. Like, we humans mostly have something that kinda feels intrinsic but that also pays rent and updates with experience, like a Go player’s sense of “elegant” go moves. My current (not confident) guess is that these thingies (that humans mostly have) might be a more basic and likely-to-pop-up-in-AI mathematical structure than are fixed utility functions + updatey beliefs, a la Bayes and VNM. I wish I knew a simple math for them.
The simple math is active inference, and the type is almost entirely the same as ‘beliefs’.
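For concreteness, here is a minimal discrete-time sketch of that claim as it appears in standard active inference: the “goal” is literally a prior over observations (the preference distribution C), and the agent acts by minimizing expected free energy with respect to it. The state/observation sizes, matrices, and action names below are invented for illustration, not anyone’s actual model.

```python
import numpy as np

# Likelihood: p(observation | hidden state). Near-identity = low ambiguity.
A = np.array([[0.90, 0.05, 0.05],
              [0.05, 0.90, 0.05],
              [0.05, 0.05, 0.90]])

# Transition models, one per action: p(next state | current state, action).
B = {
    "go_left":  np.array([[0.90, 0.90, 0.90],
                          [0.05, 0.05, 0.05],
                          [0.05, 0.05, 0.05]]),
    "go_right": np.array([[0.05, 0.05, 0.05],
                          [0.05, 0.05, 0.05],
                          [0.90, 0.90, 0.90]]),
}

# The "goal" is not a utility function but a prior over observations the
# agent expects (prefers) to see: here it "wants" observation 2.
log_C = np.log(np.array([0.05, 0.05, 0.90]))

# Current belief over hidden states.
q_s = np.array([1/3, 1/3, 1/3])

def expected_free_energy(action):
    """One-step expected free energy: risk (KL to preferences) + ambiguity."""
    q_s_next = B[action] @ q_s          # predicted state distribution
    q_o = A @ q_s_next                  # predicted observation distribution
    risk = np.sum(q_o * (np.log(q_o + 1e-16) - log_C))
    ambiguity = -np.sum(q_s_next * np.sum(A * np.log(A + 1e-16), axis=0))
    return risk + ambiguity

# Action selection = choosing the policy whose predicted observations best
# match the preference prior, i.e. the "goal" lives in belief-space.
G = {a: expected_free_energy(a) for a in B}
print(G)
print(min(G, key=G.get))   # -> "go_right"
```

On this picture, changing the goal is just changing which observations the agent expects to see, which is why its type is almost the same as a belief’s.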
I feel like… no, it is not very interesting; it seems pretty trivial? We (agents) have goals, we have relationships between them, like “priorities”, and we sometimes abandon low-priority goals in favor of higher-priority ones. We can also have meta-goals like “what should my system of goals look like”, “how to abandon and adopt intermediate goals in a reasonable way”, and “how to do reflection on goals”, and future superintelligent systems will probably have something like that. All of this seems to me to come packaged with the concept of a “goal”.