It seems plausible that the common formalism of agents with utility functions is more adequate for describing the individual “subsystems” than whole human minds. Decisions at the whole-mind level are more like the results of interactions between the sub-agents, and the results of multi-agent interaction are not, in general, an object naturally represented by a utility function. For example, consider the sequence of game outcomes in a repeated PD game. If you take the sequence of game outcomes (e.g. 1: defect-defect, 2: cooperate-defect, …) as a sequence of actions, those actions do not represent well-behaved preferences, and in general do not maximize any utility function.
I just want to highlight this as what seems to me a particularly important and correct paragraph. I think it manages to capture an important part of the reason why I think that modeling human values as utility functions is the wrong approach, which I hadn’t been able to state as clearly and concisely before.
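To make the quoted repeated-PD example concrete, here is a minimal sketch in Python, not taken from the post or the comment: two toy sub-agent strategies (a tit-for-tat player and a simple alternator, both chosen purely for illustration, along with a standard assumed payoff matrix) are paired in a repeated Prisoner's Dilemma and the joint outcome sequence is printed. The only point is that the whole system's sequence of outcomes cycles, so it is not naturally read as maximizing any single fixed utility function over outcomes.

```python
# Illustrative sketch: two toy "sub-agents" play a repeated Prisoner's Dilemma
# and we print the resulting outcome sequence. The strategies and payoff
# matrix are assumptions made for this example.

C, D = "cooperate", "defect"

# A standard illustrative PD payoff matrix: (row player, column player).
PAYOFFS = {
    (C, C): (3, 3),
    (C, D): (0, 5),
    (D, C): (5, 0),
    (D, D): (1, 1),
}


def tit_for_tat(own_history, opponent_history):
    """Cooperate on the first round, then copy the opponent's last move."""
    return C if not opponent_history else opponent_history[-1]


def alternator(own_history, opponent_history):
    """Defect on even-numbered rounds, cooperate on odd ones (ignores the opponent)."""
    return D if len(own_history) % 2 == 0 else C


def play(rounds=6):
    moves_a, moves_b = [], []
    for _ in range(rounds):
        a = tit_for_tat(moves_a, moves_b)
        b = alternator(moves_b, moves_a)
        moves_a.append(a)
        moves_b.append(b)
    return list(zip(moves_a, moves_b))


if __name__ == "__main__":
    # Prints a cycling sequence (cooperate-defect, defect-cooperate, ...).
    # Read as the "choices" of the whole two-agent system facing the same
    # four possible outcomes each round, this behavior is not naturally
    # described as maximizing one fixed utility function over outcomes.
    for i, (a, b) in enumerate(play(), start=1):
        print(f"{i}: {a}-{b}  payoffs={PAYOFFS[(a, b)]}")
```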