> anything that outputs decisions implies a utility function
I think this is only true in a boring sense and isn’t true in more natural senses. For example, in an MDP, it’s not true that every policy maximises a non-constant utility function over states.
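To make that concrete, here's a toy illustration of the kind of thing I mean (my own construction, restricted to one-step greedy maximisation to keep it short). Take a three-state MDP where each state can jump to either of the other two, and the cyclic policy 0→1, 1→2, 2→0. Being greedy for a utility U would require U(1) ≥ U(2), U(2) ≥ U(0) and U(0) ≥ U(1), which forces U to be constant:

```python
import numpy as np

# Three states {0, 1, 2}; from each state the agent may move to either
# of the other two. Spot-check on random utility vectors that the cyclic
# policy 0->1, 1->2, 2->0 is never greedy for a non-constant U.
options = {0: (1, 2), 1: (0, 2), 2: (0, 1)}  # successor states on offer
policy = {0: 1, 1: 2, 2: 0}                  # the cyclic policy

rng = np.random.default_rng(0)
for _ in range(10_000):
    U = rng.normal(size=3)
    greedy = all(U[policy[s]] >= max(U[t] for t in options[s]) for s in options)
    if greedy and not np.allclose(U, U[0]):
        print("non-constant U for which the cycle is greedy:", U)
        break
else:
    print("no non-constant U found: the cycle only maximises constant utilities")
```

(The same pinning-down argument goes through with discounted state values in place of one-step utilities.)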
Maybe you could spell this out a bit more? What concretely do you mean when you say that anything that outputs decisions implies a utility function — are you thinking of a certain mathematical result/procedure?
In particular, imagine that the state space of the MDP factors into three variables x, y, and z, and the agent has a bunch of actions with complicated influence on x, y, and z, but also some actions that directly override y with a given value.
In some such MDPs, you might want a policy that does nothing other than copy a specific function of x to y. This policy could easily be seen as a virtue, e.g. if x is some type of event and y is some logging or broadcasting input, then it would be a sort of information-sharing virtue.
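As a sketch, the policy is nothing more than this (names illustrative, not from any particular formalism):

```python
from dataclasses import dataclass
from typing import Callable, Tuple

@dataclass(frozen=True)
class State:
    x: int  # some type of event
    y: int  # the logging/broadcasting input
    z: int  # everything else

# Among the agent's actions are ones that override y directly; the
# information-sharing policy always picks the one that writes f(x) into y.
def information_sharing_policy(s: State, f: Callable[[int], int]) -> Tuple[str, int]:
    return ("override_y", f(s.x))  # an action, not a target state
```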
While there are certain circumstances where consequentialism can specify this virtue, it's quite difficult to do in general. (E.g. you can't just minimise the difference between f(x) and y, because then the agent might manipulate x instead of y.)
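Spelled out, the naive consequentialist encoding would be something like this (again a toy framing of mine), and it is exactly indifferent between the two ways of closing the gap:

```python
from typing import Callable

def naive_virtue_reward(x: int, y: int, f: Callable[[int], int]) -> float:
    """Naive encoding: reward states where y matches f(x)."""
    return -abs(f(x) - y)

# Problem: this scores "set y to f(x)" and "manipulate x until f(x) equals
# whatever y already happens to be" identically -- both achieve the maximum
# of 0 -- so an optimiser is free to tamper with x instead of y, which is
# precisely not the intended information-sharing virtue.
```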
> I think this is only true in a boring sense and isn’t true in more natural senses. For example, in an MDP, it’s not true that every policy maximises a non-constant utility function over states.
The boring sense is enough to say that it increases with intelligence, which was the entire point.
> Maybe you could spell this out a bit more? What concretely do you mean when you say that anything that outputs decisions implies a utility function — are you thinking of a certain mathematical result/procedure?
This.
> In particular, imagine that the state space of the MDP factors into three variables x, y, and z, and the agent has a bunch of actions with complicated influence on x, y, and z, but also some actions that directly override y with a given value.
> In some such MDPs, you might want a policy that does nothing other than copy a specific function of x to y. This policy could easily be seen as a virtue, e.g. if x is some type of event and y is some logging or broadcasting input, then it would be a sort of information-sharing virtue.
> While there are certain circumstances where consequentialism can specify this virtue, it's quite difficult to do in general. (E.g. you can't just minimise the difference between f(x) and y, because then the agent might manipulate x instead of y.)