My best guess about the core difference between optimization and agency is the thing I said above about "a utility function, which they need for picking actions with probabilistic outcomes".
An agent wants to move the state up an ordering (its optimization criterion). But an agent also has enough modelling ability to know that any given action has (some approximation of) a probability distribution over outcomes. (Maybe this is what you mean by “counterfactuality”.) Let’s say you’ve got a toy model where your ordering over states is A < B < C < D < E and you’re starting out in state C. The only way to decide between [a 30% chance of B + a 70% chance of D] and [a 40% chance of A + a 60% chance of E] is to decide on some numerical measure for how much better E is than D, et cetera.
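To make that concrete, here is a minimal sketch in Python with made-up utility numbers (the values below are purely illustrative, not anything forced by the ordering): under one assignment the two lotteries tie on expected utility, under another the second one wins, so the ordering alone doesn't settle the choice.

```python
# Two illustrative utility assignments, both consistent with the ordering A < B < C < D < E.
utilities_1 = {"A": 0, "B": 1, "C": 2, "D": 3, "E": 4}
utilities_2 = {"A": 0, "B": 1, "C": 2, "D": 3, "E": 5}  # same ordering, E valued more

lottery_1 = {"B": 0.3, "D": 0.7}  # 30% chance of B + 70% chance of D
lottery_2 = {"A": 0.4, "E": 0.6}  # 40% chance of A + 60% chance of E

def expected_utility(lottery, utilities):
    """Expected utility of a lottery given as {state: probability}."""
    return sum(p * utilities[state] for state, p in lottery.items())

for u in (utilities_1, utilities_2):
    print(expected_utility(lottery_1, u), expected_utility(lottery_2, u))
# With utilities_1 both lotteries come out at 2.4; with utilities_2 the second jumps to 3.0.
```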
Gradient descent doesn’t have to do this at all. It just looks at the gradient and is like, number go down? Great, we go in the down direction. Similarly, natural selection isn’t doing this either. It’s just generating a bunch of random mutations and then some of them die.
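For contrast, here's a toy gradient-descent loop (again just a sketch, with a made-up quadratic objective): it never compares lotteries or assigns cardinal values to outcomes, it just moves in whatever direction makes the number go down.

```python
def gradient_descent(grad, x, lr=0.1, steps=100):
    """Repeatedly step against the gradient; no distribution over outcomes is ever weighed."""
    for _ in range(steps):
        x -= lr * grad(x)  # number go down: move in the downhill direction
    return x

# Minimize f(x) = (x - 3)^2, whose gradient is 2 * (x - 3).
print(gradient_descent(lambda x: 2 * (x - 3), x=0.0))  # converges to ~3.0
```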
(I’m not totally confident that one couldn’t somehow show some way in which these scenarios can be mathematically described as calculating an expected utility. But I haven’t needed to pull in these ideas for deconfusing myself about optimization.)
Uhm, two comments/questions on this.
Why do you need to decide between those probability distributions? You only need to get one action (or a distribution over actions) out, and you can do that without deciding, e.g. by taking their average and sampling. On the other hand, the vNM theorem tells us that a utility function is being assigned whenever your choices satisfy certain axioms, but “vNM = agency” is a complicated position to hold.
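Here's the kind of thing I mean, as a sketch (the 50/50 mixture weight is an arbitrary choice of mine): an action gets selected without any expected-utility comparison between the two candidate distributions.

```python
import random

dist_1 = {"B": 0.3, "D": 0.7}  # the two candidate action-distributions from above
dist_2 = {"A": 0.4, "E": 0.6}

# Average them and sample -- one action comes out, and no utilities were ever compared.
states = sorted(set(dist_1) | set(dist_2))
mixture = {s: 0.5 * dist_1.get(s, 0) + 0.5 * dist_2.get(s, 0) for s in states}
action = random.choices(states, weights=[mixture[s] for s in states])[0]
print(mixture, action)
```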
We know that at some level every physical system is doing gradient descent, or a variational version thereof. So depending on the scale at which you model a system, would you assign it different degrees of agency?
By the way, gradient descent is a form of local utility minimization, and by tweaking the meaning of ‘local’ one can recover many other things (evolution, Bayesian inference, RL, ‘games’, etc.).
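One way to cash out ‘local’, as a sketch (the ball radius and candidate count are arbitrary choices of mine): pick the utility-minimizing point within a small ball around the current point; in the small-radius limit this tracks the negative gradient.

```python
import numpy as np

rng = np.random.default_rng(0)

def local_argmin(f, x, radius=0.01, n_candidates=200):
    """Minimize f only over a small ball around x -- 'local' utility minimization."""
    candidates = x + radius * rng.uniform(-1, 1, size=(n_candidates, x.size))
    return candidates[np.argmin([f(c) for c in candidates])]

def f(x):
    """The 'utility' being minimized: squared distance from 3."""
    return float((x - 3) @ (x - 3))

x = np.array([0.0])
for _ in range(500):
    x = local_argmin(f, x)
print(x)  # ends up near [3.0], much like a gradient-descent trajectory
```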