Main response is in another comment; this is a tangential comment about prescriptive vs descriptive viewpoints on agency.
I think viewing agency as “the pipeline from the prescriptive to the descriptive” systematically misses a lot of key pieces. One central example: any properties of (inner/mesa) agents which stem from broad optima, rather than from optimality alone. (For instance, I expect that modularity of trained/evolved systems mostly comes from broad optima.) Such properties are not prescriptive principles; a narrow optimum is still an optimum. Yet we should expect such properties to apply to agenty systems in practice, including humans, other organisms, and trained ML systems.
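To make “broad” concrete, here’s a minimal toy sketch (my own illustration with made-up numbers, not from the post): a 1-D loss with two equally-good minima, one narrow and one broad. Gradient descent from random initializations lands in the broad basin most of the time, simply because the broad basin occupies more of initialization space; the narrow optimum is no less of an optimum.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy 1-D loss: min of a narrow parabola (minimum at x = +2) and a
# broad parabola (minimum at x = -2). Both minima have loss 0, so
# neither is "more optimal" -- they differ only in basin width.
def grad(x):
    narrow = 100.0 * (x - 2.0) ** 2
    broad = (x + 2.0) ** 2
    # Piecewise gradient of min(narrow, broad).
    return np.where(narrow < broad, 200.0 * (x - 2.0), 2.0 * (x + 2.0))

# Plain gradient descent from many random initializations.
x = rng.uniform(-6.0, 6.0, size=100_000)
for _ in range(2_000):
    x -= 0.005 * grad(x)

# The broad basin captures ~64% of initializations despite the two
# minima being equally good.
print(f"runs ending in the broad minimum: {np.mean(x < 0):.1%}")
```

And the 1-D picture understates the effect: in d dimensions, basin volume scales roughly like width^d, so the bias toward broad optima compounds rapidly with parameter count.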
The Kelly criterion is another good example: Abram has argued that it’s not a prescriptive principle, but it is still a very strong descriptive principle for agents in suitable environments.
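For concreteness (standard Kelly math plus a toy simulation of mine, not anything from Abram’s argument): a bettor who wagers a fraction f of their bankroll at odds b with win probability p has long-run growth rate p log(1+bf) + (1-p) log(1-f), maximized at f* = p - (1-p)/b. Descriptively, that means non-Kelly bettors almost surely get out-grown over many bets, whatever their goals:

```python
import numpy as np

rng = np.random.default_rng(0)

# Repeated even-odds bets (b = 1) won with probability p = 0.6.
# The Kelly fraction is f* = p - (1 - p)/b = 0.2.
p, n_bets = 0.6, 100_000
fractions = np.array([0.05, 0.10, 0.20, 0.40, 0.60])

# All bettors face the same outcome sequence and start with wealth 1;
# a win multiplies wealth by (1 + f), a loss by (1 - f).
n_wins = int((rng.random(n_bets) < p).sum())
log_growth = (n_wins * np.log(1 + fractions)
              + (n_bets - n_wins) * np.log(1 - fractions)) / n_bets

for f, g in zip(fractions, log_growth):
    print(f"f = {f:.2f}: log growth per bet = {g:+.4f}")
```

With these numbers the Kelly fraction f* = 0.2 grows fastest, and the expected log growth at f = 0.4 and f = 0.6 is actually negative, so persistent over-bettors go broke; the environment ends up populated by approximately-Kelly agents regardless of anyone’s prescriptive commitments.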
More importantly, I think starting from prescriptive principles makes it much easier to miss key foundational questions, such as “what is an optimizer?” or “what are goals?”. Questions like these need some kind of answer in order for many prescriptive principles to make sense in the first place.
Also, as far as I can tell to date, there is an asymmetry: a viewpoint starting from prescriptive principles misses key properties, but I have not seen any sign of key principles which would be missed starting from a descriptive viewpoint. (I know of philosophical arguments to the contrary, e.g. this, but I do not expect such things to cash out into any significant technical problem for agency/alignment, any more than I expect arguments about solipsism to cash out into any significant technical problem.)