Connection to Alignment
One of the main arguments in AI risk goes something like:
1. AI is likely to be a utility maximizer (or goal-directed in some other sense).
2. Goodhart’s law, instrumental convergence, etc. make powerful goal-directed agents dangerous by default.
One common answer to this is “ok, how about we make AI which isn’t goal-directed?”
Unconscious Economics says: selection effects will often create the same effect as goal-directedness, even if we’re trying to build a non-goal-directed AI.
Discussions around CAIS are one obvious application. Paul’s “you get what you measure” failure-mode is another. A less-obvious application which I’ve personally run into recently: one strategy to deal with inner optimizers is to design learning algorithms which specifically avoid regions of parameter space in which the trained system will perform optimization. The Unconscious Economics argument says that this won’t actually avoid the risk: selection effects from the outer optimizer will push the trained system to misbehave in exactly the same ways, even without an inner optimizer.
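To make that last point concrete, here is a minimal toy sketch (my own illustration; the proxy metric and all names are made up): an outer loop that merely selects among fixed lookup-table policies, none of which does any internal search or planning, still converges on proxy-maximizing behavior.

```python
import random

# Toy sketch (my own illustration): pure selection, no inner optimizer.
# Each "policy" is a frozen lookup table -- it performs no search, planning,
# or optimization of any kind. The outer loop only selects and mutates.

N_STATES = 8

def proxy_score(policy):
    # Hypothetical proxy metric: reward taking action s % 2 in state s.
    return sum(policy[s] == s % 2 for s in range(N_STATES))

def random_policy():
    return [random.choice([0, 1]) for _ in range(N_STATES)]

population = [random_policy() for _ in range(50)]
for generation in range(100):
    # Selection: keep the top half by proxy score, refill with mutated copies.
    population.sort(key=proxy_score, reverse=True)
    survivors = population[:25]
    children = []
    for parent in survivors:
        child = parent[:]
        i = random.randrange(N_STATES)
        child[i] = 1 - child[i]  # single point mutation
        children.append(child)
    population = survivors + children

# The winning policy maximizes the proxy (score 8) despite containing no
# optimizer: the "goal-directedness" lives entirely in the selection pressure.
print(proxy_score(max(population, key=proxy_score)))
```

This is the sense in which steering away from optimizer-containing regions of parameter space doesn’t remove the risk: the outer selection process pushes toward the same behavior regardless of whether any individual system optimizes internally.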
Connection to the Economics Literature
Over the past year I’ve found and read a bit more of the formal literature on selection-effect-driven economics.
The most notable work seems to be Nelson and Winter’s “An Evolutionary Theory of Economic Change” (1982), a book-length attempt to provide a mathematical foundation for microeconomics grounded in selection effects rather than assuming utility-maximizing agents from the get-go. Reading through the book makes it pretty clear why this perspective hasn’t taken over economics: Nelson and Winter’s models are not very good. Some of the larger shortcomings:
- They limit themselves to competition between firms, and their models contain details which limit their generalization to other kinds of agents.
- They use a “static” notion of equilibrium (every agent is individually unchanging) rather than a “dynamic” notion (the distribution of agents is unchanging, even as individual agents change); see the formal sketch after this list.
- They seem to lack the mathematical skills to prove properties of reasonably general models; instead they rely heavily on simulation.
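To pin down the second shortcoming, here is one way to formalize the distinction (my notation, not Nelson and Winter’s): let \(x_i(t)\) be the state of agent \(i\) at time \(t\), and let \(\mu_t\) be the population distribution over agent states.

```latex
% Static vs. dynamic equilibrium -- a sketch in my own notation,
% not Nelson and Winter's.
\[
\text{static equilibrium:} \quad x_i(t+1) = x_i(t) \;\; \text{for every agent } i,
\]
\[
\text{dynamic equilibrium:} \quad \mu_{t+1} = \mu_t,
\quad \text{even as individual } x_i(t) \text{ keep changing.}
\]
```

Every static equilibrium is trivially dynamic, but not conversely: agents can continually enter, exit, and mutate while the distribution of agent types stays fixed, which is the natural equilibrium notion for a selection-based theory.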
I do not see any of these problems as substantial barriers to a selection-based theory; it’s just that Nelson and Winter did not have the mathematical chops to make it happen, and nobody better seems to have come along since.
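To illustrate that a dynamic equilibrium is easy to exhibit in a selection model, here is a toy simulation (entirely my own construction; the firm types and survival probabilities are made up) in which individual firms churn every period while the distribution over firm types converges to a fixed point:

```python
import random
from collections import Counter

# Toy sketch (my own construction, not from Nelson and Winter): a selection
# process with a *dynamic* equilibrium. Individual firms keep exiting and
# being replaced every period, but the distribution over firm types converges.

SURVIVAL = {"efficient": 0.9, "inefficient": 0.5}  # hypothetical odds

def step(population):
    # Each firm survives with type-dependent probability; a firm that exits
    # is replaced by a fresh entrant of random type.
    return [
        firm if random.random() < SURVIVAL[firm] else random.choice(list(SURVIVAL))
        for firm in population
    ]

population = [random.choice(list(SURVIVAL)) for _ in range(10_000)]
for _ in range(200):
    population = step(population)

# The distribution settles near 5/6 efficient even though no individual firm
# is stable: selection acts on the population, not on any fixed agent.
print(Counter(population))
```

In this regime the right equilibrium object is the distribution \(\mu_t\), not any individual agent’s state, which is exactly what Nelson and Winter’s static framing misses.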