Is Fisherian Runaway Gradient Hacking?
TL;DR: No; there is no directed agency that enforces sexual selection through an exploitable proxy. However, Fisherian runaway is an insightful example of the path-dependence of local search, where an easily acquired and apparently useful proxy goal can be so strongly favored that disadvantageous traits emerge as side effects.
Why are male peacocks so ornamented that they are at greatly increased risk of predation? How could natural selection favor such energetically expensive plumage that offers no discernible survival advantage? The answer is “sex”, or more poetically, “demons in imperfect search”.
Fisherian runaway is a natural process in which an easy-to-measure proxy for a “desired” trait is “hacked” by the optimisation pressure of evolution, leading to “undesired” traits. In the peacock example, a more ornamented tail could serve as a highly visible proxy for male fitness: peacocks that survive with larger tails are more likely to be agile and good at acquiring resources for energy. Alternatively, perhaps a preference for larger tail size is randomly acquired. In any case, once sexual selection by peahens has zeroed in on “plumage size” as a desirable feature, males with more plumage will likely have more offspring, reinforcing the trait in the population. Consequently, peahens are further driven to mate with large-tailed males, as their male offspring will have larger tails and thus be more favored by mates. This selection process may then “run away” and produce peacocks with ever larger tails via positive feedback, until the fitness detriment of this trait exceeds the benefit of selecting for fitter birds.
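To make that positive feedback concrete, here is a deliberately crude simulation of the runaway dynamic, written from scratch for this post. It is not the Lande–Kirkpatrick model; the parameters `pref_strength`, `genetic_corr`, and `cost` are illustrative stand-ins for mating advantage, the ornament–preference genetic correlation, and survival cost.

```python
def simulate_runaway(generations=200, cost=0.02, pref_strength=0.5, genetic_corr=0.3):
    """Crude toy model of Fisherian runaway (illustrative numbers only).

    tail: mean ornament size in males (0 would be the survival optimum)
    pref: mean female preference for larger ornaments
    Each generation, mating advantage pushes the tail towards what peahens
    prefer, a survival cost (quadratic, so it eventually dominates) pulls it
    back towards zero, and the genetic correlation between ornament and
    preference drags the preference along with the tail.
    """
    tail, pref = 0.1, 0.1  # a small, perhaps randomly acquired, initial preference
    trajectory = []
    for _ in range(generations):
        d_tail = pref_strength * pref - cost * tail ** 2
        d_pref = genetic_corr * d_tail  # preference hitch-hikes on ornament genes
        tail += d_tail
        pref += d_pref
        trajectory.append(tail)
    return trajectory

traj = simulate_runaway()
for gen in (10, 25, 200):
    print(f"generation {gen:>3}: mean tail size {traj[gen - 1]:.2f}")
```

In this toy, the ornament blows up well past the survival optimum of zero and only stalls once the (here quadratic) survival cost finally balances the mating advantage, which is the qualitative story told above.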
In outsourcing to sexual selection, natural selection has found an optimization demon. The overall decrease in peacock fitness is possible because the sexual selection pressure of the peahen locally exceeds the selection pressure imposed by predation and food availability. Peacocks have reached an evolutionary “dead-end”, where a maladaptive trait is dominant and persistent. If peacocks were moved “off distribution” to an environment where predation was harsher or food more scarce, they would fare significantly worse than their less ornamented, “unsexy” ancestors.
Gradient hacking is a process by which an internally acquired “mesa-optimizer” might compromise the optimization process of stochastic gradient descent (SGD) in a machine learning system. A mesa-optimizer might accomplish this by:
Introducing a countervailing, “artificial” performance penalty that “masks” the performance benefits of model modifications that do well on the SGD objective, but not on the mesa-objective (a toy numerical sketch of this masking follows this list);
“Spoofing” the performance benefits of model modifications that are desirable to the mesa-objective by withholding performance gains until those modifications are implemented; or
In a reinforcement learning context, selectively sampling environmental states that will either leave the mesa-objective unchanged or “steer” the ML model in a way that favors the mesa-objective.
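The following is only a cartoon of the first mechanism, not a claim about how gradient hacking would actually be implemented: it treats the mesa-optimizer's penalty as an arbitrary function bolted onto the loss, whereas a real mesa-optimizer could only shape such a penalty through its own behaviour. Names like `mesa_penalty` and `MESA_OPTIMUM` are purely illustrative.

```python
# Cartoon of "masking": the optimizer only ever sees
#   total_loss = base_loss + mesa_penalty,
# and the injected penalty grows exactly as fast as the base loss shrinks,
# so the update that would help the base objective never looks beneficial.

BASE_OPTIMUM = 1.0   # parameter value the outer (SGD) objective wants
MESA_OPTIMUM = 0.0   # parameter value the hypothetical mesa-objective wants to preserve

def base_loss(theta):
    return (theta - BASE_OPTIMUM) ** 2

def mesa_penalty(theta):
    # "Artificial" penalty: cancels any base-loss improvement gained by
    # moving theta away from the mesa optimum.
    return base_loss(MESA_OPTIMUM) - base_loss(theta)

def total_loss(theta):
    return base_loss(theta) + mesa_penalty(theta)  # constant in theta

def descend(loss_fn, theta=0.0, lr=0.1, steps=200, eps=1e-6):
    """Plain gradient descent with a finite-difference gradient."""
    for _ in range(steps):
        grad = (loss_fn(theta + eps) - loss_fn(theta - eps)) / (2 * eps)
        theta -= lr * grad
    return theta

theta_honest = descend(base_loss)   # converges to ~1.0; base loss ~0
theta_masked = descend(total_loss)  # gradient is zero everywhere; theta stays at 0.0
print(f"without masking: theta = {theta_honest:.2f}, hidden base loss = {base_loss(theta_honest):.2f}")
print(f"with masking:    theta = {theta_masked:.2f}, hidden base loss = {base_loss(theta_masked):.2f}")
```

The point is only that a penalty which exactly offsets the benefit of a modification leaves gradient descent with no signal to make that modification.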
Mesa-optimization might be an “easily acquired policy” for good performance on a sufficiently complex ML task. Many mesa-objectives that allow for good performance in training may point to a proxy that, when optimized for in deployment, leads to undesirable behavior. Worse still is the case where a mesa-optimizer is instrumentally motivated to “deceive” the SGD objective because it has acquired both a mesa-objective that is misaligned with the outer objective, and the capability to retain or achieve the mesa-objective via gradient hacking.
Fisherian runaway seems similar to the first gradient hacking mechanism in that:
Sexual selection amplifies the proxy objective of “enormous tail plumage” because it serves as a locally good indicator of “fitness”. Producing a fit species is hard for natural selection given its nature of random, undirected search and the sparse feedback provided by the signal of “death to predation after maturity” (i.e. after acquiring plumage). Outsourcing to sexual selection based on an easily discerned (tails are enormous!) proxy for fitness allows for quicker, more reliable feedback.
The “gradient” of adaptations that would lead to globally better peacock fitness is “masked” by local search. All else being equal, peacocks with smaller tails are more agile and energy conserving. Only the local “speed-bump” of sexual selection optimization pressure prevents peacocks from being guided to the optimal trait according to natural selection: far smaller tails.
The runaway amplification of maladaptive traits by sexual selection compromises the apparent objective of natural selection (fitness) in a manner similar to how gradient hacking results in compromised performance on the base objective.
If a peacock is moved “out-of-distribution”, it will “fail hard” according to the objective of natural selection. This is analogous to the framing of proxy misalignment failures as generalization failures.
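As a toy illustration of that last point (the numbers are made up; nothing here is fitted to real peafowl): a proxy that correlates positively with “true fitness” under mild predation can anti-correlate with it once predation becomes harsh.

```python
import numpy as np

rng = np.random.default_rng(0)
tails = rng.uniform(0.0, 5.0, size=1000)
noise = rng.normal(0.0, 0.2, size=1000)

def true_fitness(tail, predation):
    # Hypothetical fitness: a display benefit plus an agility penalty and a
    # predation penalty, both of which grow with tail size.
    return 0.3 * tail + (1.0 - 0.1 * tail) - predation * tail

train_fitness = true_fitness(tails, predation=0.1) + noise   # mild predation ("training")
deploy_fitness = true_fitness(tails, predation=1.0) + noise  # harsh predation ("deployment")

print(f"corr(tail, fitness), mild predation:  {np.corrcoef(tails, train_fitness)[0, 1]:+.2f}")
print(f"corr(tail, fitness), harsh predation: {np.corrcoef(tails, deploy_fitness)[0, 1]:+.2f}")
```

Selecting on tail size looks sensible under the first distribution and actively harmful under the second, which is the generalization-failure framing in miniature.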
Fisherian runaway seems unlike gradient hacking in that:
Sexual selection does not “choose” the proxy objective of “larger tails” via an agentic process. Fisherian runaway boosts somewhat arbitrary traits, not just ones that compromise fitness. “Larger tails” may in fact be a randomly acquired preference that is boosted by positive feedback and an implicit “agreement” among the population that larger tails are sexier. There is no “population” employed in SGD, although perhaps there is an analogous feature in genetic algorithms.
Natural and sexual selection are likely far noisier and more susceptible to local minima than SGD. It is unclear if SGD will trap ML models in local minima that sufficiently compromise global performance to the extent of Fisherian runaway.
Fisherian runaway offers the following insights for AI alignment:
For inner alignment, the selection pressure of the outer optimizer should exceed that which the mesa-optimizer can apply. If we desire peacocks to have higher agility or energy conservation, we should shape the training environment such that predation and food scarcity are such strong incentives that any excessive plumage is disfavoured. The existence of mild incentives for fitness without a sufficiently harsh local penalty is what allows maladaptive local processes to experience runaway amplification (the parameter sweep after this list illustrates this threshold in the earlier toy model).
If a trait appears dominant in an AI system, perhaps we should not commit the Darwinian fallacy of assuming that the trait arose because it is “purposeful” or globally advantageous. It is unclear to me if the simplicity prior of SGD prohibits the random selection of proxy goals that are boosted by positive feedback mechanisms.
“Agentic” search might not be necessary for something quite similar to gradient hacking to emerge. The local nature of search via SGD might be sufficient to birth optimization demons.
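Reusing the toy runaway model from earlier (same illustrative parameters, same caveats), a quick sweep over the survival cost gives a feel for the first point: the harsher the “outer” penalty on large ornaments, the closer the equilibrium tail stays to the survival optimum.

```python
def equilibrium_tail(cost, generations=300, pref_strength=0.5, genetic_corr=0.3):
    """Same toy dynamics as the runaway sketch above, run until it settles."""
    tail, pref = 0.1, 0.1
    for _ in range(generations):
        d_tail = pref_strength * pref - cost * tail ** 2
        d_pref = genetic_corr * d_tail
        tail += d_tail
        pref += d_pref
    return tail

# Stronger survival costs (harsher "outer" selection pressure) shrink the
# equilibrium ornament towards the survival optimum of zero.
for cost in (0.02, 0.1, 0.5, 2.0):
    print(f"survival cost {cost:>4}: equilibrium tail ≈ {equilibrium_tail(cost):.2f}")
```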
Fisherian runaway in peacock plumage is a surprisingly useful “intuition pump” for exploring gradient hacking. I suspect there are many further examples of possible runaway Fisher processes in nature that could be mined for useful insight, such as that discussed here. Ecological models that favor Fisherian runaway might be adapted into useful mathematical approximations of gradient hacking and allow this phenomenon to be instantiated and studied in minimal ML models.
TL;DR: Big-tailed peacocks make better female chicks
Let’s, for a moment, pretend to be a peahen choosing a mate. We have a few options, with tails of varying degrees of impressiveness. As stated in the post, it is difficult to tell whether the tail is a good proxy for fitness. Indeed, one could argue either that having a big tail is a handicap for the peacock (limiting agility, for example), or that it is a strong hint that the peacock is otherwise very fit, despite the big tail. I would argue that, given the information we have (i.e. all the potential male mates have survived so far), we shouldn’t assume any difference in fitness between them.
But, why do we care anyway? We are not interested in the fitness of our future mate, but rather in the fitness of our future chicks. And here, I think, the tail is relevant.
If we have male chicks, the choice of a mate will influence both the size of their tails and other characteristics like the ability to find food, agility, and so on. As before, the tail doesn’t seem to be a reasonable proxy for producing fitter male chicks.
If we have female chicks, the story is very different. A female chick will partially inherit the agility and general ability to survive from the mate we will choose, but will not inherit the handicap of a big tail. Therefore, we should choose the mate with the biggest tail.
Interesting! I came to a similar conclusion (with less detail) in a post about real-life gradient hacking, which also contains some other possible examples (very unelaborated) that you might be interested in.
Fisherian runaway doesn’t make any sense to me.
Suppose that each individual in a species of a given sex has some real-valued variable X, which is observable by the other sex. Suppose that, absent considerations about sexual selection by potential mates for the next generation, the evolutionarily optimal value for X is 0. How could we end up with a positive feedback loop involving sexual selection for positive values of X, creating a new evolutionary equilibrium with an optimal value X=1 when taking into account sexual selection? First the other sex ends up with some smaller degree of selection for positive values of X (say selecting most strongly for X=.5). If sexual selection by the next generation of potential mates were the only thing that mattered, then the optimal value of X to select for is .5, since that’s what everyone else is selecting for. That’s stability, not positive feedback. But sexual selection by the next generation of potential mates isn’t the only thing that matters; by stipulation, different values of X have effects on evolutionary fitness other than through sexual selection, with values closer to 0 being better. So, when choosing a mate, one must balance the considerations of sexual selection by the next generation (for which X=.5 is optimal) and other considerations (for which X=0 is optimal), leading to selection for mates with 0<X<.5 being evolutionarily optimal. That’s negative feedback. How do you get positive feedback?
In the context of your model, I see two potential ways that Fisherian runaway might occur:
Within each generation, males that survive with higher X are consistently fitter on average than males that survive with lower X because the fitness required to survive monotonically increases with X. Therefore, in every generation, choosing males with higher X is a good proxy for local improvements in fitness. However, the performance detriments of high X “off-distribution” are never signalled. In an ML context, this is basically distributional shift via proxy misalignment.
Positive feedback that negatively impacts fitness “on-distribution” might occur temporarily if selection for higher X is so strong that it has “acquired momentum” that ensures females will select for higher-X males for several generations past the point the trait becomes net costly for fitness. This is possible if the negative effects of the trait take longer to manifest selection pressure than the time window during which sexual selection boosts the trait via preferential mating. This mechanism is only temporary, but I can see search processes halting prematurely in an ML context.
By “optimal”, I mean in an evidential, rather than causal, sense. That is, the optimal value is the one that signals greatest fitness to a mate, rather than the value that is most practically useful otherwise. I took Fisherian runaway to mean that there would be overcorrection, with selection for even more extreme traits than what signals greatest fitness, because of sexual selection by the next generation. So, in my model, the value of X that causally leads to the greatest chance of survival could be −1, but high values for X are evidence for other traits that are causally associated with survivability, so X=0 offers the best evidence of survivability to potential mates, and Fisherian runaway leads to selection for X=1. Perhaps I’m misinterpreting Fisherian runaway, and it’s just saying that there will be selection for X=0 in this case, instead of over-correcting and selecting for X=1? But then what’s all this talk about later-generation sexual selection, if this doesn’t change the equilibrium?
Ah, so if we start out with an average X=−10, standard deviation 1, and optimal X=0, then selecting for larger X has the same effect as selecting for X closer to 0, and that could end up being what potential mates do, driving X up over the generations, until it is common for individuals to have positive X, but potential mates have learned to select for higher X? Sure, I guess that could happen, but there would then be selection pressure on potential mates to stop selecting for higher X at this point. This would also require a rapid environmental change that shifts the optimal value of X; if environmental changes affecting optimal phenotype aren’t much faster than evolution, then optimal phenotypes shouldn’t be so wildly off the distribution of actual phenotypes.
I think it’s important to distinguish between “fitness as evaluated on the training distribution” (i.e. the set of environments ancestral peacocks roamed) and “fitness as evaluated on a hypothetical deployment distribution” (i.e. the set of possible predation and resource scarcity environments peacocks might suddenly face). Also important is the concept of “path-dependent search”, where fitness is a non-convex function of X with a local maximum at X=1 that biases local search, even though the global maximum is at X=−1.
In this case, I’m imagining that Fisherian runaway boosts X as long as it still indicates good fitness on-distribution. However, it could be that X=1 is only the “local optimum for fitness” and in reality X=−1 is the global optimum. In this case, the search process has chosen an initial X-direction that biases sexual selection towards X=1. This is equivalent to gradient descent finding a local minimum.
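To make that concrete, here is a tiny made-up hill-climbing example (nothing about it is specific to peacocks or SGD): a fitness curve with a lower peak near X = +1 and a higher peak near X = −1, where the side of the ridge the search starts on fully determines which peak it finds.

```python
def fitness(x):
    # Non-convex toy fitness: peaks near x = +1 and x = -1; the extra
    # -0.5*x term makes the left peak the higher (global) one.
    return -(x ** 2 - 1) ** 2 - 0.5 * x

def grad_fitness(x):
    return -4 * x * (x ** 2 - 1) - 0.5

def hill_climb(x0, lr=0.05, steps=500):
    """Plain local search: follow the fitness gradient uphill from x0."""
    x = x0
    for _ in range(steps):
        x += lr * grad_fitness(x)
    return x

# Two starting points on either side of the ridge (roughly x = 0.13)
# end up at different peaks: one locally optimal, one globally optimal.
for x0 in (+0.2, -0.2):
    x_final = hill_climb(x0)
    print(f"start {x0:+.1f} -> end {x_final:+.2f}, fitness {fitness(x_final):+.2f}")
```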
I think I agree with your thoughts here. I do wonder if sexual selection in humans has reached a point where we are deliberately immune to natural selection pressure due to such a distributional shift and acquired capabilities.