[LDSL#1] Performance optimization as a metaphor for life
Followup to: Some epistemological conundrums. Response to: The omnigenic model as a metaphor for life. Part 1 of this post will introduce a concept from programming; part 2 will apply it to life in general. This post is also available on my Substack.
Part 1: Performance optimization
In computer programming, there is a tradeoff between software that is quick and easy for humans to write and read, versus software that is fast for the computer to execute. Moving from the former to the latter is called “performance optimization”.
Moving from the latter to the former is called “high-level rewriting”. This is rarely done intentionally, but a substantial part of the reason computers are sometimes still sluggish despite having gotten orders of magnitude faster is that high-level rewriting has happened implicitly as old software that focused more on performance has been replaced with new software that focuses more on developer speed.
Because programming is done at such a high level, there is almost always a way to do performance optimization in any piece of code. Yet, it’s considered a beginner mistake to just pick a random piece of code to go optimize, which is sometimes encapsulated in sayings like “premature optimization is the root of all evil” or “the epsilon fallacy”. Instead, a recommended approach is to run the code, measure what takes the most resources, and fix that. But why is that so important?
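Before getting to the why, here is a rough sketch of what that measure-first workflow can look like in practice, using Python's built-in cProfile (the functions are toy examples invented for illustration):

```python
import cProfile

def cheap_step(x):
    # Fast: runs once per element.
    return x + 1

def expensive_step(data):
    # Quadratic pairwise loop: the kind of hotspot a profiler flags.
    return sum(1 for a in data for b in data if a < b)

def main():
    data = [cheap_step(i) for i in range(1000)]
    return expensive_step(data)

# Profile first, then optimize whatever dominates the report.
cProfile.run("main()", sort="cumulative")
```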
Computers are fast, or alternatively, software/data/people/society is very slow and long-term-oriented compared to individual instructions. Let’s say there’s something a computer can do in 1 microsecond, but you write the code inefficiently, so it takes 1000x too long. This means that it takes… 0.001 seconds, aka basically nothing in most circumstances.
But let’s say that it’s an operation that you need to apply to 10000 datapoints, so you need to repeat it 10000 times. Now suddenly it takes 10 seconds, which is most likely worth optimizing, as otherwise you wouldn’t e.g. want to wait 10 seconds for a page to load. If you need to apply it to each pair of the 10000 datapoints, then you’ve got a 10000 by 10000 grid, or 100000000 repetitions, which will take more than a day to execute, so this absolutely needs to be optimized.
Usually, an execution of some code consists of many pieces whose resource use adds up. Some of these pieces just run a quick operation once, whereas others are multiplied by numbers such as dataset size, internet latency, or similar. This leads to resource use varying by orders of magnitude. If you add up the few pieces that take up the most resources, then there’s usually not much unexplained resource usage, so you should focus on optimizing those pieces, and can’t get much from optimizing the other pieces.
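As a toy illustration of that kind of breakdown (the component timings below are invented, not measurements):

```python
# Hypothetical per-component runtimes in seconds, spread over orders of magnitude.
timings = {
    "pairwise_comparison": 95.0,
    "disk_io": 4.0,
    "parsing": 0.7,
    "logging": 0.2,
    "config_loading": 0.05,
    "argument_parsing": 0.01,
}

total = sum(timings.values())
for name, seconds in sorted(timings.items(), key=lambda kv: -kv[1]):
    print(f"{name:20s} {seconds:8.2f} s  {seconds / total:6.1%}")
# The top one or two components explain nearly all of the total,
# so optimizing anything else barely moves the needle.
```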
Part 2: As a metaphor for life
In the omnigenic model as a metaphor for life, Scott Alexander presents Fisher’s infinitesimal model (called “the omnigenic model”), which asserts that genes influence phenotypes through a large number of variants of small individual effect. He suggests using this as a general causal model, not just for genomics, but for science and all of life.
Scott Alexander is vaguely aware that this causal model doesn’t quite cover everything:
Now the biopsychosocial model has caught on and everyone agrees that depression is complicated. I don’t know if we’re still at the “dozens of things” stage or the “hundreds of things” stage, but I don’t think anyone seriously thinks it’s fewer than a dozen. The structure of depression seems different from the structure of genetic traits in that one cause can still have a large effect; multiple sclerosis might explain less than 1% of the variance in depressedness, but there will be a small sample of depressives whose condition is almost entirely because of multiple sclerosis. But overall, I think the analogy to genetics is a good one.
… but he doesn’t go that much in depth about it, and ultimately still suggests using similar approaches to genomics, at least as a conceptual ideal, with the greatest challenge being the effect size estimation.
Yet, we could instead use performance optimization as a metaphor for life. Multiple sclerosis only causes slight variance in depressedness, but in those people where it is the cause, not only does it have a huge effect, but it is also presumably relatively obvious, since it is a severe condition with lots of other effects. Multiple sclerosis cannot yet be cured, but it can be somewhat mitigated with certain medical treatments. I assume this is superior to simply treating the depression (with antidepressants?), e.g. since it also helps with the other effects of MS beyond the depression.
I would use performance optimization as a metaphor for life. Similar to how poor software performance in a big piece of code tends to come from a handful of specific loops, problems and opportunities in life in general tend to be rare, extreme, individual things.
Scott Alexander brings up another good example in a different blog post, How Bad Are Things?
A perfectly average patient will be a 70 year old woman who used to live somewhere else but who moved here a few years ago after her husband died in order to be closer to family. She has some medical condition or other that prevents her from driving or walking around much, and the family she wanted to be closer to have their own issues, so she has no friends within five hundred miles and never leaves her house except to go to doctors’ appointments. She has one son, who is in jail, and one daughter, who married a drug addict. She also has one grandchild, her only remaining joy in the world – but her drug-addict son-in-law uses access to him as a bargaining chip to make her give him money from her rapidly-dwindling retirement account so he can buy drugs. When she can’t cough up enough quickly enough, he bans her from visiting or talking to the grandchild, plus he tells the grandchild it’s her fault. Her retirement savings are rapidly running out and she has no idea what she will do when they’re gone. Probably end up on the street. Also, her dog just died.
If my patients were to read the above paragraph, there are a handful who would sue me for breach of confidentiality, assuming I had just written down their medical history and gotten a couple of details like the number of children wrong. I didn’t. This is a type.
Here we have a few large factors (most notably wealth, the grandchild, drugs, and aging) which multiplicatively interact to produce a terrible dynamic. If even one of these factors was small, this problem probably wouldn’t be occurring (though e.g. lack of wealth would replace it with a different problem, probably—unless she could move in with her daughter or something). This is precisely analogous to the multiplicative interaction in performance optimization.
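A toy sketch of that multiplicative structure (the factor names and scores are invented purely for illustration):

```python
# Rough "how bad is this dimension" scores on a 1-10 scale (invented numbers).
factors = {
    "financial precarity": 8,
    "son-in-law's leverage over the grandchild": 9,
    "drug addiction": 9,
    "age and isolation": 7,
}

def severity(factors):
    # Multiplicative interaction: the overall badness is roughly the product.
    product = 1
    for score in factors.values():
        product *= score
    return product

print(severity(factors))  # 4536: a crisis
# If any single factor is mild, the product collapses:
print(severity({**factors, "son-in-law's leverage over the grandchild": 1}))  # 504
```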
If this is how life usually looks, then it suggests precisely the opposite model from the omnigenic model as a metaphor for life. Using statistics to explain variance works well if you have a homogeneous population, as then you can find the parameters for the mechanism you are studying. However, statistics works less well if you have a distribution with varying mechanisms.
Probably the biggest problem with statistics here is that one might e.g. take the logarithm in order to capture the multiplicative effects and reduce sensitivity to outliers. But this makes the statistics focus entirely on the common factors and ignore the special ones, which is a problem because, with long tails, the special factors are all that matter.
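A rough sketch of the concern, using synthetic data and numpy:

```python
import numpy as np

rng = np.random.default_rng(0)

# Many mild "common" multiplicative factors, plus one case hit by a rare huge factor.
outcome = rng.lognormal(mean=0.0, sigma=0.3, size=1000)
outcome[0] *= 1000.0

# On the raw scale, that one special case accounts for almost all of the variance;
# after the log transform, the analysis mostly describes the spread of the common factors.
print(np.var(outcome))          # dominated by the single outlier
print(np.var(np.log(outcome)))  # mostly reflects the ordinary cases
```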
If you are dealing with a case that is affected by a mixture of many different long-tailed phenomena, you can understand it well by figuring out which factors are extreme for this particular case, and what their causes and consequences are. While there will be lots of factors that this doesn’t account for, these factors are “small” and thus don’t really matter.
You might think this is just an artifact of dealing with clinical subjects, who are pre-selected for having a pathology, but this holds even when navigating the absolute basics of your everyday life:
A good analogy would be a table, like in the image above. There are some big factors that influenced this table: someone put a book and a cup and a watch and a potted plant on it. But these factors don’t actually account for most of the log-variance of the table. The boards of the table have a grain with lots of detailed indentations; the wood is probably covered in dust and microbes of different species. The microbes’ DNA has a lot of entropy, not just in the species-shared aspects, but also in their individual mutations. Further, every molecule vibrates with its own chaotic heat.
One cannot describe the table in full, only a very small fraction of the factors that influence it. However, the factors vary wildly in their magnitude, and by describing the biggest factors on it, one gets a pretty good idea about the things that matter.
(Even if there are lots of small factors that seem to add up to a big dangerous factor, e.g. some of the microbes are of a dangerous ebola species that could kill you, it is still more feasible to describe them using the cause, e.g. a spill of a sample containing ebola, than it is to list each ebola microbe individually.)
This is related to the difference between normal distributions and log-normal/power law distributions: in a normal distribution the tails are very thin, so outliers don’t matter at all, whereas in log-normal/power law distributions only a few outlier data points matter at all. In particular, the CLT lets us get normal distributions when we have a large collection of small effects added up, which is probably the case in genetics.
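A quick way to see the difference, with arbitrarily chosen parameters and numpy:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

normal = rng.normal(loc=10.0, scale=1.0, size=n)        # thin-tailed
lognormal = rng.lognormal(mean=0.0, sigma=2.5, size=n)  # heavy-tailed

def top_share(x, k=100):
    # Fraction of the total contributed by the k largest data points.
    x = np.sort(x)[::-1]
    return x[:k].sum() / x.sum()

print(top_share(normal))     # ~0.1%: the 100 largest of 100,000 points barely matter
print(top_share(lognormal))  # roughly a quarter of the total comes from the top 0.1%
```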
This maps to polycausal vs monocausal distinctions reasonably well.
Indeed, intuitions about which of the 2 patterns/3 distributions is most common or explanatory are likely a huge crux under a whole lot of topics.
Per the missing heritability problem, it’s not even clear genetics works like this, and it’s hard to come up with anything where the case for it working like this is better than for genetics.
I’ll get more into some of this stuff in later posts I think.
Alright, I want to see your take on how the missing heritability problem blocks massively polycausal/normal distributions from being the dominant factor in human traits.
Agree that in other areas, things are more monocausal than genetics/human traits.
First, it should be noted that human traits are usually lognormally distributed, with apparent normal distributions being an artifact. E.g. while IQ is normally distributed, per item response theory it has an exponential relationship to the likelihood of success at difficult tasks. E.g. Most of What You Read on the Internet is Written by Insane People. Etc. So it’s not really about normal vs lognormal distributions, it’s about linear diffusion of lognormals vs exponential interaction[1] of normals[2].
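A small sketch of the item response theory point, using the standard two-parameter logistic model with invented parameter values:

```python
import math

def p_success(ability, difficulty, discrimination=1.0):
    # Two-parameter logistic (2PL) item response model.
    return 1.0 / (1.0 + math.exp(-discrimination * (ability - difficulty)))

# For a task far above someone's ability, the logistic is in its exponential regime,
# so each extra unit of ability multiplies the success probability by roughly e.
for ability in [0.0, 1.0, 2.0, 3.0]:
    print(ability, p_success(ability, difficulty=6.0))
```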
There’s some different solutions to the missing heritability problem. One proposal is rare variants, since they aren’t picked up by most sequencing technology, but the rarer the variant, the larger the effect size can be, so that makes the rare variants end up as our “sparse lognormals”.
But let’s say rare variants are of negligible size, so they don’t give us linear diffusion of lognormals, and instead the longtailedness of human traits is due to some sort of exponential interaction.
Then another thing that could give us missing heritability is if apparent traits aren’t actually the true genetic traits, but rather the true genetic traits trigger some dynamics, with e.g. the largest dynamics dominating, and (the logarithm of) those dynamics are what we end up measuring as traits. But that’s just linear diffusion of sparse lognormals on a phenotypic level of analysis.
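A toy construction of that story (entirely invented to illustrate the shape of the hypothesis, not a model of any real trait; assumes numpy):

```python
import numpy as np

rng = np.random.default_rng(0)
n_people, n_variants, n_dynamics = 10_000, 500, 20

# Many small additive genetic variants (the "true" normally-behaved genetic traits).
genotypes = rng.binomial(1, 0.5, size=(n_people, n_variants))
weights = rng.normal(0.0, 1.0, size=(n_variants, n_dynamics)) / np.sqrt(n_variants)

# Each "dynamic" grows exponentially in its own weighted sum of variants;
# the largest dynamic dominates the sum, and we measure the log of the result.
dynamics = np.exp(5.0 * (genotypes @ weights))
measured_trait = np.log(dynamics.sum(axis=1))  # roughly the log of the largest dynamic

print(measured_trait.mean(), measured_trait.std())
```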
[1] As in $\exp\left(\sum_i \beta_i x_i\right)$
[2] Or, well, short-tailed variables; e.g. alleles are usually modelled as Bernoulli.