Thinking about this post these days… Editing discussions might be better focused on personality: is that feasible, statistically? It seems like it might be, but we don’t know.
The focus on IQ in older discussions strikes me as increasingly misguided. It’s a good trait to start with, because it is important, well-studied, and turns out to be highly tractable, but it should only be a stepping stone to more useful approaches like index scores. There’s also another reason to treat IQ as just a toy example: we are now well into the deep learning revolution, and it’s come so far, with so much scope remaining for scaling & improvement, that it seems like IQ is plummeting in value each year. Already it feels like people get more or less out of AI based on their flexibility and willingness to experiment, or to step back, delegate, and finish the missing pieces. When the LLMs can do all the smart things you ask them to do, the value lies in asking for good things, and in making good use of them. The future doesn’t seem like it’ll be kind to neurotic, eager-to-please types, but good to those who are unafraid to have clear visions or know what they want, who finish projects and—pace Amdahl’s law—make themselves as little of a rate-limiting step as possible.* That is, if you ask what would be good to edit for, beyond narrow health traits, it seems like the answer is not (just) IQ but non-cognitive traits like Openness or Conscientiousness or (dis?)Agreeableness. So, you should probably start skating towards that puck yesterday.
Problem is, the personality GWASes, last I checked several years ago, were terrible. The PGS % is ~0%, and the GCTAs or LDSC (common SNP heritabilities) not much better, from UK Biobank in particular. The measurements of Big Five seem normal, and the sample sizes seem good, so it doesn’t seem like a mere statistical power or measurement error issue. What gives? GREML-KIN suggests that a good chunk of it may be rare variants, but the situation is still not great:
For neuroticism the final model consisted of contributions from the variance components G and K. Additive common genetic effects explained 11% (SE = 2%) of the variance with pedigree-associated variants explaining an additional 19% (SE = 3%). Whereas none of the environmental components were statistically-significant, the family component accounted for 2% of the variance in the full model and 1% in a model that included only the G and the K in addition to F.
For extraversion the only detectable source of genetic variation came from the G, which accounted for 13% (SE = 2%), with F explaining a further 9% (SE = 1%) of the phenotypic variation. The lack of pedigree-associated genetic effects could be due to low statistical power, as K explained 5% of the variance in the full model and 6% in a GKF model, but with a relatively large SE, estimated at 5%.
This is despite personality traits often clearly being highly heritable, easily 50% (and Neuroticism/Extraversion might even be the best case scenarios for Big Five here—Openness might pick up mostly IQ/EDU, and C/A a wash).
And this is consistent with some evolutionary scenarios like frequency-dependent selection, where personality is seen as a kind of knob on various things like risktaking, for which there cannot be any kind of universal a priori optimal level.
So simple additive variants will tend to systematically push organisms ‘too high (low)’ and be maladaptive, and fixate, leaving only weirder stuff which has less average effect, like dominance or epistasis.
Which is very bad because from what I recall of formal modeling of the statistical power of GWASes for detecting & estimating specific nonlinear variants, the situation is dire.
Estimating combinatorially many interactions across millions of common & rare variants, if we want to maintain the standard genome-wide false positive rate, means that we will have to adjust for all the tests/comparisons we’ll run, and that is going to push the sample sizes up from the current feasible millions to possibly hundreds of millions or even billions.
(Andrew Gelman’s rule of thumb is that an interaction requires 16x more data, and that’s for the simplest easiest case, so...)
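A back-of-the-envelope sketch of both problems (toy numbers throughout; the ±0.5 coding and a half-sized interaction effect are the assumptions behind Gelman’s rule):

```python
from statistics import NormalDist
import numpy as np

rng = np.random.default_rng(0)

# --- Multiple-testing burden: pairwise interactions among 1M common SNPs.
m_snps = 1_000_000
m_tests = m_snps * (m_snps - 1) // 2            # ~5e11 pairwise tests
alpha_per_test = 0.05 / m_tests                 # Bonferroni, ~1e-13
z_needed = NormalDist().inv_cdf(1 - alpha_per_test / 2)
# vs. the usual single-SNP genome-wide threshold of 5e-8:
z_gwas = NormalDist().inv_cdf(1 - 5e-8 / 2)

# --- Gelman's 16x rule: with +-0.5 coding, the interaction column has a
# quarter of the variance of a main-effect column, so its coefficient's SE
# is ~2x larger; if the interaction is also half the size of the main
# effect, equal power needs (2*2)^2 = 16x the sample size.
n = 200_000
x1 = rng.choice([-0.5, 0.5], n)
x2 = rng.choice([-0.5, 0.5], n)
X = np.column_stack([np.ones(n), x1, x2, x1 * x2])
y = 0.5 * x1 + 0.25 * (x1 * x2) + rng.normal(0, 1, n)

# OLS coefficient standard errors from (X'X)^-1 * sigma^2:
XtX_inv = np.linalg.inv(X.T @ X)
beta = XtX_inv @ X.T @ y
resid = y - X @ beta
sigma2 = resid @ resid / (n - X.shape[1])
se = np.sqrt(np.diag(XtX_inv) * sigma2)
se_ratio = se[3] / se[1]                        # ~2.0
```

Since required n scales with (z/effect)², jumping from the ordinary genome-wide z of ~5.4 to the pairwise-interaction z of ~7.4, on top of the 16x penalty for estimating a smaller, noisier interaction term, is what pushes feasible samples toward the hundreds of millions.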
So, this looks pretty bad for any kind of selection process. Rare variants are more expensive to WGS/impute per embryo, they are far more data-expensive to estimate, their sheer rareness means that even when estimated they are not useful for selection, and then they turn out to be ceilinged at something like 13% or 30% for all variants (as opposed to ~50% for IQ, with most of that from easy common variants).
Is it bad for editing? Well… maybe?
Editing is hard for IQ, under mutation-selection balance, because large (negative) effects get selected away quicker than small ones. So all that’s left is a ton of little bits of grit in the gears, to be edited away one by one, like picking up sand with tweezers.
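As a toy illustration of the ‘sand with tweezers’ picture (all numbers are made up; the only assumption doing work is that under mutation-selection balance a variant’s frequency is roughly inversely proportional to its harm):

```python
import numpy as np

rng = np.random.default_rng(3)

# 10k hypothetical deleterious variants: big-effect ones are driven rare
# by selection, so a carried genome is mostly tiny-effect 'grit'.
m = 10_000
effects = rng.exponential(scale=0.1, size=m)     # harm (IQ points) per allele
freqs = np.clip(0.001 / effects, 0, 0.5)         # rarer when more harmful

# One simulated genome: which deleterious alleles does this person carry?
carried = rng.random(m) < freqs
gain_sorted = np.sort(effects[carried])[::-1]    # edit biggest effects first
cumulative = np.cumsum(gain_sorted)              # edit-count vs gain curve
```

In this toy world a genome carries hundreds of deleterious variants, but the median edit is worth only a few hundredths of a point, so any large total gain requires stacking a great many edits.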
But maybe that’s not true of personality?
The effect sizes could be relatively large, because the nonlinear effects are mostly invisible to selection.
And then for the purposes of editing, rather than prediction/selection, maybe the situation isn’t so dire.
We would only need to ‘set’ a few discrete combinations of genes appropriately to potentially get a large personality difference.
And in that case, we don’t need to pass a statistical-significance threshold.
(This is often the case when we pass from a naive NHST approach to a decision-relevant analysis.)
We might only need a reasonable posterior probability for each ‘setting’, and then we can edit a bunch of them, and get a large effect.
If we are wrong, then almost by definition, our edits will average out to no effect on personality.
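A toy decision-analysis sketch of that logic (the posterior probability, edit count, and per-setting effect size are all invented numbers):

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical: 50 candidate 'settings', each with only a 30% posterior of
# being truly causal, with effect +0.05 SD when real, 0 when not.
n_edits, p_causal, effect = 50, 0.3, 0.05
trials = 10_000

causal = rng.random((trials, n_edits)) < p_causal
gain = (causal * effect).sum(axis=1)
mean_gain = gain.mean()              # ~ 50 * 0.3 * 0.05 = 0.75 SD

# And if our posteriors were pure noise, each edit's sign would be a coin
# flip, and the batch would wash out to ~0 on average:
null_gain = rng.choice([-effect, effect], (trials, n_edits)).sum(axis=1)
```

Even though no individual edit would survive a significance threshold, the batch has a large expected effect, while a completely wrong model costs roughly nothing in expectation (though it does add variance).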
Is this the case? I dunno. Discussion of the non-additive variants is usually done from the standard GWAS and behavioral genetics perspectives of either maximizing the variance explained of a PGS, or compartmentalizing between variance components.
Neither one directly addresses this question.
It seems like it wouldn’t be hard for a grad student or someone to dig into the existing literature and get some idea of what the implied distribution of effect sizes for personality is, and what the sample size requirements would be, and how that translates into the edit-count vs change curve.
Even if not used in humans, it’d be useful to understand the plasticity of personality, and could potentially be applied to, say, animal welfare in more rapidly adjusting animals to their conditions so they suffer less.
* This would be even more true of things like ‘taste’ or ‘creativity’, but if we can’t do gross personality traits like Extraversion, anything subtler is clearly off the table, no matter how much more important it will become.
A key consideration when selecting for latent mental traits is whether a common pathway model holds for the latent variable under selection. In an ideal common pathway model, all covariance between indicators is mediated by a single underlying construct. When this model fails, selecting for one trait can lead to unintended consequences. For instance, attempting to select for Openness might not reliably increase open-mindedness or creativity. Instead, such selection could inadvertently target specific parts of whatever went into the measurement, like liberal political values, aesthetic preferences, or being the kind of person with an inflated view of themselves. Unlike personality factors, which demonstrate mixed evidence for a coherent latent structure, IQ has been more consistently modeled using a common pathway approach.
TL;DR: Selecting for IQ good. Will get smarter children. Selecting for personality risky. Might get child that likes filling in the rightmost bubble on tests.
I think we would probably want to select much less hard on personality than on IQ. For virtually any one of the big five personality traits there is obviously a downside to becoming too extreme. For IQ that’s not obviously the case.
You’re missing the point. While I agree that we don’t want to select too hard for personality traits, the bigger problem is that we’re not able to robustly select for personality traits the way we’re able to select for IQ. If you try to select for Extraversion, you may end up selecting for people particularly prone to social desirability bias. This isn’t a Goodhart thing; the way our personality tests are currently constructed means that all the personality traits have fairly large correlations with social desirability, which is not what you want to select for. Also, the specific personality traits our tests measure don’t seem real in the same way IQ is real (that’s what testing for a common pathway model tells us).
The key distinction is that IQ demonstrates a robust common pathway structure—different cognitive tests correlate with each other because they’re all tapping into a genuine underlying cognitive ability. In contrast, personality measures often fail common pathway tests, suggesting that the correlations between different personality indicators might arise from multiple distinct sources rather than a single underlying trait. This makes genetic selection for personality traits fundamentally different from selecting for IQ—not just in terms of optimal selection strength, but in terms of whether we can meaningfully select for the intended trait at all.
The problem isn’t just about avoiding extreme personalities—it’s about whether our measurement and selection tools can reliably target the personality constructs we actually care about, rather than accidentally selecting for measurement artifacts or superficial behavioral patterns that don’t reflect genuine underlying traits.
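A toy simulation of that worry (the loadings are invented; it just assumes the measured Extraversion score mixes a genuine trait with a social-desirability response style):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 100_000

# Two uncorrelated sources feed the measured score: the trait we want,
# and a social-desirability response style we don't.
trait = rng.normal(size=n)           # genuine extraversion
style = rng.normal(size=n)           # social-desirability bias
score = 0.6 * trait + 0.5 * style + 0.6 * rng.normal(size=n)

# Select the top 10% on the measured score, as trait selection would.
top = score >= np.quantile(score, 0.9)
gain_trait = trait[top].mean()       # what we wanted to raise
gain_style = style[top].mean()       # what we raised by accident
```

Selecting on the composite raises the unwanted response style nearly as much as the trait, in proportion to their loadings; only a measure validated under a common pathway model avoids this.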
I don’t really see any reason why you couldn’t just do a setwise comparison and check which of the extraversion-increasing variants (or combinations of variants, if epistatic effects dominate) increase the trait without increasing conformity to social desirability.
In fact, selecting for disagreeableness as well might just fix the problem.
The key distinction is that IQ demonstrates a robust common pathway structure—different cognitive tests correlate with each other because they’re all tapping into a genuine underlying cognitive ability. In contrast, personality measures often fail common pathway tests, suggesting that the correlations between different personality indicators might arise from multiple distinct sources rather than a single underlying trait. This makes genetic selection for personality traits fundamentally different from selecting for IQ—not just in terms of optimal selection strength, but in terms of whether we can meaningfully select for the intended trait at all.
There is such a thing as a “general factor of personality”. I’m not sure how you can say that the thing IQ is measuring is real while the general factor of personality isn’t.
Sure big 5 aren’t the end-all be-all of personality but they’re decent and there’s no reason you couldn’t invent a more robust measure for the purpose of selection.
FWIW I agree that personality traits are important. A clear case is that you’d want to avoid combining very low conscientiousness with very high disagreeableness, because that combination looks something like antisocial personality disorder. But you don’t want to just select against those traits, because weaker forms might be associated with creative achievement. However, IQ, and more broadly cognitive capacity / problem-solving ability, will not become much less valuable soon.
Using LLMs is an intellectual skill. I would be astonished if IQ was not pretty helpful for that.
For editing adults, it is a good point that lots of them might find a personality tweak very useful; e.g. if it gave them a big bump in motivation, that would likely be worth more than, say, 5-10 IQ points. An adult is in a good position to tell what the delta is between their current personality and what might be ideal for their situation.
Deliberately tweaking personality does raise some “dual use” issues. Is there a set of genes that makes someone very unlikely to leave their abusive cult, or makes them loyal obedient citizens to their tyrannical government, or makes them never join the hated outgroup political party? I would be pretty on board with a norm of not doing research into that. Basic “Are there genes that cause personality disorders that ~everyone agrees are bad?” is fine; “motivation” as one undifferentiated category seems fine; Big 5 traits … have some known correlations with political alignment, which brings it into territory I’m not very comfortable with, but if it goes no further than that, it might be fine.
Using LLMs is an intellectual skill. I would be astonished if IQ was not pretty helpful for that.
I don’t think it is all that helpful, adjusting for the tasks that people do, after years of watching people use LLMs. Smart people are often too arrogant and proud, and know too much. “It’s just a pile of matrix multiplications and a very complicated if function and therefore can’t do anything” is the sort of thing only a smart person can convince themselves of, whereas a dumb person thinking “I ask the smart little man in the magic box my questions and I get answers” is getting more out of it. (The benefits of LLM usage are also highly context-dependent: you’ll find studies showing LLMs assist the highest performers most, but also ones showing they help the lowest most.) Like in 2020, the more you knew about AI, the dumber your uses of GPT-3 were, because you ‘knew’ that it couldn’t do anything, so you had to hold its hand to do everything and phrase everything in baby talk, etc. You had to unlearn everything you knew and anthropomorphize it to meaningfully explore prompting. This requires a certain flexibility of mind that has less to do with IQ and more to do with, say, schizophrenia: the people in Cyborgism, who do the most interesting things with LLMs, are not extraordinarily intelligent. They are, however, kinda weird and crazy.
Smart people are often too arrogant and proud, and know too much.
I thought that might be the case. With GPT-3 or 3.5, the higher the quality of your own work, the less helpful (and, potentially, the more destructive and disruptive) it was to substitute in the LLM’s work; so higher IQ in these early years of LLMs may correlate with dismissing them and having little experience using them.
But this is a temporary effect. Those who initially dismissed LLMs will eventually come round; and, among younger people, especially as LLMs get better, higher-IQ people who try LLMs for the first time will find them worthwhile and use them just as much as their peers. And if you have two people who have both spent N hours using the same LLM for the same purposes, higher IQ will help, all else being equal.
Of course, if you’re simply reporting a correlation you observe, then all else is likely not equal. Please think about selection effects, such as those described here.
I think it is very unclear that we want fewer ‘maladaptive’ people in the world in the sense that we can measure with personality traits such as the big five.
Would reducing the number of outliers in neuroticism also reduce the number of people emotionally invested in X-risk? The downstream results of such a modification do not seem to be clear.
It seems like producing a more homogeneous personality distribution would also reduce the robustness of society.
The core weirdness of this post, to me, is that it first conditions on LLMs/AI doing all the IQ tasks, yet humans not being involved in auditing that system in any case where high IQ is important. Personally, I feel like assuming that AI does all the IQ tasks is a moot case: we are pets or dead at that point.
The reason we want editing for IQ is that we want something unusual like “+1SD above von Neumann”; I’m not sure we want something beyond the statistical range of human personality traits. Why not select outliers from the population using personality testing and give them high intelligence?
I’m not sure we want something beyond the statistical range of human personality traits
Obviously it is untrue that editing is useless if it ‘only’ gives you a von Neumann. Similarly for personality. We don’t reify sets of personality traits as much as IQ, which is more obvious, but there are definitely many people who achieved remarkable things through force of personality. (Think of figures like Lee Kuan Yew, Napoleon, or Elon Musk: they were smart, and lucky, and made good choices, but there is clearly still a lot left over to explain.) And because personality is many things and there seems to be a pipeline model of output, you quickly get very few people at the tails who assemble all the right components. (Gignac has a paper making this point more explicitly.)
Why not select outliers from the population using personality testing and give them high intelligence?
You’re acting like it’s uncontroversially true that you have unlimited edits and can change any property at any time in development. I don’t think that is the case.* There is going to be an editing budget and limits to editing. One might as well ask the opposite question: why not select intelligence outliers from the population and give them high personality traits? (Well, to know you don’t want to do that, you would have to have some idea of how well personality editing would work—which we don’t. That’s my point!)
* Actually, the whole adult thing is a bit of a red herring. I believe even OP has largely abandoned the idea of adult editing and gone back to embryo-based approaches...? This is just a convenient place to drop my comment about uses of editing which will matter more over the next 30 years.
Sources:
https://psycnet.apa.org/record/2013-24385-001
https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7839945/