The first “problem” with this plan is that you don’t get an AGI this way. You get an unintelligent robot that steers towards diamonds. If you keep trying to have the training be about diamonds, it never particularly learns to think. When you compromise and start putting it in environments where it needs to be able to think to succeed, then your new reward-signals end up promoting all sorts of internal goals that aren’t particularly about diamond, but are instead about understanding the world and/or making efficient use of internal memory and/or suchlike.
Will humans stop having children as they get smarter and more powerful because they inadvertently gathered a bunch of utility function quirks like “curiosity”?
Separately, insofar as you were able to get some sort of internalized diamond-ish goal, if you’re not really careful then you end up getting lots of subgoals such as ones about glittering things, and stones cut in stylized ways, and proximity to diamond rather than presence of diamond, and so on and so forth.
Will humans stop having children in the limit of intelligence and power, because we have all of these sub-shards like “make sure your children are safe” and “have lots of sex” instead of one big “spread your genes” one? Do they stop doing that when you introduce them to superstimuli via the internet, or give them access to contraceptives that decouple sex from reproduction?
What the AI’s shards become under reflection is very sensitive to the ways it resolves internal conflicts. For instance, in humans, many of our values trigger only in a narrow range of situations (e.g., people care about people enough that they probably couldn’t bring themselves to murder a hundred thousand people in a row, but they can still drop a nuke), and whether we resolve that as “I should care about people even if they’re not right in front of me” or “I shouldn’t care about people any more than I would if the scenario were abstracted” depends quite a bit on the ways that reflection resolves inconsistencies.
The reason human morality is contextual and self-contradictory, and the reason we have to resolve a bunch of internal conflicts at the limit of reflectivity, is that we weren’t actually trained to care about other people; the subgoal, if any, was “maintain the trustworthiness indicators of the people we’re most likely to be able to cooperate with”. So your examples are very cheesy and not at all convincing.
Do humans decide to kill or sterilize their children at higher INT and WIS scores if you change some abstract metacognition parameters that affect how they resolve (deliberately engineered) inconsistencies?
Number of children in our world is negatively correlated with educational achievement and income, often in ways that look like serving other utility function quirks at the expense of children (as technology improved the ability to indulge those quirks with scarce effort faster than it did for the quirks more closely tied to children), e.g. consumption spending instead of children, sex with contraception, pets instead of babies. Climate/ecological or philosophical antinatalism is also more popular in the same regions and social circles. Philosophical support for abortion, and for medical procedures that increase happiness at the expense of sterilizing one’s children, also increases with education and in developed countries. Some humans misgeneralize their nurturing/anti-suffering impulses to favor the universal sterilization or death of all living things, including their own lineages and themselves.
Sub-replacement fertility is not 0 children, but it does trend to 0 descendants over multiple generations.
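To make that arithmetic concrete, here is a minimal back-of-the-envelope sketch (illustrative numbers of my own, assuming a constant TFR and roughly 2.1 births per woman as replacement):

```python
# Illustrative sketch only (assumed numbers, not from the comment above):
# a constant sub-replacement total fertility rate (TFR) shrinks each
# generation by a factor of TFR / replacement, so descendants decay
# geometrically toward zero even though nobody has "0 children".

def expected_descendants(initial_population: float, tfr: float,
                         generations: int, replacement: float = 2.1) -> float:
    """Expected population after `generations`, assuming every generation
    keeps the same TFR and ~2.1 births per woman is replacement."""
    return initial_population * (tfr / replacement) ** generations

if __name__ == "__main__":
    for g in (1, 5, 10, 20):
        print(g, round(expected_descendants(1_000_000, tfr=1.5, generations=g)))
    # roughly 714k, 186k, 35k, and 1.2k: trending toward 0 over generations
```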
Many of these changes are partially mediated through breaking attachment to religions that conduce to fertility and have not been robust to modernity, or through new technological options for unbundling previously bundled features.
Human morality was optimized in a context of limited individual power, but that kind of concern can and does dominate societies because it contributes to collective action where CDT selfishness sits out, and because it drives attention to novel/indirect influence. Similarly, an AI takeover can be dominated by whichever motivations contribute to the collective action that drives the takeover in the first place, or generalize to those novel situations best.
The party line of MIRI is not that a superintelligence, without extreme measures, would waste most of the universe’s EV on frivolous nonsense. The party line is that there is a 99+% chance that an AI, even if trained specifically to care about humans, would not end up caring about humans at all, and would instead turn the universe into uniform squiggles. That’s the claim I find unsubstantiated by most of the concrete concerns they raise, and which seems suspiciously disanalogous to the one natural example we have: 99% of people in first world countries are not forgoing pregnancy for educational attainment.
It’d of course still be extremely terrible, and maybe even more terrible, if what I think is going to happen happens! But it doesn’t look like all matter becoming squiggles.
I wasn’t arguing for “99+% chance that an AI, even if trained specifically to care about humans, would not end up caring about humans at all”, just addressing the questions about humans in the limit of intelligence and power in the comment I replied to. It does seem to me that there is a substantial chance that humans eventually do stop having human children in the limit of intelligence and power.
I wasn’t arguing for “99+% chance that an AI, even if trained specifically to care about humans, would not end up caring about humans at all”, just addressing the questions about humans in the limit of intelligence and power in the comment I replied to.
True.
It does seem to me that there is a substantial chance that humans eventually do stop having human children in the limit of intelligence and power.
A uniform fertility below 2.1 means extinction, yes, but in no country is the fertility rate uniformly below 2.1. Instead, some humans decide they want lots of children despite the existence of contraception and educational opportunity, and others do not. It seems to me that a substantial proportion of humans would stop having children in the limit of intelligence and power. It also seems to me that a substantial number of humans continue (and would continue) to have children, as if they value it for its own sake.
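A toy model of that heterogeneity (again with assumed, illustrative parameters, including perfect transmission of fertility preferences to one’s children and no switching between groups) shows why the aggregate need not trend to zero:

```python
# Toy model with assumed parameters: two subpopulations, one sub-replacement
# and one above replacement, each passing its fertility preference on to its
# children. The low-fertility line trends toward 0 while the high-fertility
# line grows, so total descendants do not go to 0.

def project(groups: dict[str, tuple[float, float]], generations: int,
            replacement: float = 2.1) -> dict[str, float]:
    """groups maps name -> (initial_population, TFR)."""
    sizes = {name: pop for name, (pop, _) in groups.items()}
    for _ in range(generations):
        sizes = {name: sizes[name] * (tfr / replacement)
                 for name, (_, tfr) in groups.items()}
    return sizes

if __name__ == "__main__":
    start = {"low_fertility": (900_000, 1.4), "high_fertility": (100_000, 3.0)}
    print(project(start, generations=10))
    # low_fertility shrinks to ~16k; high_fertility grows to ~3.5M
```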
This suggests that the problems Nate is highlighting, while real, are not sufficient to guarantee complete failure, even when the training process is not being designed with those problems in mind and there are no attempts at iterated amplification whatsoever. This nuance is important because it affects how far we should think a naive SGD RL approach is from even a limited “1% success”, and whether or not simple modifications are likely to greatly increase survival odds.