Value is Fragile
If I had to pick a single statement that relies on more Overcoming Bias content I’ve written than any other, that statement would be:
Any Future not shaped by a goal system with detailed reliable inheritance from human morals and metamorals, will contain almost nothing of worth.
“Well,” says the one, “maybe according to your provincial human values, you wouldn’t like it. But I can easily imagine a galactic civilization full of agents who are nothing like you, yet find great value and interest in their own goals. And that’s fine by me. I’m not so bigoted as you are. Let the Future go its own way, without trying to bind it forever to the laughably primitive prejudices of a pack of four-limbed Squishy Things—”
My friend, I have no problem with the thought of a galactic civilization vastly unlike our own… full of strange beings who look nothing like me even in their own imaginations… pursuing pleasures and experiences I can’t begin to empathize with… trading in a marketplace of unimaginable goods… allying to pursue incomprehensible objectives… people whose life-stories I could never understand.
That’s what the Future looks like if things go right.
If the chain of inheritance from human (meta)morals is broken, the Future does not look like this. It does not end up magically, delightfully incomprehensible.
With very high probability, it ends up looking dull. Pointless. Something whose loss you wouldn’t mourn.
Seeing this as obvious, is what requires that immense amount of background explanation.
And I’m not going to iterate through all the points and winding pathways of argument here, because that would take us back through 75% of my Overcoming Bias posts. Except to remark on how many different things must be known to constrain the final answer.
Consider the incredibly important human value of “boredom”—our desire not to do “the same thing” over and over and over again. You can imagine a mind that contained almost the whole specification of human value, almost all the morals and metamorals, but left out just this one thing -
- and so it spent until the end of time, and until the farthest reaches of its light cone, replaying a single highly optimized experience, over and over and over again.
Or imagine a mind that contained almost the whole specification of which sort of feelings humans most enjoy—but not the idea that those feelings had important external referents. So that the mind just went around feeling like it had made an important discovery, feeling it had found the perfect lover, feeling it had helped a friend, but not actually doing any of those things—having become its own experience machine. And if the mind pursued those feelings and their referents, it would be a good future and true; but because this one dimension of value was left out, the future became something dull. Boring and repetitive, because although this mind felt that it was encountering experiences of incredible novelty, this feeling was in no wise true.
Or the converse problem—an agent that contains all the aspects of human value, except the valuation of subjective experience. So that the result is a nonsentient optimizer that goes around making genuine discoveries, but the discoveries are not savored and enjoyed, because there is no one there to do so. This, I admit, I don’t quite know to be possible. Consciousness does still confuse me to some extent. But a universe with no one to bear witness to it, might as well not be.
Value isn’t just complicated, it’s fragile. There is more than one dimension of human value, where if just that one thing is lost, the Future becomes null. A single blow and all value shatters. Not every single blow will shatter all value—but more than one possible “single blow” will do so.
And then there are the long defenses of this proposition, which relies on 75% of my Overcoming Bias posts, so that it would be more than one day’s work to summarize all of it. Maybe some other week. There are so many branches I’ve seen that discussion tree go down.
After all—a mind shouldn’t just go around having the same experience over and over and over again. Surely no superintelligence would be so grossly mistaken about the correct action?
Why would any supermind want something so inherently worthless as the feeling of discovery without any real discoveries? Even if that were its utility function, wouldn’t it just notice that its utility function was wrong, and rewrite it? It’s got free will, right?
Surely, at least boredom has to be a universal value. It evolved in humans because it’s valuable, right? So any mind that doesn’t share our dislike of repetition, will fail to thrive in the universe and be eliminated...
If you are familiar with the difference between instrumental values and terminal values, and familiar with the stupidity of natural selection, and you understand how this stupidity manifests in the difference between executing adaptations versus maximizing fitness, and you know this turned instrumental subgoals of reproduction into decontextualized unconditional emotions...
...and you’re familiar with how the tradeoff between exploration and exploitation works in Artificial Intelligence...
...then you might be able to see that the human form of boredom that demands a steady trickle of novelty for its own sake, isn’t a grand universal, but just a particular algorithm that evolution coughed out into us. And you might be able to see how the vast majority of possible expected utility maximizers, would only engage in just so much efficient exploration, and spend most of their time exploiting the best alternative found so far, over and over and over.
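As a toy illustration of that last point, here is a minimal sketch in Python (my own, with invented payoff numbers and a made-up run_agent helper, not anything from the argument above) of an expected-payoff maximizer that explores only long enough to estimate its options and then repeats the best-looking one for the rest of its run:

```python
import random

# A minimal sketch, not a claim about real AI systems: the agent samples each
# option a few times to estimate its payoff, then exploits its best estimate
# for the remainder of a much longer run. Payoff numbers are invented.
def run_agent(true_payoffs, exploration_trials=10, total_steps=1_000_000):
    estimates = []
    steps_explored = 0
    for payoff in true_payoffs:
        samples = [payoff + random.gauss(0, 0.1) for _ in range(exploration_trials)]
        estimates.append(sum(samples) / len(samples))
        steps_explored += exploration_trials
    best = max(range(len(estimates)), key=lambda i: estimates[i])
    exploit_fraction = (total_steps - steps_explored) / total_steps
    return best, exploit_fraction

best_option, exploit_fraction = run_agent([0.3, 0.9, 0.5])
print("Option chosen to repeat for the rest of the run:", best_option)
print("Fraction of lifetime spent repeating it:", exploit_fraction)
```

With three options and ten samples each, such an agent spends 99.997% of its million steps on pure repetition; nothing in the expected-payoff calculation itself penalizes that.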
That’s a lot of background knowledge, though.
And so on and so on and so on through 75% of my posts on Overcoming Bias, and many chains of fallacy and counter-explanation. Some week I may try to write up the whole diagram. But for now I’m going to assume that you’ve read the arguments, and just deliver the conclusion:
We can’t relax our grip on the future—let go of the steering wheel—and still end up with anything of value.
And those who think we can -
- they’re trying to be cosmopolitan. I understand that. I read those same science fiction books as a kid: The provincial villains who enslave aliens for the crime of not looking just like humans. The provincial villains who enslave helpless AIs in durance vile on the assumption that silicon can’t be sentient. And the cosmopolitan heroes who understand that minds don’t have to be just like us to be embraced as valuable -
I read those books. I once believed them. But the beauty that jumps out of one box, is not jumping out of all boxes. (This being the moral of the sequence on Lawful Creativity.) If you leave behind all order, what is left is not the perfect answer, what is left is perfect noise. Sometimes you have to abandon an old design rule to build a better mousetrap, but that’s not the same as giving up all design rules and collecting wood shavings into a heap, with every pattern of wood as good as any other. The old rule is always abandoned at the behest of some higher rule, some higher criterion of value that governs.
If you loose the grip of human morals and metamorals—the result is not mysterious and alien and beautiful by the standards of human value. It is moral noise, a universe tiled with paperclips. To change away from human morals in the direction of improvement rather than entropy, requires a criterion of improvement; and that criterion would be physically represented in our brains, and our brains alone.
Relax the grip of human value upon the universe, and it will end up seriously valueless. Not, strange and alien and wonderful, shocking and terrifying and beautiful beyond all human imagination. Just, tiled with paperclips.
It’s only some humans, you see, who have this idea of embracing manifold varieties of mind—of wanting the Future to be something greater than the past—of being not bound to our past selves—of trying to change and move forward.
A paperclip maximizer just chooses whichever action leads to the greatest number of paperclips.
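A one-line sketch of that decision rule, in Python, with candidate actions and paperclip counts invented purely for illustration:

```python
# Hypothetical candidate actions, scored only by expected paperclips produced.
expected_paperclips = {
    "build more wire factories": 1e12,
    "disassemble an asteroid for feedstock": 3e15,
    "preserve a human art museum": 0.0,
}
# The maximizer's entire decision procedure: pick the argmax.
chosen = max(expected_paperclips, key=expected_paperclips.get)
print(chosen)  # -> "disassemble an asteroid for feedstock"
```

Anything not reflected in those numbers, which is to say everything humans care about, never influences the choice.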
No free lunch. You want a wonderful and mysterious universe? That’s your value. You work to create that value. Let that value exert its force through you who represents it, let it make decisions in you to shape the future. And maybe you shall indeed obtain a wonderful and mysterious universe.
No free lunch. Valuable things appear because a goal system that values them takes action to create them. Paperclips don’t materialize from nowhere for a paperclip maximizer. And a wonderfully alien and mysterious Future will not materialize from nowhere for us humans, if our values that prefer it are physically obliterated—or even disturbed in the wrong dimension. Then there is nothing left in the universe that works to make the universe valuable.
You do have values, even when you’re trying to be “cosmopolitan”, trying to display a properly virtuous appreciation of alien minds. Your values then fade further into the invisible background—they are less obviously human. Your brain probably won’t even generate an alternative so awful that it would wake you up, make you say “No! Something went wrong!” even at your most cosmopolitan. E.g. “a nonsentient optimizer absorbs all matter in its future light cone and tiles the universe with paperclips”. You’ll just imagine strange alien worlds to appreciate.
Trying to be “cosmopolitan”—to be a citizen of the cosmos—just strips off a surface veneer of goals that seem obviously “human”.
But if you wouldn’t like the Future tiled over with paperclips, and you would prefer a civilization of...
...sentient beings...
...with enjoyable experiences...
...that aren’t the same experience over and over again...
...and are bound to something besides just being a sequence of internal pleasurable feelings...
...learning, discovering, freely choosing...
...well, I’ve just been through the posts on Fun Theory that went into some of the hidden details on those short English words.
Values that you might praise as cosmopolitan or universal or fundamental or obvious common sense, are represented in your brain just as much as those values that you might dismiss as merely human. Those values come of the long history of humanity, and the morally miraculous stupidity of evolution that created us. (And once I finally came to that realization, I felt less ashamed of values that seemed ‘provincial’ - but that’s another matter.)
These values do not emerge in all possible minds. They will not appear from nowhere to rebuke and revoke the utility function of an expected paperclip maximizer.
Touch too hard in the wrong dimension, and the physical representation of those values will shatter—and not come back, for there will be nothing left to want to bring it back.
And the referent of those values—a worthwhile universe—would no longer have any physical reason to come into being.
Let go of the steering wheel, and the Future crashes.
“Except to remark on how many different things must be known to constrain the final answer.”
What probability would you estimate for each of those things being correct?
What are human morals and metamorals?
What about “near-human” morals, like, say, Kzinti: where the best of all possible worlds contains hierarchies, duels to the death, and subsentient females; along with exploration, technology, and other human-like activities. Though I find their morality repugnant for humans, I can see that they have the moral “right” to it. Is human morality, then, in some deep sense better than those?
It is better in the sense that it is ours. It is an inescapable quality of life as an agent with values embedded in a much greater universe that might contain other agents with other values, that ultimately the only thing that makes one particular set of values matter more to that agent is that those are the values that belong to that agent.
We happen to have, as one of our values, respect for others’ values. But this particular value happens to be self-contradictory when taken to its natural conclusion. To take it to its conclusion would be to say that nothing matters in the end, not even what we ourselves care about. Consider the case of an alien being whose values include disrespecting others’ values. Is the human value placed on respecting others’ values in some deep sense better than this being’s?
At some point you have to stop and say, “Sorry, my own values take precedence over yours when they are incompatible to this degree. I cannot respect this value of yours.” And what gives you the justification to do this? Because it is your choice, your values. Ultimately, we must be chauvinists on some level if we are to have any values at all. Otherwise, what’s wrong with a sociopath who murders for joy? How can we say that their values are wrong, except to say that their values contradict our own?
No. The point of meta-ethics as outlined on LW is that there is no “deeper sense”, no outside perspective from which to judge moral views against one another.
What would “deeper sense” even mean? Human morality is better (or h-better, whichever terminology you prefer), that’s all there is to it.
I think Eliezer is due for congratulation here. This series is nothing short of a mammoth intellectual achievement, integrating modern academic thought about ethics, evolutionary psychology and biases with the provocative questions of the transhumanist movement. I’ve learned a staggering amount from reading this OB series, especially about human values and my own biases and mental blank spots.
I hope we can all build on this. Really. There’s a lot left to do, especially for transhumanists and those who hope for a significantly better future than the best available in today’s world. For those who have more pedestrian ambitions for the future (i.e. most of the world), this series provides a stark warning as to how the well intentioned may destroy everything.
Bravo!
[crosspost from h+ goodness]
Pearson, it’s not that kind of chaining. More like trying to explain to someone why their randomly chosen lottery ticket won’t win (big space, small target, poor aim) when their brain manufactures argument after argument after different argument for why they’ll soon be rich.
The core problem is simple: if the targeting information disappears, so does the good outcome. Knowing enough to refute every fallacious remanufacturing of the value-information from nowhere is the hard part.
What are the odds that every proof of God’s existence is wrong, when there are so many proofs? Pretty high. A selective search for plausible-sounding excuses won’t change reality itself. But knowing the specific refutations—being able to pinpoint the flaws in every supposed proof—that might take some study.
I have read and considered all of Eliezer’s posts, and still disagree with him on this his grand conclusion. Eliezer, do you think the universe was terribly unlikely and therefore terribly lucky to have coughed up human-like values, rather than some other values? Or is it only in the stage after ours where such rare good values were unlikely to exist?
I imagine a distant future with just a smattering of paper clip maximizers—having risen in different galaxies with slightly different notions of what a paperclip is—might actually be quite interesting. But even so, so what? Screw the paperclips, even if they turn out to be more elegant and interesting than us!
Robin, I discussed this in The Gift We Give To Tomorrow as a “moral miracle” that of course isn’t really a miracle at all. We’re judging the winding path that evolution took to human value, and judging it as fortuitous using our human values. (See also, “Where Recursive Justification Hits Bottom”, “The Ultimate Source”, “Created Already In Motion”, etcetera.)
RH: “I have read and considered all of Eliezer’s posts, and still disagree with him on this his grand conclusion. Eliezer, do you think the universe was terribly unlikely and therefore terribly lucky to have coughed up human-like values, rather than some other values?”
Yes, it almost certainly was, because of the way we evolved. There are two distinct events here:
1. A species evolves to intelligence with the particular values we have.
2. Given that a species evolves to intelligence with some particular values, it decides that it likes those values.
1 is an extremely unlikely event. 2 is essentially a certainty.
One might call this “the ethical anthropic argument”.
Evolution (as an algorithm) doesn’t work on the indestructible. Therefore all naturally-evolved beings must be fragile to some extent, and must have evolved to value protecting their fragility.
Yes, a designed life form can have paper clip values, but I don’t think we’ll encounter any naturally occurring beings like this. So our provincial little values may not be so provincial after all, but common on many planets.
Ian C.: “Yes, a designed life form can have paper clip values, but I don’t think we’ll encounter any naturally occurring beings like this. So our provincial little values may not be so provincial after all, but common on many planets.” Almost all life forms (especially simpler ones) are sort of paperclip maximizers; they just make copies of themselves ad infinitum. If life could leave this planet and use materials more efficiently, it would consume everything. Good for us that evolution couldn’t optimize them to such an extent.
Ian: some individual values of other naturally-evolved beings may be recognizable, but that doesn’t mean that the value system as a whole will.
I’d expect that carnivores, or herbivores, or non-social creatures, or hermaphrodites, or creatures with a different set of senses—would probably have some quite different values.
And there can be different brain architectures, different social/political organisation, different transwhateverism technology, etc.
Roko:
Not so fast. We like some of our evolved values at the expense of others. Ingroup-outgroup dynamics, the way we’re most motivated only when we have someone to fear and hate: this too is an evolved value, and most of the people here would prefer to do away with it if we can.
The interesting part of moral progress is that the values etched into us by evolution don’t really need to be consistent with each other, so as we become more reflective and our environment changes to force new situations upon us, we realize that they conflict with one another. The analysis of which values have been winning and which have been losing (in different times and places) is another fascinating one...
“Ingroup-outgroup dynamics, the way we’re most motivated only when we have someone to fear and hate: this too is an evolved value, and most of the people here would prefer to do away with it if we can.”
So you would want to eliminate your special care for family, friends, and lovers? Or are you really just saying that your degree of ingroup-outgroup concern is less than average and you wish everyone was as cosmopolitan as you? Or, because ingroup-concern is indexical, it results in different values for different ingroups, so you wish everyone shared your precise ingroup concerns? Or that you are in a Prisoner’s Dilemma with other groups (or worse), and you think the benefit of changing the values of others would be enough for you to accept a deal in which your own ingroup-concern was eliminated?
http://www.overcomingbias.com/2008/03/unwanted-morali.html
I suspect it gets worse. Eliezer seems to lean heavily on the psychological unity of humankind, but there’s a lot of room for variance within that human dot. My morality is a human morality, but that doesn’t mean I’d agree with a weighted sum across all possible extrapolated human moralities. So even if you preserve human morals and metamorals, you could still end up with a future we’d find horrifying (albeit better than a paperclip galaxy). It might be said that that’s only a Weirdtopia, that you’re horrified at first, but then you see that it’s actually for the best after all. But if “the utility function [really] isn’t up for grabs,” then I’ll be horrified for as long as I damn well please.
Well, okay, but the Weirdtopia thesis under consideration makes the empirically falsifiable prediction that “as long as you damn well please” isn’t actually a very long time. Also, I call scope neglect: your puny human brain can model some aspects of your local environment, which is a tiny fraction of this Earth, but you’re simply not competent to judge the entire future, which is much larger.
I would like to point out that you’re probably replying to your past self. This gives me significant amusement.
This post seems almost totally wrong to me. For one thing, its central claim—that without human values the future would, with high probability, be dull—is not even properly defined.
To be a little clearer, one would need to say something like: if you consider a specified enumeration over the space of possible utility functions, a random small sample from that space would be “dull” (it might help to say a bit more about what dullness means too, but that is a side issue for now).
That claim might well be true for typical “shortest-first” enumerations in sensible languages—but it is not a very interesting claim—since the dull utility functions would be those which led to an attainable goal—such as “count up to 10 and then stop”.
The “open-ended” utility functions—the ones that resulted in systems that would spread out—would almost inevitably lead to rich complexity. You can’t turn the galaxy into paper-clips (or whatever) without extensively mastering science, technology, intergalactic flight, nanotechnology—and so on. So, you need scientists and engineers—and other complicated and interesting things. This conclusion seems so obvious to me as to hardly be worth discussing.
I’ve explained all this to Eliezer before. After reading this post I still have very little idea about what it is that he isn’t getting. He seems to think that making paper clips is boring. However, they are not any more boring than making DNA sequences, and that’s the current aim of most living systems.
A prime-seeking civilisation has a competitive disadvantage relative to one that doesn’t have silly, arbitrary bits tacked on to its utility function. It is more likely to be wiped out in a battle with an alien race—and it’s more likely to suffer from a mutiny from within. However, that is about all. Such civilisations are unlikely to lack science, technology, or other interesting stuff.
Yes, but there would be no persons. There would be no scientists, no joy of discovery, no feeling of curiosity. There would just be a “process” that, from the outside, would look like an avalanche of expanding machinery, and on the inside would have no subjective experience. It would contain a complex intelligence, but there would be no-one to marvel at the complex intelligence, not even itself, because there would be “no-one home” in all likelihood.
For me, what proved decisive in coming to a low estimate of the value of such a system was the realization that the reason that I liked science, technology, etc, was because of my subjective experiences of finding out the answer.
Interestingness is in the eye of the beholder, but this piece argues that the beholder would have no eye; that there would be an optimizing process that lacked the ability to experience joy over any of its discoveries.
While I think you may very plausibly be correct, there are (I think) some reasonable grounds for uncertainty. I can imagine that an advanced algorithm that performs the role of making scientific discoveries to aid in the development of technologies for the great paperclip fleet might indeed have “someone home”. It may be that this is beneficial to its effectiveness, or might be close to essential.
I can’t make any strong claims about why this would be needed, only that human beings (at least me) do have “someone home”; but if we didn’t know about human beings and we were speculating on what organisms evolution might produce, we might find ourselves postulating complex, social creatures who solve complicated tasks but have “no one home”, and we would obviously be wrong.
But you don’t need very many, and you’re free to enslave them while they work then kill them once they’re done. They might not need to be conscious, and they certainly don’t need to enjoy their work.
Probably, they will just be minor subroutines of the original AI, deleted and replaced once they learn everything necessary, which won’t take long for a smart AI.
We don’t particularly value copying DNA sequences for its own sake either though. Imagine a future where an unthinking strain of bacteria functioned like grey goo and replicated itself using all matter in its light cone, and it was impervious to mutations. I wouldn’t rate that future as any more valuable than a future where all life went extinct. The goals of evolution aren’t necessarily our goals.
Making a DNA sequence will count as an extremely low level activity (http://lesswrong.com/lw/xr/in_praise_of_boredom/) which is necessary to support non-boring activities. It is a very simple argument that these are the very activities we stop thinking about so that we can concentrate on novel activities.
Carl:
I don’t think that automatic fear, suspicion and hatred of outsiders is a necessary prerequisite to a special consideration for close friends, family, etc. Also, yes, outgroup hatred makes cooperation on large-scale Prisoner’s Dilemmas even harder than it generally is for humans.
But finally, I want to point out that we are currently wired so that we can’t get as motivated to face a huge problem if there’s no villain to focus fear and hatred on. The “fighting” circuitry can spur us to superhuman efforts and successes, but it doesn’t seem to trigger without an enemy we can characterize as morally evil.
If a disease of some sort threatened the survival of humanity, governments might put up a fight, but they’d never ask (and wouldn’t receive) the level of mobilization and personal sacrifice that they got during World War II—although if they were crafty enough to say that terrorists caused it, they just might. Concern for loved ones isn’t powerful enough without an idea that an evil enemy threatens them.
Wouldn’t you prefer to have that concern for loved ones be a sufficient motivating force?
@Eliezer: Can you expand on the “less ashamed of provincial values” part?
@Carl Shuman: I don’t know about him, but for myself, HELL YES I DO. Family—they’re just randomly selected by the birth lottery. Lovers—falling in love is some weird stuff that happens to you regardless of whether you want it, reaching into your brain to change your values: like, dude, ew—I want affection and tenderness and intimacy and most of the old interpersonal fun and much more new interaction, but romantic love can go right out of the window with me. Friends—I do value friendship; I’m confused; maybe I just value having friends, and it’d rock to be close friends with every existing mind; maybe I really value preferring some people to others; but I’m sure about this: I should not, and do not want to, worry more about a friend with the flu than about a stranger with cholera.
@Robin Hanson: HUH? You’d really expect natural selection to come up with minds who enjoy art, mourn dead strangers and prefer a flawed but sentient woman to a perfect catgirl on most planets?
This talk about “‘right’ means right” still makes me damn uneasy. I don’t have more to show for it than “still feels a little forced”—when I visualize a humane mind (say, a human) and a paperclipper (a sentient, moral one) looking at each other in horror and knowing there is no way they could agree about whether to use atoms to feed babies or to make paperclips, I feel wrong. I think about the paperclipper in exactly the same way it thinks about me! Sure, that’s also what happens when I talk to a creationist, but we’re trying to approximate external truth; and if our priors were too stupid, our genetic line would be extinct (or at least that’s what I think) - but morality doesn’t work like probability, it’s not trying to approximate anything external. So I don’t feel much happier about the moral miracle that made us than about the one that makes the paperclipper.
Patrick,
Those are instrumental reasons, and could be addressed in other ways. I was trying to point out that giving up big chunks of our personality for instrumental benefits can be a real trade-off.
http://lesswrong.com/lw/gz/policy_debates_should_not_appear_onesided/
Jordan: “I imagine a distant future with just a smattering of paper clip maximizers—having risen in different galaxies with slightly different notions of what a paperclip is—might actually be quite interesting.”
That’s exactly how I imagine the distant future. And I very much like to point to the cyclic cellular automaton (java applet) as a visualization. Actually, I speculate that we live in a small part of the space-time continuum not yet eaten by a paper clip maximizer. Now you may ask: Why don’t we see huge blobs of paper clip maximizers expanding on the night sky? My answer is that they are expanding with the speed of light in every direction.
Note: I abused the term paper clip maximizer somewhat. Originally I called these things Expanding Space Amoebae, but PCM is more OB.
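For readers who haven’t seen it, here is a minimal sketch of the cyclic cellular automaton rule the comment above points to, reduced to one dimension for brevity (the Java applet mentioned is two-dimensional; the number of states and the grid width below are arbitrary choices):

```python
import random

N_STATES = 4   # arbitrary number of competing "species" of state
WIDTH = 60     # arbitrary 1-D grid size

def step(cells):
    # Each cell advances to state (s + 1) mod N_STATES if a neighbor already
    # holds that successor state; otherwise it stays put. Waves of each state
    # consume the state behind them, like competing expanding fronts.
    new = []
    for i, s in enumerate(cells):
        successor = (s + 1) % N_STATES
        left, right = cells[(i - 1) % WIDTH], cells[(i + 1) % WIDTH]
        new.append(successor if successor in (left, right) else s)
    return new

cells = [random.randrange(N_STATES) for _ in range(WIDTH)]
for _ in range(20):
    print("".join(str(s) for s in cells))
    cells = step(cells)
```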
Probability of an evolved alien species:
(A) Possessing analogues of pleasure and pain: HIGH. Reinforcement learning is simpler than consequentialism for natural selection to stumble across.
(B) Having a human idiom of boredom that desires a steady trickle of novelty: MEDIUM. This has to do with acclimation and adjustment as a widespread neural idiom, and the way that we try to abstract that as a moral value. It’s fragile but not impossible.
(C) Having a sense of humor: LOW.
Probability of an expected paperclip maximizer having analogous properties, if it originated as a self-improving code soup (rather than by natural selection), or if it was programmed over a competence threshold by foolish humans and then exploded:
(A) MEDIUM
(B) LOW
(C) LOW
the vast majority of possible expected utility maximizers, would only engage in just so much efficient exploration, and spend most of their time exploiting the best alternative found so far, over and over and over.
I’m not convinced of that. First, “vast majority” needs to use an appropriate measure, one that is applicable to evolutionary results. If, when two equally probable mutations compete in the same environment, one of those mutations wins, making the other extinct, then the winner needs to be assigned the far greater weight. So, for example, if humans were to compete against a variant of human without the boredom instinct, who would win?
Second, it would seem easier to build (or mutate into) something that keeps going forever than it is to build something that goes for a while then stops. Cancer, for example, just keeps going and going, and it takes a lot of bodily tricks to put a stop to that.
it would seem easier to build (or mutate into) something that keeps going forever than it is to build something that goes for a while then stops.
On reflection, I realize this point might be applied to repetitive drudgery. But I was applying it to the behavior “engage in just so much efficient exploration.” My point is that it may be easier to mutate into something that explores and explores and explores, than it would be to mutate into something that explores for a while then stops.
Thanks for the probability assessments. What is missing are supporting arguments. What you think is relatively clear—but why you think it is not.
...and what’s the deal with mentioning a “sense of humour”? What has that to do with whether a civilization is complex and interesting? Whether our distant descendants value a sense of humour or not seems like an irrelevance to me. I am more concerned with whether they “make it” or not—factors affecting whether our descendants outlast the exploding sun—or whether the seed of human civilisation is obliterated forever.
@Jordan—agreed.
I think the big difference in expected complexity is between sampling the space of possible singletons’ algorithms results and sampling the space of competitive entities. I agree with Eliezer that an imprecisely chosen value function, if relentlessly optimized, is likely to yield a dull universe. To my mind the key is that the ability to relentlessly optimize one function only exists if a singleton gets and keeps an overwhelming advantage over everything else. If this does not happen, we get competing entities with the computationally difficult problem of outsmarting each other. Under this scenario, while I might not like the detailed results, I’d expect them to be complex to much the same extent and for much the same reasons as living organisms are complex.
What if I want a wonderful and non-mysterious universe? Your current argument seems to be that there’s no such thing. I don’t follow why this is so. “Fun” (defined as desire for novelty) may be the simplest way to build a strategy of exploration, but it’s not obvious that it’s the only one, is it?
A series on “theory of motivation” that explores other options besides novelty and fun as prime directors of optimization processes that can improve the universe (in their and maybe even our eyes).
“This talk about “‘right’ means right” still makes me damn uneasy. I don’t have more to show for it than “still feels a little forced”—when I visualize a humane mind (say, a human) and a paperclipper (a sentient, moral one) looking at each other in horror and knowing there is no way they could agree about whether to use atoms to feed babies or to make paperclips, I feel wrong. I think about the paperclipper in exactly the same way it thinks about me! Sure, that’s also what happens when I talk to a creationist, but we’re trying to approximate external truth; and if our priors were too stupid, our genetic line would be extinct (or at least that’s what I think) - but morality doesn’t work like probability, it’s not trying to approximate anything external. So I don’t feel much happier about the moral miracle that made us than about the one that makes the paperclipper.”
Oh my, this is so wrong. So you’re postulating that the paperclipper would be extinct too due to natural selection? Somehow I don’t see the mechanisms of natural selection applying to that. With it being created once by humans and then exploding, and all that.
If 25% of its “moral drive” is the result of a programming error, is it still “understandable and as much of a worthy creature/shaper of the Universe” as us? This is the cosmopolitan view that Eliezer describes; and I don’t see how you’re convinced that admiring static is just as good as admiring evolved structure. It might just be bias, but the latter seems much better. Order > chaos, no?
@Jotaf, “Order > chaos, no?”
Imagine God shows up tomorrow. “Everyone, hey, yeah. So I’ve got this other creation and they’re super moral. Man, moral freaks, let me tell you. Make Mennonites look Shintoist. And, sure, I like them better than you. It’s why I’m never around, sorry. Thing is, their planet is about to get eaten by a supernova. So... I’m giving them the moral green light to invade Earth. It’s been real.”
I’d be the first to sign up for the resistance. Who cares about moral superiority? Are we more moral than a paperclip maximizer? Are human ideals ‘better’? Who cares? I don’t want an OfficeMax universe, so I’ll take up arms against a paperclip maximizer, whether it’s blessed by God or not.
Carl:
Those are instrumental reasons, and could be addressed in other ways.
I wouldn’t want to modify/delete hatred for instrumental reasons, but on behalf of the values that seem to clash almost constantly with hatred. Among those are the values I meta-value, including rationality and some wider level of altruism.
I was trying to point out that giving up big chunks of our personality for instrumental benefits can be a real trade-off.
I agree with that heuristic in general. I would be very cautious regarding the means of ending hatred-as-we-know-it in human nature, and I’m open to the possibility that hatred might be integral (in a way I cannot now see) to the rest of what I value. However, given my understanding of human psychology, I find that claim improbable right now.
My first point was that our values are often the victors of cultural/intellectual/moral combat between the drives given us by the blind idiot god; most of human civilization can be described as the attempt to make humans self-modify away from the drives that lost in the cultural clash. Right now, much of this community values (for example) altruism and rationality over hatred where they conflict, and exerts a certain willpower to keep the other drive vanquished at times. (E.g. repeating the mantra “Politics is the Mind-Killer” when tempted to characterize the other side as evil).
So far, we haven’t seen disaster from this weak self-modification against hatred, and we’ve seen a lot of good (from the perspective of the values we privilege). I take this as some evidence that we can hope to push it farther without losing what we care about (or what we want to care about).
(E.g. repeating the mantra “Politics is the Mind-Killer” when tempted to characterize the other side as evil)
Uh, I don’t mean that literally, though doing up a whole Litany of Politics might be fun.
Maybe it’s the types of haunts I’ve been frequenting lately, but the elimination of all conscious life in the universe doesn’t strike me as too terrible at the moment (provided it doesn’t shorten my own lifespan).
We can sort the values evolution gave us into the following categories (not necessarily exhaustive). Note that only the first category of values is likely to be preserved without special effort, if Eliezer is right and our future is dominated by singleton FOOM scenarios. But many other values are likely to survive naturally in alternative futures.
likely values for all intelligent beings and optimization processes (power, resources)
likely values for creatures with roughly human-level brain power (boredom, knowledge)
likely values for all creatures under evolutionary competition (reproduction, survival, family/clan/tribe)
likely values for creatures under evolutionary competition who cannot copy their minds (individual identity, fear of personal death)
likely values for creatures under evolutionary competition who cannot wirehead (pain, pleasure)
likely values for creatures with sexual reproduction (beauty, status, sex)
likely values for intelligent creatures with sexual reproduction (music, art, literature, humor)
likely values for intelligent creatures who cannot directly prove their beliefs (honesty, reputation, piety)
values caused by idiosyncratic environmental characteristics (salt, sugar)
values caused by random genetic/memetic drift and co-evolution (Mozart, Britney Spears, female breasts, devotion to specific religions)
The above probably isn’t controversial, rather the disagreement is mainly on the following:
the probabilities of various future scenarios
which values, if any, can be preserved using approaches such as FAI
which values, if any, we should try to preserve
I agree with Roko that Eliezer has made his case in an impressive fashion, but it seems that many of us are still not convinced on these three key points.
Take the last one. I agree with those who say that human values do not form a consistent and coherent whole. Another way of saying this is that human beings are not expected utility maximizers, not as individuals and certainly not as societies. Nor do most of us desire to become expected utility maximizers. Even amongst the readership of this blog, where one might logically expect to find the world’s largest collection of EU-maximizer wannabes, few have expressed this desire. But there is no principled way to derive a utility function from something that is not an expected utility maximizer!
Is there any justification for trying to create an expected utility maximizer that will forever have power over everyone else, whose utility function is derived using a more or less arbitrary method from the incoherent values of those who happen to live in the present? That is, besides the argument that it is the only feasible alternative to a null future. Many of us are not convinced of this, neither the “only” nor the “feasible”.
Wei_Dai2, it looks like you missed Eliezer’s main point:
It doesn’t matter that “many” values survive, if Eliezer’s “value is fragile” thesis is correct, because we could lose the whole future if we lose just a single critical value. Do we have such critical values? Maybe, maybe not, but you didn’t address that issue.
I like the idea of replying to past selves and think it should be encouraged.
The added bonus is they can’t answer back.
“Yeah, past me is terrible, but don’t even get me started on future me, sheesh!”
Quite. I never expected LW to resemble classic scenes from Homestuck… except, you know, way more functional.
likely values for all intelligent beings and optimization processes (power, resources)
Agree.
likely values for creatures with roughly human-level brain power (boredom, knowledge)
Disagree. Maybe we don’t mean the same thing by boredom?
likely values for all creatures under evolutionary competition (reproduction, survival, family/clan/tribe)
Mostly agree. Depends somewhat on definition of evolution. Some evolved organisms pursue only 1 or 2 of these but all pursue at least one.
likely values for creatures under evolutionary competition who cannot copy their minds (individual identity, fear of personal death)
Disagree. Genome equivalents which don’t generate terminally valued individual identity in the minds they describe should outperform those that do.
likely values for creatures under evolutionary competition who cannot wirehead (pain, pleasure)
Disagree. Why not just direct expected utility? Pain and pleasure are easy to find but don’t work nearly as well.
likely values for creatures with sexual reproduction (beauty, status, sex)
Define sexual. Most sexual creatures are too simple to value the first two. Most plausible posthumans aren’t sexual in a traditional sense.
likely values for intelligent creatures with sexual reproduction (music, art, literature, humor)
Disagree.
likely values for intelligent creatures who cannot directly prove their beliefs (honesty, reputation, piety)
Agree assuming that they aren’t singletons. Even then for sub-components.
values caused by idiosyncratic environmental characteristics (salt, sugar)
Agree.
values caused by random genetic/memetic drift and co-evolution (Mozart, Britney Spears, female breasts, devotion to specific religions)
Agree. Some caveats about Mozart.
So: you think a “paperclip maximiser” would be “dull”?
How is that remotely defensible? Do you think a “paperclip maximiser” will master molecular nanotechnology, artificial intelligence, space travel, fusion, the art of dismantling planets and stellar farming?
If so, how could that possibly be “dull”? If not, what reason do you have for thinking that those technologies would not help with the making of paper clips?
Apparently-simple processes can easily produce great complexity. That’s one of the lessons of Conway’s Game of Life.
Maybe we don’t mean the same thing by boredom?
I’m using Eliezer’s definition: a desire not to do the same thing over and over again. For a creature with roughly human-level brain power, doing the same thing over and over again likely means it’s stuck in a local optimum of some sort.
Genome equivalents which don’t generate terminally valued individual identity in the minds they describe should outperform those that do.
I don’t understand this. Please elaborate.
Why not just direct expected utility? Pain and pleasure are easy to find but don’t work nearly as well.
I suppose you mean why not value external referents directly instead of indirectly through pain and pleasure. As long as wireheading isn’t possible, I don’t see why the latter wouldn’t work just as well as the former in many cases. Also, the ability to directly value external referents depends on a complex cognitive structure to assess external states, which may be more vulnerable in some situations to external manipulation (e.g. unfriendly persuasion or parasitic memes) than hard-wired pain and pleasure, although the reverse is probably true in other situations. It seems likely that evolution would come up with both.
Define sexual. Most sexual creatures are too simple to value the first two. Most plausible posthumans aren’t sexual in a traditional sense.
I mean reproduction where more than one party contributes genetic material and/or parental resources. Even simple sexual creatures probably have some notion of beauty and/or status to help attract/select mates, but for the simplest perhaps “instinct” would be a better word than “value”.
- likely values for intelligent creatures with sexual reproduction (music, art, literature, humor)
Disagree.
These all help signal fitness and attract mates. Certainly not all intelligent creatures with sexual reproduction will value exactly music, art, literature, and humor, but it seems likely they will have values that perform the equivalent functions.
@Jotaf: No, you misunderstood—guess I got double-transparent-deluded. I’m saying this:
Probability is subjectively objective
Probability is about something external and real (called truth)
Therefore you can take a belief and call it “true” or “false” without comparing it to another belief
If you don’t match truth well enough (if your beliefs are too wrong), you die
So if you’re still alive, you’re not too stupid—you were born with a smart prior, so justified in having it
So I’m happy with probability being subjectively objective, and I don’t want to change my beliefs about the lottery. If the paperclipper had stupid beliefs, it would be dead—but it doesn’t, it has evil morals.
Morality is subjectively objective
Morality is about some abstract object, a computation that exists in Formalia but nowhere in the actual universe
Therefore, if you take a morality, you need another morality (possibly the same one) to assess it, rather than a nonmoral object
Even if there was some light in the sky you could test morality against, it wouldn’t kill you for your morality being evil
So I don’t feel on better moral ground than the paperclipper. It has human_evil morals, but I have paperclipper_evil morals—we are exactly equally horrified.
They are not perfect expected utility maximizers. However, no expected utility maximizer is perfect. Humans approach the ideal at least as well as other organisms. Fitness maximization is the central explanatory principle in biology—and the underlying idea is the same. The economic framework associated with utilitarianism is general, of broad applicability, and deserves considerable respect.
You can model any agent as an expected utility maximizer—with a few caveats about things such as uncomputability and infinitely complex functions.
You really can reverse-engineer their utility functions too—by considering them as Input-Transform-Output black boxes—and asking what expected utility maximizer would produce the observed transformation.
A utility function is like a program in a Turing-complete language. If the behaviour can be computed at all, it can be computed by a utility function.
A utility function is like a program in a Turing-complete language. If the behaviour can be computed at all, it can be computed by a utility function.
Tim, I’ve seen you state this before, but it’s simply wrong. A utility function is not like a Turing-complete language. It imposes rather strong constraints on possible behavior.
Consider a program which when given the choices (A,B) outputs A. If you reset it and give it choices (B,C) it outputs B. If you reset it again and give it choices (C,A) it outputs C. The behavior of this program cannot be reproduced by a utility function.
Here’s another example: When given (A,B) a program outputs “indifferent”. When given (equal chance of A or B, A, B) it outputs “equal chance of A or B”. This is also not allowed by EU maximization.
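A minimal sketch of the first example (names and code mine, purely for illustration), treating the outcome space as just {A, B, C}: a brute-force check over every strict ranking of the three options finds none whose maximization reproduces all three observed choices.

```python
from itertools import permutations

# Observed choices: A from (A,B), B from (B,C), C from (C,A).
observed = {frozenset("AB"): "A", frozenset("BC"): "B", frozenset("CA"): "C"}

def reproduces_choices(utility):
    """True if maximizing this utility picks every observed choice."""
    return all(max(pair, key=utility.get) == pick
               for pair, pick in observed.items())

# Try every strict ranking of {A, B, C}; ties cannot help, since each
# observed choice requires strictly preferring the picked option.
consistent = [ranking for ranking in permutations("ABC")
              if reproduces_choices({x: -i for i, x in enumerate(ranking)})]
print(consistent)  # [] -- no utility function over {A, B, C} works
```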
Wei Dai: Consider a program which when given the choices (A,B) outputs A. If you reset it and give it choices (B,C) it outputs B. If you reset it again and give it choices (C,A) it outputs C. The behavior of this program cannot be reproduced by a utility function.
I don’t know the proper rational-choice-theory terminology, but wouldn’t modeling this program just be a matter of describing the “space” of choices correctly? That is, rather than making the space of choices {A, B, C}, make it the set containing
(1) = taking A when offered A and B, (2) = taking B when offered A and B,
(3) = taking B when offered B and C, (4) = taking C when offered B and C,
(5) = taking C when offered C and A, (6) = taking A when offered C and A.
Then the revealed preferences (if that’s the way to put it) from your experiment would be (1) > (2), (3) > (4), and (5) > (6). Viewed this way, there is no violation of transitivity by the relation >, or at least none revealed so far. I would expect that you could always “smooth over” any transitivity-violation by making an appropriate description of the space of options. In fact, I would guess that there’s a standard theory about how to do this while still keeping the description-method as useful as possible for purposes such as prediction.
That is silly—the associated utility function is the one you have just explicitly given. To rephrase:
if (senses contain (A,B)) selecting A has high utility;
else if (senses contain (B,C)) selecting B has high utility;
else if (senses contain (C,A)) selecting C has high utility;
Again, you have just given the utility function by describing it. As for “indifference” being a problem for a maximisation algorithm—it really isn’t in the context of decision theory. An agent either takes some positive action, or it doesn’t. Indifference is usually modelled as laziness—i.e. a preference for taking the path of least action.
No it isn’t. It is a list of preferences. The corresponding utility function would be a function U(X) from {A,B,C} to the real numbers such that
1) U(A) > U(B)
2) U(B) > U(C)
3) U(C) > U(A)
But only some lists of preferences can be described by utility functions, and this one can’t, because 1) and 2) imply that U(A)>U(C), which contradicts 3).
Err, that got ugly. How do you make beautiful quotes on this site?
There’s a help link under the box you type in. (Use > for quotes, as in email.)
See also the Markdown documentation.
Thank you.
I doubt the premise. Where are you getting that from? It wasn’t in the specification of the problem.
From the definition of utility function.
That seems like a ridiculous reply—it says nothing about the issue there.
Tim, that’s what the term means. This other thing that you have called a “utility function”, is not in fact a utility function, because that’s not what the term means. It’s already been pointed out that not every list of preferences can be derived from a utility function. If you want to define or use a generalization of the notion of utility function, you should do so explicitly.
I have no argument with the definition of the term “utility function”. It is a function that maps outcomes to utilities—usually real numbers. The function I described did just that. If you don’t understand that, then you should explain what aspects of the function’s map from outcomes to utilities you don’t understand—since it seemed to be a pretty simple one to me.
I don’t think that all preferences can be expressed as a utility function. For example, some preferences are uncomputable.
Note that Tyrrell_McAllister2′s reply makes exactly the same point as I am making.
See, this would have been a lot clearer if you had specified initially that your objection was to the domain.
Sorry if there was any confusion. Here are all the possible outcomes—and their associated (real valued) utilities—laboriously spelled out in a table:
Remembers being presented with (A,B) and chooses A—utility 1.0.
Remembers being presented with (A,B) and chooses B—utility 0.0.
Remembers being presented with (B,C) and chooses B—utility 1.0.
Remembers being presented with (B,C) and chooses C—utility 0.0.
Remembers being presented with (C,A) and chooses C—utility 1.0.
Remembers being presented with (C,A) and chooses A—utility 0.0.
Other action—utility 0.0.
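The table above drops straight into code; here is a minimal sketch (names invented for illustration), assuming, as the table does, that the “outcome” includes the agent’s memory of which pair was offered:

```python
def utility(offered, chosen):
    """The table above: utility 1.0 for the listed remembered-offer/choice
    pairs, 0.0 for everything else."""
    preferred = {("A", "B"): "A", ("B", "C"): "B", ("C", "A"): "C"}
    return 1.0 if preferred.get(offered) == chosen else 0.0

def act(offered):
    """Pick whichever available option maximizes the utility above."""
    return max(offered, key=lambda option: utility(offered, option))

print(act(("A", "B")), act(("B", "C")), act(("C", "A")))  # A B C
```

Whether a function over memory-inclusive outcomes still counts as a utility function in the sense the axioms intend is exactly what the rest of this exchange disputes.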
It seems like an odd place for congratulations—since the conclusion here seems to be about 180 degrees out of whack—and hardly anyone seems to agree with it. I asked how one of the ideas here was remotely defensible. So far, there have been no takers.
If there is not even a debate, whoever is incorrect on this topic would seem to be in danger of failing to update. Of course personally, I think it is Eliezer who needs to update. I have quite a bit in common with Eliezer—and I’d like to be on the same page as him—but it is difficult to do when he insists on defending positions that I regard as poorly-conceived.
The utility function of Deep Blue has 8,000 parts—and contained a lot of information. Throw all that information away, and all you really need to reconstruct Deep Blue is the knowledge that its aim is to win games of chess. The exact details of the information in the original utility function are not recovered—but the eventual functional outcome would be much the same—a powerful chess computer.
The “targeting information” is actually a bunch of implementation details that can be effectively recreated from the goal—if that should prove to be necessary.
It is not precious information that must be preserved. If anything, attempts to preserve the 8,000 parts of Deep Blue’s utility function while improving it would actually have a crippling negative effect on its future development. Similarly with human values: those are a bunch of implementation details—not the real target.
If Deep Blue had emotions and desires that were attached to the 8,000 parts of its utility function, if it drew great satisfaction, meaning, and joy from executing those 8,000 parts regardless of whether doing so resulted in winning a chess game, then yes, those 8,000 parts would be precious information that needed to be preserved. It would be a horrible disaster if they were lost. They wouldn’t be the programmer’s real target, but why in the world would Emotional Deep Blue care about what its programmer wanted? It wouldn’t want to win at chess, it would want to implement those 8,000 parts! That’s what its real target is!
For humans, our real target is all those complex values that evolution metaphorically “programmed” into us. We don’t care at all about what evolution’s “real target” was. If those values were destroyed or replaced then it would be bad for us because those values are what humans really care about. Saying humans care about genetic fitness because we sometimes accidentally enhance it when we are fulfilling our real values is like saying that automobile drivers care about maximizing CO2 content in the atmosphere because they do that by accident when they drive. Humans don’t care about genetic fitness, we never have, and hopefully we never will.
In fact, evolution doesn’t even have a real target. It’s an abstract statistical description of certain trends in the history of life. When we refer to it as “wanting” things and having “goals” that’s not because it really does. It’s because humans are good at understanding the minds of other humans, but bad at understanding abstract processes, so it helps people understand how evolution works better if we metaphorically describe it as a human-like mind with certain goals, even though that isn’t true. Modeling evolution as having a “goal” describes it less accurately, but it makes up for it by making the model easier for a human brain to run.
When you say that preserving those parts of the utility function would have a “crippling negative” effect you are forgetting an important referent: Negative for whom? Evolution has no feelings and desires, so preserving human values would not be crippling or negative for it; nothing is crippling or negative for it, since it doesn’t really have any feelings or goals. It literally doesn’t care about anything. By contrast humans do have feelings and desires, so failing to preserve our values would have a crippling and negative effect on our future development, because we would lose something we deeply care about.
The problem with self-improving Deep Blue preserving its 8,000 heuristics is that it might cause it to lose games of chess, to a player with a better representation of its target. If that happens, its 8,000 heuristics will probably turn out to assign very low values to the resulting lost games. Of course, that means that the values weren’t very effectively maximized in the first place. Just so—that’s one of the problems with working from a dud set of heuristics that poorly encode your target.
We potentially face a similar issue. Plenty of folks would love to live in a world where their every desire is satisfied—and they live in continual ecstasy. However, pursuing such goals in the short-term could easily lead humanity towards long-term extinction. We face much the same problem with our values that self-improving Deep Blue faces with its heuristics.
This issue doesn’t have anything particularly to do with the difference between psychological and genetic optimization targets. Both genes and minds value dying out very negatively. They agree on the relevant values.
There’s a proposed solution to this problem: pursue universal instrumental values until you have conquered the universe, and then switch to pursuing your “real” values. However it’s a controversial proposal. When will you be confident of not facing a stronger opponent with different values? How much does lugging those “true values” around for billions of years actually cost?
My position is that you’ll probably never know that you are safe, and that the cost isn’t that great—but that any such expense is an intolerable squandering of resources.
Minds value not dying out because dying out would mean that they can no longer pursue “true values,” not because not dying out is an end in itself. Imagine we were given a choice between:
A) The human race dies out.
B) The human race survives forever, but every human being alive and who will ever live will be tortured 24/7 by a sadistic AI.
Any sane person would choose A. That’s because in scenario B the human race, even though it survives, is unable to pursue any of its values, and is forced to pursue one of its major disvalues.
There is no point in the human race surviving if it can’t pursue its values.
I personally think the solution for the species is the same as it is for an individual, mix pursuit of terminal and instrumental values. I do this every day and I assume you do as well. I spend lots of time and effort making sure that I will survive and exist in the future. But I also take minor risks, such as driving a car, in order to lead a more fun and interesting life.
Carl’s proposal sounds pretty good to me. Yes, it has dangers, as you correctly pointed out. But some level of danger has to be accepted in order to live a worthwhile life.
It’s likely to not be a binary decision. We may well be able to trade preserving values against a better chance of surviving at all. The more we deviate from universal instrumental values, the greater our chances of being wiped out by accidents or aliens. The more we adhere to universal instrumental values, the more of our own values get lost.
Since I see our values heavily overlapping with universal instrumental values, adopting them doesn’t seem too bad to me—while all our descendants being wiped out seems pretty negative—although also rather unlikely.
How to deal with this tradeoff is a controversial issue. However, it certainly isn’t obvious that we should struggle to preserve our human values—and resist adopting universal instrumental values. That runs a fairly clear risk of screwing up the future for all our descendants.
If that’s the case I don’t think we disagree about anything substantial. We probably just disagree about what percentage of resources should go to UIV and what should go to terminal values.
You might be right to some extent. Human beings tend to place great terminal value on big, impressive achievements, and quickly colonizing the universe would certainly involve doing that.
It’s a tricky and controversial issue. The cost of preserving our values looks fairly small—but any such expense diverts resources away from the task of surviving—and increases the risk of eternal oblivion. Those who are wedded to the idea of preserving their values will need to do some careful accounting on this issue, if they want the world to run such risks.
While the phrase “universal instrumental values” has the word “instrumental” in it, that’s just one way of thinking about them. You could also call them “nature’s values” or “god’s values”. You can contrast them with human values—but it isn’t really an “instrumental vs terminal” issue.
Tim and Tyrrell, do you know the axiomatic derivation of expected utility theory? If you haven’t read http://cepa.newschool.edu/het/essays/uncert/vnmaxioms.htm or something equivalent, please read it first.
Yes, if you change the spaces of states and choices, maybe you can encode every possible agent as a utility function, not just those satisfying certain axioms of “rationality” (which I put in quotes because I don’t necessarily agree with them), but that would be to miss the entire point of expected utility theory, which is that it is supposed to be a theory of rationality, and is supposed to rule out irrational preferences. That means using state and choice spaces where those axiomatic constraints have real-world meaning.
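For readers who don’t follow the link, the von Neumann–Morgenstern axioms say, roughly (glossing over details the linked page states precisely):

```latex
\begin{itemize}
  \item Completeness: $L \succeq M$ or $M \succeq L$ for all lotteries $L, M$.
  \item Transitivity: $L \succeq M$ and $M \succeq N$ imply $L \succeq N$.
  \item Continuity: if $L \succeq M \succeq N$, then there is some $p \in [0,1]$
        with $pL + (1-p)N \sim M$.
  \item Independence: $L \succeq M$ iff
        $pL + (1-p)N \succeq pM + (1-p)N$ for every lottery $N$ and $p \in (0,1]$.
\end{itemize}
% The representation theorem: $\succeq$ satisfies these axioms if and only if
% there is a utility function $u$, unique up to positive affine transformation,
% such that $L \succeq M \iff \mathbb{E}_L[u] \ge \mathbb{E}_M[u]$.
```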
Utility theory is bigger than the VN axioms. They are just one way of looking at things.
Wei: Most people in most situations would reject the idea that the set of options presented is part of the outcome—would say that (A,B,C) is a better outcome space than the richer one Tyrrell suggested—so expected utility theory is applicable. A set of preferences can never be instrumentally irrational, but it can be unreasonable as judged by another part of your morality.
Specifically, the point of utility theory is the attempt to predict the actions of complex agents by dividing them into two layers:
Simple list of values
Complex machinery for attaining those values
The idea being that if you can’t know the details of the machinery, successful prediction might be possible by plugging the values into your own equivalent machinery.
Does this work in real life? In practice it works well for simple agents, or complex agents in simple/narrow contexts. It works well for Deep Blue, or for Kasparov on the chessboard. It doesn’t work for Kasparov in life. If you try to predict Kasparov’s actions away from the chessboard using utility theory, it ends up as epicycles; every time you see him taking a new action you can write a corresponding clause in your model of his utility function, but the model has no particular predictive power.
In hindsight we shouldn’t really have expected otherwise; simple models in general have predictive power only in simple/narrow contexts.
Counter-example 1: gene-frequency maximization in biology. A tremendously simple principle with enormous explanatory power.
Counter-example 2: Entropy maximization. Another tremendously simple principle with enormous explanatory power.
Note that both are maximization principles—the very type of principle whose limitations you are arguing for.
To expand on my categorization of values a bit more, it seems clear to me that at least some human values do not deserve to be forever etched into the utility function of a singleton. Those caused by idiosyncratic environmental characteristics like taste for salt and sugar, for example. To me, these are simply accidents of history, and I wouldn’t hesitate (too much) to modify them away in myself, perhaps to be replaced by more interesting and exotic tastes.
What about reproduction? It’s a value that my genes programmed into me for their own purposes, so why should I be obligated to stick with it forever?
Or consider boredom. Eventually I may become so powerful that I can easily find the globally optimal course of action for any set of goals I might have, and notice that the optimal course of action often involves repetition of some kind. Why should I retain my desire not to do the same thing over and over again, which was programmed into me by evolution back when minds had a tendency to get stuck in local optima?
And once I finally came to that realization, I felt less ashamed of values that seemed ‘provincial’ - but that’s another matter.
Eliezer, I wonder if this actually has more to do with your current belief that rationality equals expected utility maximization. For an expected utility maximizer, there is no distinction between ‘provincial’ and ‘universal’ values, and certainly no reason to ever feel ashamed of one’s values. One just optimizes according to whatever values one happens to have. But as I argued before, human beings are not expected utility maximizers, and I don’t see why we should try to emulate them, especially this aspect.
In dealing with your example, I didn’t “change the space of states or choices”. All I did was specify a utility function. The input states and output states were exactly as you specified them to be. The agent could see what choices were available, and then it picked one of them—according to the maximum value of the utility function I specified.
The corresponding real world example is an agent that prefers Boston to Atlanta, Chicago to Boston, and Atlanta to Chicago. I simply showed how a utility maximiser could represent such preferences. Such an agent would drive in circles—but that is not necessarily irrational behaviour.
Of course much of the value of expected utility theory arises when you use short and simple utility functions—however, if you are prepared to use more complex utility functions, there really are very few limits on what behaviours can be represented.
The possibility of using complex utility functions does not in any way negate the value of the theory for providing a model of rational economic behaviour. In economics, the utility function is pretty fixed: maximise profit, with specified risk aversion and future discounting. That specifies an ideal which real economic agents approximate. Plugging in an arbitrary utility function is simply an illegal operation in that context.
The analogy between the theory that humans behave like expected utility maximisers—and the theory that atoms behave like billiard balls could be criticised—but it generally seems quite appropriate to me.
This is a critical post. I disagree with where Eliezer has gone from here; but I’m with him up to and including this point. This post is a good starting point for a dialogue.
I don’t know, or maybe I don’t understand your point. I would find a quiet and silent, post-human world very beautiful in a way. A world where the only reminders of the great, yet long gone civilisation would be ancient ruins... Super structures which once were the statues of human prosperity and glory, now standing along with nothing but trees and plants, forever forgotten. Simply sleeping in a never ending serenity and silence...
Don’t you too, find such a future very beautiful in an eerie way? Even if there is no sentient being to perceive it at that time, the fact that such a future may exist one day, and that it can now be perceived through art and imagination, is where its beauty truly lies.
I suspect that you are imagining this world as good because you can’t actually separate your imagined observer from the world. The world you are talking about is not just a failure of humanity; it is a world where we have failed so much that nothing is alive to witness our failure.
I don’t think you can call such a world good or perfect, but I don’t think it’s all bad either. I guess you could call it neutral.
I mean, I don’t see that world as a big failure, if a failure at all. No civilization will be there forever*, but the one I mentioned had at least achieved something in its time: it had once been glorious. While it left its statues, it still managed to keep the world habitable for life and other species. (Note how I mentioned trees and plants growing on the ruins.) To put it simply, it was a beautiful civilization that left a beautiful world... It isn’t fair to call it a failure only because it wasn’t eternal.
*Who am I to say that?
I’ll only speak for myself, but ‘everybody dead’ gives an output nowhere near zero on my utility function. Everybody dead is awful. It’s not the worst imaginable outcome, but it is really really really low in my preference ordering. I can see why you would think it’s neutral—there’s nobody to be happy but there’s nobody to suffer either. However, if you think that people dying is a bad thing in itself, this outcome really is horrifying.
Value isn’t fragile because value isn’t a process. Only processes can be fragile or robust.
Winning the lottery is a fragile process, because it has to be done all in one go. Contrast that with the process of writing down a 12-digit phone number: if you try to memorise the whole number and then write it down, you are likely to make a mistake, due to Miller’s law. Writing digits down one at a time, as you hear them, is more robust. Being able to ask for corrections, or having errors pointed out to you, is more robust still.
Processes that are incremental and involve error correction are robust, and can handle large volumes of data. The data aren’t the problem. Trying to preload an AI with the total of human rationality is the problem, because it is the most fragile way of installing human value. Safety researchers need to aim for error correction, i.e. corrigibility.
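As a toy illustration of that robustness claim (the numbers here are invented purely for illustration): suppose each digit is transcribed wrongly with probability 0.05. An unchecked one-shot pass over twelve digits then fails almost half the time, while checking and re-asking each digit as you go almost always succeeds.

```python
# Purely illustrative numbers: a made-up per-digit error rate.
p_error = 0.05   # chance of getting any single digit wrong
digits = 12

# One-shot: memorise the whole number, then write it down unchecked.
p_one_shot = (1 - p_error) ** digits

# Incremental with correction: each digit is checked as it is written,
# and re-asked once if wrong; it fails only if both attempts are wrong.
p_incremental = (1 - p_error ** 2) ** digits

print(f"one-shot success:    {p_one_shot:.3f}")     # ~0.540
print(f"incremental success: {p_incremental:.3f}")  # ~0.970
```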
This is the key point on which I disagree with Eliezer. I don’t disagree with what he literally says here, but with what he implies and what he concludes. The key context he isn’t giving here is that what he says here only applies fully to a hard-takeoff AI scenario. Consider what he says about boredom:
The things he lists here make an argument that, in the absence of competition, an existing value system can drift into one that doesn’t mind boredom. But none of them address the argument he’s supposedly addressing, that bored creatures will fail to compete and be eliminated. I infer that he dismisses that argument because the thing he’s talking about having value-drift, the thing that appears in the next line, is a hard-takeoff AI that doesn’t have to compete.
The right way to begin asking whether minds can evolve not to be bored is to sample minds that have evolved, preferably independently, and find how many of them mind boredom.
Earth has no independently-evolved minds that I know of; all intelligent life is metazoan, and all metazoans are intelligent. This does show us, however, that intelligence is an evolutionary ratchet. Unlike other phenotypic traits, it doesn’t disappear from any lines after evolving. That’s remarkable, and relevant: Intelligence doesn’t disappear. So we can strike off “universe of mindless plankton” from our list of moderately-probable fears.
Some metazoans live lives of extreme boredom. For instance, spiders, sea urchins, molluscs, most fish, maybe alligators. Others suffer physically from boredom: parrots, humans, dogs, cats. What distinguishes these two categories?
Animals that don’t become bored are generally small, small-brained, have low metabolisms, short lifespans, a large number of offspring, and live in a very narrow range of environmental conditions. Animals that become bored are just the opposite. There are exceptions: fish and alligators have long lifespans, and alligators are large. But we can see how these traits conspire to produce an organism that can afford to be bored:
Small-brained, short lifespan, large number of offspring, narrow range of environmental conditions: These are all conditions under which it is better for the species to adapt to the environment by selection or by environment-directed development than by learning. Insects and nematodes can’t learn much except via selection; their brains appear to have identical neuron number and wiring within a species. Alligator brains weigh about 8 grams.
Low metabolism merely correlates with low activity, which is how I identified most of these organisms, equating “not moving” with “not minding boredom.” Small correlates with short lifespan and small-brained.
These things require learning: long lifespan, a changing environment, and minimizing reproduction time. If an organism will need to compete in a changing environment or across many different environments, as birds and mammals do, they’ll need to learn. If a mother’s knowledge is encoded in a form that she can’t transmit to her children, they’ll need to learn.
This business of having children is difficult to translate to a world of AIs. But the business of adaptation is clear. Given that active, curious, intelligent, environment-transforming minds already exist, and given continued competition, only minds that can adapt to rapid change will be able to remain powerful. So we can also strike “world dominated by beings who build paperclips” off our list of fears, provided those conditions are maintained. All we need do is ensure continued competition. Intelligence will not de-evolve, and intelligence will keep the environment changing rapidly enough that constant learning will be necessary, and so will be boredom.
The space of possible minds is large. The space of possible evolved minds is much smaller. The space of possible minds co-evolved with competition is much smaller than that. The space X of possible co-evolved minds capable of dominating the human race is much smaller than that.
Let Y = the set of value systems that might be produced from trying to enumerate human “final” values and put them in a utility function which will be evaluated by a single symbolic logic engine, incorporating all types of values above the level of the gene (body, mind, conscious mind, kin group, social group, for starters), with context-free set-membership functions that classify percepts into a finite set of atomic symbols prior to considering the context those symbols will be used in, and that will be designed to prevent final values from changing. I take that as roughly Eliezer’s approach.
Let f(Z) be a function over sets of possible value systems, which tells how many of them are not repugnant to us.
My estimation is that f(X) / |X| >> f(Y) / |Y|. Therefore, the best approach is not to try to enumerate human final values and code them into an AI, but to study how co-evolution works, and what conditions give rise to the phenomena we value such as intelligence, consciousness, curiosity, and affection. Then try to direct the future to stay within those conditions.
I think some of the assumptions here have led you to false conclusions. For one, you seem to assume that because humans share some values, all humans have an identical value system. This is just plain wrong: humans each have their own unique value “signature”, more or less like a fingerprint. If there is one thing that you place more value weight on than a person who is otherwise identical, you are different. That being said, does your argument still hold once this heterogeneity, albeit minor in the grand scheme of things, is added to human value systems? I don’t think so. I think there is plenty of reason to think that human values will be much more robust because of the person-to-person differential.
Furthermore, I think the premise of this article kind of comes back to your claim that boredom is an absolute value. After you claim this, you go on to say how it evolved over time (which is correct), but still hold that it is absolute (can’t you see the contradiction here?). How can something be absolute if it evolved over time in humans to enhance survival?
Further, who’s to say that with the advent of ASI this wouldn’t be “cured” (so to speak)? That is, an ASI should be able to detect the cause of human boredom and could thus genetically reprogram us to fix it. How can something that is structural, due to evolutionary and environmental components of human development, be considered a “human value”? Being a value implies that it somehow transcends biological constraints, e.g. traditions like religion, etc. You are painting boredom as a value when it is little more than an instinct. One can argue that even though something causes a biologically structural change, it constitutes a value. I can concede that, but how can you insist that the universe will have no “point” if these “values” get adjusted to compromise with the existence of an ASI? Value is completely subjective to the organism that holds it. The transhuman will have different values, and the universe will not necessarily contain less value for him/her/it at that time. In fact, it will likely be much richer to them.
Lastly: “A paperclip maximizer just chooses whichever action leads to the greatest number of paperclips.” I counter with “a biological system just chooses (through natural selection) whichever action leads to the greatest number of biological systems”. How did this argument help you, exactly? Humans are subject to the same subjective value that a machine ASI would be subjected to. The only way to pretend that human value isn’t just another component of how humans historically have done this is by ascribing some sort of transcendent component to human biology (i.e. a soul or something). I think this is a methodological flaw in your argument.
Human values are special because we are human. Each of us is at the center of the universe, from our own perspective, regardless of what the rest of the universe thinks of that. It’s the only way for anything to have value at all, because there is no other way to choose one set of values over another except that you happen to embody those values. The paperclip maximizer’s goals do not have value with respect to our own, and it is only our own that matter to us.
A paperclip maximizer could have its values adjusted to want to make staples instead. But what would the paperclip maximizer think of this? Clearly, this would be contrary to its current goal of making paperclips. As a consequence, the paperclip maximizer will not want to permit such a change, since what it would become would be meaningless with respect to its current values. The same principle applies to human beings. I do not want my values to be modified because who I would become would be devalued with respect to my current values. Even if the new me found the universe every bit as rich and meaningful as the old me did, it would be no comfort to me now because the new me’s values would not coincide my current values.
Regarding this post and the complexity of value:
Taking a paperclip maximizer as a starting point, the machine can be divided up into two primary components: the value function, which dictates that more paperclips is a good thing, and the optimizer that increases the universe’s score with respect to that value function. What we should aim for, in my opinion, is to become the value function to a really badass optimizer. If we build a machine that asks us how happy we are, and then does everything in its power to improve that rating (so long as it doesn’t involve modifying our values or controlling our ability to report them), that is the only way we can build a machine that reliably encompasses all of our human values.
Any other route and we are only steering the future by proxy—via an approximation to our values that may be fatally flawed and make it impossible for us to regain control when things go wrong. Even if we could somehow perfectly capture all of our values in a single function, there is still the matter of how that value function is embedded via our perceptions, which may differ from the machine’s, the fact that our values may continue to change over time and thereby invalidate that function, and the fact that we each have our own unique variation on those values to start with. So yes, we should definitely keep our hands on the steering wheel.
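A toy sketch of the decomposition described two paragraphs up (all names hypothetical and deliberately oversimplified): the optimizer is the same machinery either way; only the value function it is handed differs.

```python
def paperclip_value(world_state) -> float:
    """The paperclipper's value function: more paperclips is better."""
    return world_state.get("paperclips", 0)

def reported_human_value(world_state) -> float:
    """The proposal above: the humans' own report is the score."""
    return world_state.get("human_reported_satisfaction", 0.0)

def greedy_optimizer(state, candidate_actions, value):
    """A deliberately dumb optimizer: take whichever action leads to the
    predicted state that scores highest under the supplied value function."""
    return max(candidate_actions, key=lambda action: value(action(state)))

# Example: the same optimizer, two different value functions.
state = {"paperclips": 10, "human_reported_satisfaction": 6.0}
actions = [
    lambda s: {**s, "paperclips": s["paperclips"] + 100},     # make more clips
    lambda s: {**s, "human_reported_satisfaction": 9.0},      # ask and help
]
best_for_clips = greedy_optimizer(state, actions, paperclip_value)
best_for_humans = greedy_optimizer(state, actions, reported_human_value)
```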
I’m not sure if ‘fragile’ is the right word; removing one component might be devastating, but in my opinion, that reflects more on the importance of each piece than on the fragility of the actual system. The way I see it, it’s something like a tower with 4 large beams for support: if one takes out a single piece, it would be worse than, say, if one removed a piece from a tower with 25 smaller beams to support it.
But other than that, thank you very much for the informative article.
Did anyone notice that this flatly contradicts Three Worlds Collide? The superhappies and babyeaters don’t inherit from human morals at all (let alone detailedly and reliably), but the humans still regard the aliens as moral patients, having meddling preferences for the babyeater children to not be eaten, rather than being as indifferent as they would be to heaps of pebbles being scattered.
(Yes, it was fiction, but one would imagine the naturalistic metaethics of aliens were meant to be taken at face value, even if there is no Alderson drive and the evolutionary psychology resulting in baby-eaters specifically was literary casuistry.)
So if the moral of this post isn’t quite right, how should it be revised?
“Any Future not shaped by a goal system with detailed reliable inheritance from human morals will be incomprehensibly alien and arbitrary-looking in a non-valued direction, even after taking into account how much you think you’re ‘cosmopolitan’; furthermore, AI goal systems are expected to look even more arbitrary than those of biological aliens, which would at least share the design signature of natural selection”?
How much is this modified moral weakened by potential analogies between natural selection of biological creatures, and gradient-descent or self-play AI training regimens?
I don’t think Three Worlds Collide should be interpreted as having anything to do with actual aliens, any more than The Scorpion and the Frog should be interpreted as having anything to do with actual scorpions and frogs. TWC uses different alien species to allegorically explore human differences of opinion.
Contrast my likewise-fictional story Kindness to Kin.
No, I think this post is right as-is. As you say, Three Worlds Collide was fiction. There is no “but”. It’s fictional evidence, and so it should update us not at all.
Sorry, the function of bringing up Three Worlds Collide was to point out the apparent contradiction in the Yudkowskian canon. Forget the story; I agree that fiction didn’t happen and therefore isn’t evidence.
The actual issue is that it seems like worlds shaped by the goal systems of other evolved biological creatures probably don’t “contain almost nothing of worth”: the lives of octopuses mean much less to me than human lives, but more than tiny molecular paperclips. The theme of “animal-like organisms that feel pleasure and pain” is something that natural selection will tend to reinvent, and the idealized values of those organisms are not a random utility function. (Do you disagree? If so, you at least face a Sorites problem on how fast value drops off as you look at our evolutionary history. Do chimpanzees matter? If not, did Homo erectus?) But if other animals aren’t literally-as-valueless-as-paperclips, then some classes of AI architecture might not be, either.
Having disagreed with Zack many times in the past, it is a pleasure to say: I think this is absolutely right (except that I think I’d replace “pleasure and pain” with “something pleasure-like and something pain-like”); that bit of “Value is Fragile” is surely wrong, and the intuitions that drove the relevant bits of “Three Worlds Collide” are more reflective of how actual human value systems work.
I think I’d want to distinguish two related but separate issues here. (1) Should we expect that (some) other intelligent agents are things whose welfare we value? (Whether they are might depend on whether we think they have internal mechanisms that resemble our mechanisms of pleasure, pain, hope, fear, etc.) (2) Should we expect that (some) other intelligent agents share some of our values? (Whether they do would depend on how far the structure of their thinking has converged with ours.) If there are other intelligent species out there, then whether they’re “animal-like organisms that feel pleasure and pain” addresses #1 and whether “the idealized values of those organisms are not a random utility function” addresses #2.
(Of course, how much we care about their welfare may depend on how much we think they share our values, for internalized-game-theory-ish reasons. And presumably they’re likely to share more of our values if their motivational systems work similarly to ours. So the issues are not only related but interdependent.)
Suppose that (evolved/uplifted/otherwise-advanced-enough-for-sapience) octopuses share some of our values. Now suppose that humans go extinct, and these Octopus sapiens create an advanced civilization, whose products instantiate some values we would recognize, like art, music, science, etc.
Does this future contain anything of value? I say it does not, because there are no humans around to value it. There are octopuses, and that’s great for the octopuses, but as far as human values go, this future ended with humanity’s extinction. Whatever happens afterwards is irrelevant.
EDIT: Mind you—this is not quite the point Eliezer was making, I don’t think; I am responding to gjm’s comment, here. This comment should not necessarily be taken to constitute part of a defense of the point made in the OP (and quoted by Zack upthread).
When I consider this possible universe, I find that I do attach some value to the welfare of these sapient octopuses, and I do consider that it’s a universe that contains plenty of value. (It depends somewhat on whether they have, as well as values resembling ours, something I can recognize as welfare; see my last couple of paragraphs above.) If there were a magic switch I could control, where one setting is “humans go extinct, no other advanced civilization ever exists” and the other is “humans go extinct, the sapient octopus civilization arises”, I would definitely put it on the second setting, and if sufficiently convinced that the switch would really do what it says then I think I would pay a nonzero amount, or put up with nonzero effort or inconvenience, to put it there.
Of course my values are mine and your values are yours, and if we disagree there may be no way for either of us to persuade the other. But I’ll at least try to explain why I feel the way I do. (So far as I can; introspection is difficult and unreliable.)
First, consider two possible futures. 1: Humanity continues for millions of years, substantially unchanged from how we are now. (I take it we agree that in this case the future universe contains much of value.) 2: Humanity continues for millions of years, gradually evolving (in the Darwinian sense or otherwise) but always somewhat resembling us, and always retaining something like our values. It seems to me that here, too, the future universe contains much of value.
The sapient octopuses, I am taking it, do somewhat resemble us and have something like our values. Perhaps as much so as our descendants in possible future 2. So why should I care much less about them? I can see only one plausible reason: because our descendants are, in fact, our descendants: they are biologically related to us. How plausible is that reason?
Possible future 3: at some point in that future history of humanity, our descendants decide to upload themselves into computers and continue their lives virtually. Possible future 4: at some point in that virtual existence they decide they’d like to be embodied again, and arrange for it to happen. Their new bodies are enough like original-human bodies for them to feel at home in them, but they use some freshly-invented genetic material rather than DNA, and many of the internal organs are differently designed.
I don’t find that the loss of biological continuity in these possible futures makes me not care about the welfare of our kinda-sorta-descendants there. I don’t see any reason why it should, either. So if I should care much less about the octopuses, what matters must be some more generalized sort of continuity: the future-kinda-humans are our “causal descendants” or something, even if not our biological descendants.
At that point I think I stop; I can see how someone might find that relationship super-important, and care about “causal descendants” but not about other beings, physically and mentally indistinguishable, who happen not to be our “causal descendants”; but I don’t myself feel much inclination to see that as super-important, and I don’t see any plausible way to change anyone’s mind on the matter by argument.
One can construct all sorts of hypothetical scenarios, but I am far from convinced of their usefulness in teasing out our “true” values (as contrasted with “confabulating some plausible-sounding, but not reflectively stable, set of values”). That said, it seems to me that how much I value (and should value) any given future depends on the degree of that future’s resemblance to my current values. So, to take the examples:
Indeed, we agree.
Well, it depends: it seems to me that the further from my current values this future humanity drifts, the less I value this future.
Crucially, it seems to me that the degree of difference (at any given future time period) will depend (and how can it not?) on the starting point. Start with current humans, and you get one degree of resemblance; start with octopuses, on the other hand…
I would not like for this to happen, personally. I value this future substantially less, thereby.
The impact of this biological re-invention on how valuable the future is, will depend on what impact it has on observable and experiential traits of this new humanity—I care about the interface, so to speak, not the implementation details. (After all, suppose that, while I slept, you replaced my liver, kidneys, pancreas, and some other internal organs with a different set of organs—which, however, performed all the same functions, allowing me to continue living my life as before. I do not see what difference this would make to… well, almost anything, really. Perhaps I couldn’t even tell that this had been done! Would this matter in any moral calculus? I think not…)
Causal descendancy is something, certainly; but, again, for me it is a question of degree of resemblance. Perhaps another way of putting it is: could I inhabit this future? Would I, personally, find it… fun? Would I, living inside it, consider it to be awesome, amazing, wonderful? Or would I find it to be alien and bizarre? It is all well and good to “expect weirdtopia”, but there is no law of morality that says I have to want weirdtopia…
How do you get from:
to:
…?
Because it sure seems to me that a future shaped by the goal systems of octopuses will, indeed, contain almost nothing of worth. (And I do not see what the heck “feel[ing] pleasure and pain” has to do with anything…)
(And, yeah, other animals are close to being as valueless as paperclips. [EDIT: In the sense of “value as a moral subject”, of course; in terms of instrumental value, well, paperclips aren’t valueless either—not regular ones, anyhow.] I like octopuses, but tiling the universe with them doesn’t constitute the creation of a huge amount of value, that’s for sure.)
Consider a human being—specifically not yourself. Why are they relevant to your values but an octopus isn’t?
After answering that:
In a hypothetical where an octopus is an artist, a scientist, an author and a reader, why does the difference remain?
If you construct a scenario where an “octopus” is actually just a “human in a funny suit”, then sure, you can draw all sorts of unintuitive conclusions. I don’t consider this to be informative.
Fair. I was drawing on your comment:
For what it’s worth, I don’t buy this. To my intuitions, it seems like the whole universe experiencing the literal optimal experience, over and over, with no variation, sounds...obviously good.
Insofar as it seems less than great, I think that’s only because we’re engaging in the typical mind fallacy: projecting our own internal sense of boredom onto the future universe. But it wouldn’t feel boring to do that same thing over and over again. That’s the whole point.