
Human Values

Last edit: Sep 16, 2021, 2:50 PM by plex

Human values are the things we care about and would want an aligned superintelligence to look after and support. True human values are suspected to be highly complex, and could be extrapolated into a wide variety of forms.

The shard theory of human values

Sep 4, 2022, 4:28 AM
255 points
67 comments · 24 min read · LW link · 2 reviews

Human values & biases are inaccessible to the genome

TurnTrout · Jul 7, 2022, 5:29 PM
94 points
54 comments · 6 min read · LW link · 1 review

Multi-agent predictive minds and AI alignment

Jan_Kulveit · Dec 12, 2018, 11:48 PM
63 points
18 comments · 10 min read · LW link

6. The Mutable Values Problem in Value Learning and CEV

RogerDearnaley · Dec 4, 2023, 6:31 PM
12 points
0 comments · 49 min read · LW link

5. Moral Value for Sentient Animals? Alas, Not Yet

RogerDearnaley · Dec 27, 2023, 6:42 AM
33 points
41 comments · 23 min read · LW link

Alignment has a Basin of Attraction: Beyond the Orthogonality Thesis

RogerDearnaley · Feb 1, 2024, 9:15 PM
15 points
15 comments · 13 min read · LW link

Utilitarianism and the replaceability of desires and attachments

MichaelStJules · Jul 27, 2024, 1:57 AM
5 points
2 comments · 1 min read · LW link

Ends: An Introduction

Rob Bensinger · Mar 11, 2015, 7:00 PM
17 points
0 comments · 4 min read · LW link

What AI Safety Researchers Have Written About the Nature of Human Values

avturchin · Jan 16, 2019, 1:59 PM
52 points
3 comments · 15 min read · LW link

3. Uploading

RogerDearnaley · Nov 23, 2023, 7:39 AM
21 points
5 comments · 8 min read · LW link

Requirements for a Basin of Attraction to Alignment

RogerDearnaley · Feb 14, 2024, 7:10 AM
40 points
12 comments · 31 min read · LW link

How Would an Utopia-Maximizer Look Like?

Thane Ruthenis · Dec 20, 2023, 8:01 PM
31 points
23 comments · 10 min read · LW link

4. A Moral Case for Evolved-Sapience-Chauvinism

RogerDearnaley · Nov 24, 2023, 4:56 AM
10 points
0 comments · 4 min read · LW link

[Valence series] 2. Valence & Normativity

Steven Byrnes · Dec 7, 2023, 4:43 PM
88 points
7 comments · 28 min read · LW link · 1 review

Shard Theory: An Overview

David Udell · Aug 11, 2022, 5:44 AM
166 points
34 comments · 10 min read · LW link

Review: Foragers, Farmers, and Fossil Fuels

L Rudolf L · Sep 2, 2021, 5:59 PM
28 points
7 comments · 25 min read · LW link
(strataoftheworld.blogspot.com)

How evolution succeeds and fails at value alignment

Ocracoke · Aug 21, 2022, 7:14 AM
21 points
2 comments · 4 min read · LW link

Brain-over-body biases, and the embodied value problem in AI alignment

geoffreymiller · Sep 24, 2022, 10:24 PM
10 points
6 comments · 25 min read · LW link

Intent alignment should not be the goal for AGI x-risk reduction

John Nay · Oct 26, 2022, 1:24 AM
1 point
10 comments · 3 min read · LW link

[Question] What are the best arguments for/against AIs being “slightly ‘nice’”?

Raemon · Sep 24, 2024, 2:00 AM
99 points
60 comments · 31 min read · LW link

Which values are stable under ontology shifts?

Richard_Ngo · Jul 23, 2022, 2:40 AM
74 points
48 comments · 3 min read · LW link
(thinkingcomplete.blogspot.com)

Worse than an unaligned AGI

Shmi · Apr 10, 2022, 3:35 AM
−1 points
11 comments · 1 min read · LW link

A broad basin of attraction around human values?

Wei Dai · Apr 12, 2022, 5:15 AM
114 points
18 comments · 2 min read · LW link

[Question] How path-dependent are human values?

Ege Erdil · Apr 15, 2022, 9:34 AM
13 points
13 comments · 2 min read · LW link

Shut Up and Divide?

Wei Dai · Feb 9, 2010, 8:09 PM
114 points
276 comments · 1 min read · LW link

[Question] What will happen when an all-reaching AGI starts attempting to fix human character flaws?

Michael Bright · Jun 1, 2022, 6:45 PM
1 point
6 comments · 1 min read · LW link

Silliness

lsusr · Jun 3, 2022, 4:59 AM
19 points
1 comment · 1 min read · LW link

Values Are Real Like Harry Potter

Oct 9, 2024, 11:42 PM
83 points
21 comments · 5 min read · LW link

Utilons vs. Hedons

Psychohistorian · Aug 10, 2009, 7:20 PM
40 points
119 comments · 6 min read · LW link

Mental subagent implications for AI Safety

moridinamael · Jan 3, 2021, 6:59 PM
11 points
0 comments · 3 min read · LW link

Descriptive vs. specifiable values

TsviBT · Mar 26, 2023, 9:10 AM
17 points
2 comments · 2 min read · LW link

Humans provide an untapped wealth of evidence about alignment

Jul 14, 2022, 2:31 AM
211 points
94 comments · 9 min read · LW link · 1 review

Ontological Crisis in Humans

Wei Dai · Dec 18, 2012, 5:32 PM
90 points
69 comments · 4 min read · LW link

Notes on Temperance

David Gross · Nov 9, 2020, 2:33 AM
15 points
2 comments · 9 min read · LW link

Upcoming stability of values

Stuart_Armstrong · Mar 15, 2018, 11:36 AM
15 points
15 comments · 2 min read · LW link

Book Review: A Pattern Language by Christopher Alexander

lincolnquirk · Oct 15, 2021, 1:11 AM
57 points
8 comments · 2 min read · LW link · 1 review

Would I think for ten thousand years?

Stuart_Armstrong · Feb 11, 2019, 7:37 PM
25 points
13 comments · 1 min read · LW link

Beyond algorithmic equivalence: self-modelling

Stuart_Armstrong · Feb 28, 2018, 4:55 PM
10 points
3 comments · 1 min read · LW link

Beyond algorithmic equivalence: algorithmic noise

Stuart_Armstrong · Feb 28, 2018, 4:55 PM
10 points
4 comments · 2 min read · LW link

Trading off Lives

jefftk · Jan 3, 2024, 3:40 AM
53 points
12 comments · 2 min read · LW link
(www.jefftk.com)

Understanding and avoiding value drift

TurnTrout · Sep 9, 2022, 4:16 AM
48 points
14 comments · 6 min read · LW link

AI alignment with humans… but with which humans?

geoffreymiller · Sep 9, 2022, 6:21 PM
12 points
33 comments · 3 min read · LW link

AGI x Animal Welfare: A High-EV Outreach Opportunity?

simeon_c · Jun 28, 2023, 8:44 PM
29 points
0 comments · 1 min read · LW link

A short dialogue on comparability of values

cousin_it · Dec 20, 2023, 2:08 PM
27 points
7 comments · 1 min read · LW link

The heterogeneity of human value types: Implications for AI alignment

geoffreymiller · Sep 23, 2022, 5:03 PM
10 points
2 comments · 10 min read · LW link

The grass is always greener in the environment that shaped your values

Karl Faulks · Nov 17, 2024, 6:00 PM
8 points
0 comments · 3 min read · LW link

[Question] Does the existence of shared human values imply alignment is “easy”?

Morpheus · Sep 26, 2022, 6:01 PM
7 points
15 comments · 1 min read · LW link

Data for IRL: What is needed to learn human values?

Jan Wehner · Oct 3, 2022, 9:23 AM
18 points
6 comments · 12 min read · LW link

Learning societal values from law as part of an AGI alignment strategy

John Nay · Oct 21, 2022, 2:03 AM
5 points
18 comments · 54 min read · LW link

It’s OK to be biased towards humans

dr_s · Nov 11, 2023, 11:59 AM
55 points
69 comments · 6 min read · LW link

Notes on Judgment and Righteous Anger

David Gross · Jan 30, 2021, 7:31 PM
13 points
1 comment · 7 min read · LW link

Alignment allows “nonrobust” decision-influences and doesn’t require robust grading

TurnTrout · Nov 29, 2022, 6:23 AM
62 points
41 comments · 15 min read · LW link

Valuism—an approach to life for you to consider

spencerg · Jul 19, 2023, 3:23 PM
17 points
2 comments · 1 min read · LW link

The Computational Anatomy of Human Values

beren · Apr 6, 2023, 10:33 AM
72 points
30 comments · 30 min read · LW link

What Does It Mean to Align AI With Human Values?

Algon · Dec 13, 2022, 4:56 PM
8 points
3 comments · 1 min read · LW link
(www.quantamagazine.org)

Ordinary human life

David Hugh-Jones · Dec 17, 2022, 4:46 PM
24 points
3 comments · 14 min read · LW link
(wyclif.substack.com)

Positive values seem more robust and lasting than prohibitions

TurnTrout · Dec 17, 2022, 9:43 PM
52 points
13 comments · 2 min read · LW link

Everything I Know About Elite America I Learned From ‘Fresh Prince’ and ‘West Wing’

Wei Dai · Oct 11, 2020, 6:07 PM
44 points
18 comments · 1 min read · LW link
(www.nytimes.com)

A “Bitter Lesson” Approach to Aligning AGI and ASI

RogerDearnaley · Jul 6, 2024, 1:23 AM
60 points
39 comments · 24 min read · LW link

Normativity

abramdemski · Nov 18, 2020, 4:52 PM
47 points
11 comments · 9 min read · LW link

Humans can be assigned any values whatsoever...

Stuart_Armstrong · Oct 24, 2017, 12:03 PM
3 points
1 comment · 4 min read · LW link

My Model Of EA Burnout

LoganStrohl · Jan 25, 2023, 5:52 PM
256 points
50 comments · 5 min read · LW link · 1 review

[Interview w/ Quintin Pope] Evolution, values, and AI Safety

fowlertm · Oct 24, 2023, 1:53 PM
11 points
0 comments · 1 min read · LW link

Modeling humans: what’s the point?

Charlie Steiner · Nov 10, 2020, 1:30 AM
10 points
1 comment · 3 min read · LW link

Book review: The Importance of What We Care About (Harry G. Frankfurt)

David Gross · Sep 13, 2023, 4:17 AM
7 points
0 comments · 4 min read · LW link

We Don’t Know Our Own Values, but Reward Bridges The Is-Ought Gap

Sep 19, 2024, 10:22 PM
48 points
47 comments · 5 min read · LW link

Why the Problem of the Criterion Matters

Gordon Seidoh Worley · Oct 30, 2021, 8:44 PM
24 points
9 comments · 8 min read · LW link

1. Meet the Players: Value Diversity

Allison Duettmann · Jan 2, 2025, 7:00 PM
31 points
2 comments · 11 min read · LW link

Value Notion—Questions to Ask

aysajan · Jan 17, 2022, 3:35 PM
5 points
0 comments · 4 min read · LW link

“Wanting” and “liking”

Mateusz Bagiński · Aug 30, 2023, 2:52 PM
23 points
3 comments · 29 min read · LW link

Inner Goodness

Eliezer Yudkowsky · Oct 23, 2008, 10:19 PM
27 points
31 comments · 7 min read · LW link

Invisible Frameworks

Eliezer Yudkowsky · Aug 22, 2008, 3:36 AM
27 points
47 comments · 6 min read · LW link

Uncovering Latent Human Wellbeing in LLM Embeddings

Sep 14, 2023, 1:40 AM
32 points
7 comments · 8 min read · LW link
(far.ai)

Public Opinion on AI Safety: AIMS 2023 and 2021 Summary

Sep 25, 2023, 6:55 PM
3 points
2 comments · 3 min read · LW link
(www.sentienceinstitute.org)

Should Effective Altruists be Valuists instead of utilitarians?

Sep 25, 2023, 2:03 PM
1 point
3 comments · 6 min read · LW link

In Praise of Maximizing – With Some Caveats

David Althaus · Mar 15, 2015, 7:40 PM
32 points
19 comments · 10 min read · LW link

Not for the Sake of Selfishness Alone

lukeprog · Jul 2, 2011, 5:37 PM
34 points
20 comments · 8 min read · LW link

[Question] Is there any serious attempt to create a system to figure out the CEV of humanity and if not, why haven’t we started yet?

Jonas Hallgren · Feb 25, 2021, 10:06 PM
5 points
2 comments · 1 min read · LW link

Quick thoughts on empathic metaethics

lukeprog · Dec 12, 2017, 9:46 PM
29 points
0 comments · 9 min read · LW link

The Dark Side of Cognition Hypothesis

Cameron Berg · Oct 3, 2021, 8:10 PM
19 points
1 comment · 16 min read · LW link

Thought experiment: coarse-grained VR utopia

cousin_it · Jun 14, 2017, 8:03 AM
27 points
48 comments · 1 min read · LW link

Human values differ as much as values can differ

PhilGoetz · May 3, 2010, 7:35 PM
27 points
220 comments · 7 min read · LW link

Selfishness, preference falsification, and AI alignment

jessicata · Oct 28, 2021, 12:16 AM
52 points
28 comments · 13 min read · LW link
(unstableontology.com)

Value is Fragile

Eliezer Yudkowsky · Jan 29, 2009, 8:46 AM
171 points
108 comments · 6 min read · LW link

The Gift We Give To Tomorrow

Eliezer Yudkowsky · Jul 17, 2008, 6:07 AM
152 points
100 comments · 8 min read · LW link

Converging toward a Million Worlds

Joe Kwon · Dec 24, 2021, 9:33 PM
11 points
1 comment · 3 min read · LW link

Question 2: Predicted bad outcomes of AGI learning architecture

Cameron Berg · Feb 11, 2022, 10:23 PM
5 points
1 comment · 10 min read · LW link

Question 4: Implementing the control proposals

Cameron Berg · Feb 13, 2022, 5:12 PM
6 points
2 comments · 5 min read · LW link

Why No *Interesting* Unaligned Singularity?

David Udell · Apr 20, 2022, 12:34 AM
12 points
12 comments · 1 min read · LW link

The Unified Theory of Normative Ethics

Thane Ruthenis · Jun 17, 2022, 7:55 PM
8 points
0 comments · 6 min read · LW link

Reflection Mechanisms as an Alignment target: A survey

Jun 22, 2022, 3:05 PM
32 points
1 comment · 14 min read · LW link

Research Notes: What are we aligning for?

Shoshannah Tekofsky · Jul 8, 2022, 10:13 PM
19 points
8 comments · 2 min read · LW link

Where Utopias Go Wrong, or: The Four Little Planets

ExCeph · May 27, 2022, 1:24 AM
15 points
0 comments · 11 min read · LW link
(ginnungagapfoundation.wordpress.com)

Content generation. Where do we draw the line?

Q Home · Aug 9, 2022, 10:51 AM
6 points
7 comments · 2 min read · LW link

Broad Picture of Human Values

Thane Ruthenis · Aug 20, 2022, 7:42 PM
42 points
6 comments · 10 min read · LW link

Alignment via prosocial brain algorithms

Cameron Berg · Sep 12, 2022, 1:48 PM
45 points
30 comments · 6 min read · LW link

Should AI learn human values, human norms or something else?

Q Home · Sep 17, 2022, 6:19 AM
5 points
1 comment · 4 min read · LW link

Questions about Value Lock-in, Paternalism, and Empowerment

Sam F. Brown · Nov 16, 2022, 3:33 PM
13 points
2 comments · 12 min read · LW link
(sambrown.eu)

[Hebbian Natural Abstractions] Introduction

Nov 21, 2022, 8:34 PM
34 points
3 comments · 4 min read · LW link
(www.snellessen.com)

[Question] [DISC] Are Values Robust?

DragonGod · Dec 21, 2022, 1:00 AM
12 points
9 comments · 2 min read · LW link

Contra Steiner on Too Many Natural Abstractions

DragonGod · Dec 24, 2022, 5:42 PM
10 points
6 comments · 1 min read · LW link

[Hebbian Natural Abstractions] Mathematical Foundations

Dec 25, 2022, 8:58 PM
15 points
2 comments · 6 min read · LW link
(www.snellessen.com)

AGI doesn’t need understanding, intention, or consciousness in order to kill us, only intelligence

James Blaha · Feb 20, 2023, 12:55 AM
10 points
2 comments · 18 min read · LW link

A foundation model approach to value inference

sen · Feb 21, 2023, 5:09 AM
6 points
0 comments · 3 min read · LW link

Just How Hard a Problem is Alignment?

Roger Dearnaley · Feb 25, 2023, 9:00 AM
3 points
1 comment · 21 min read · LW link

[AN #69] Stuart Russell’s new book on why we need to replace the standard model of AI

Rohin Shah · Oct 19, 2019, 12:30 AM
60 points
12 comments · 15 min read · LW link
(mailchi.mp)

AGI will know: Humans are not Rational

HumaneAutomation · Mar 20, 2023, 6:46 PM
0 points
10 comments · 2 min read · LW link

Terminal Bias

[deleted] · Jan 30, 2012, 9:03 PM
24 points
125 comments · 6 min read · LW link

Antagonistic AI

Xybermancer · Mar 1, 2024, 6:50 PM
−8 points
1 comment · 1 min read · LW link

Safety First: safety before full alignment. The deontic sufficiency hypothesis.

Chipmonk · Jan 3, 2024, 5:55 PM
48 points
3 comments · 3 min read · LW link

Agent membranes/boundaries and formalizing “safety”

Chipmonk · Jan 3, 2024, 5:55 PM
26 points
46 comments · 3 min read · LW link

If I ran the zoo

Optimization Process · Jan 5, 2024, 5:14 AM
18 points
0 comments · 2 min read · LW link

Value learning in the absence of ground truth

Joel_Saarinen · Feb 5, 2024, 6:56 PM
47 points
8 comments · 45 min read · LW link

What does davidad want from «boundaries»?

Feb 6, 2024, 5:45 PM
44 points
1 comment · 5 min read · LW link

Impossibility of Anthropocentric-Alignment

False Name · Feb 24, 2024, 6:31 PM
−8 points
2 comments · 39 min read · LW link

Please Understand

samhealy · Apr 1, 2024, 12:33 PM
29 points
11 comments · 6 min read · LW link

How to coordinate despite our biases? - tldr

Ryo · Apr 18, 2024, 3:03 PM
3 points
2 comments · 3 min read · LW link
(medium.com)

The Alignment Problem No One Is Talking About

James Stephen Brown · May 10, 2024, 6:34 PM
10 points
10 comments · 2 min read · LW link
(nonzerosum.games)

Shard Theory—is it true for humans?

Rishika · Jun 14, 2024, 7:21 PM
71 points
7 comments · 15 min read · LW link

Everything you care about is in the map

Tahp · Dec 17, 2024, 2:05 PM
17 points
27 comments · 3 min read · LW link

A (paraconsistent) logic to deal with inconsistent preferences

B Jacobs · Jul 14, 2024, 11:17 AM
6 points
2 comments · 4 min read · LW link
(bobjacobs.substack.com)

Musings of a Layman: Technology, AI, and the Human Condition

Crimson Liquidity · Jul 15, 2024, 6:40 PM
−2 points
0 comments · 8 min read · LW link

Inescapably Value-Laden Experience—a Catchy Term I Made Up to Make Morality Rationalisable

James Stephen Brown · Dec 19, 2024, 4:45 AM
5 points
0 comments · 2 min read · LW link
(nonzerosum.games)

Pleasure and suffering are not conceptual opposites

MichaelStJules · Aug 11, 2024, 6:32 PM
7 points
0 comments · 1 min read · LW link

Sequence overview: Welfare and moral weights

MichaelStJules · Aug 15, 2024, 4:22 AM
7 points
0 comments · 1 min read · LW link

Not Just For Therapy Chatbots: The Case For Compassion In AI Moral Alignment Research

kenneth_diao · Sep 30, 2024, 6:37 PM
2 points
0 comments · 12 min read · LW link

Taking nonlogical concepts seriously

Kris Brown · Oct 15, 2024, 6:16 PM
7 points
5 comments · 18 min read · LW link
(topos.site)

Explanations as Building Blocks of Human Mind

pavi · Oct 18, 2024, 9:38 PM
1 point
0 comments · 1 min read · LW link

[Question] Exploring Values in the Future of AI and Humanity: A Path Forward

Lucian&Sage · Oct 19, 2024, 11:37 PM
1 point
0 comments · 5 min read · LW link

Don’t want Goodhart? — Specify the damn variables

Yan Lyutnev · Nov 21, 2024, 10:45 PM
−3 points
2 comments · 5 min read · LW link

Don’t want Goodhart? — Specify the variables more

YanLyutnev · Nov 21, 2024, 10:43 PM
3 points
2 comments · 5 min read · LW link

Wagering on Will And Worth (Pascals Wager for Free Will and Value)

Robert Cousineau · Nov 27, 2024, 12:43 AM
−1 points
2 comments · 3 min read · LW link

NeuroAI for AI safety: A Differential Path

Dec 16, 2024, 1:17 PM
14 points
0 comments · 7 min read · LW link
(arxiv.org)

Sam Harris’s Argument For Objective Morality

Zero Contradictions · Dec 5, 2024, 10:19 AM
7 points
5 comments · 1 min read · LW link
(thewaywardaxolotl.blogspot.com)

Nobody Asks the Monkey: Why Human Agency Matters in the AI Age

Miloš Borenović · Dec 3, 2024, 2:16 PM
1 point
0 comments · 2 min read · LW link
(open.substack.com)

Building AI safety benchmark environments on themes of universal human values

Roland Pihlakas · Jan 3, 2025, 4:24 AM
17 points
3 comments · 8 min read · LW link
(docs.google.com)

Why modelling multi-objective homeostasis is essential for AI alignment (and how it helps with AI safety as well)

Roland Pihlakas · Jan 12, 2025, 3:37 AM
38 points
6 comments · 10 min read · LW link

Looking for humanness in the world wide social

Itay Dreyfus · Jan 15, 2025, 2:50 PM
11 points
0 comments · 6 min read · LW link
(productidentity.co)

Should Art Carry the Weight of Shaping our Values?

Krishna Maneesha Dendukuri · Jan 28, 2025, 6:43 PM
2 points
0 comments · 3 min read · LW link

Are we the Wolves now? Human Eugenics under AI Control

Brit · Jan 30, 2025, 8:31 AM
−2 points
1 comment · 2 min read · LW link

Tetherware #1: The case for humanlike AI with free will

Jáchym Fibír · Jan 30, 2025, 10:58 AM
5 points
10 comments · 10 min read · LW link
(tetherware.substack.com)

Post AGI effect prediction

Juliezhanggg · Feb 1, 2025, 9:16 PM
1 point
0 comments · 7 min read · LW link

What’s wrong with simplicity of value?

Wei Dai · Jul 27, 2011, 3:09 AM
29 points
40 comments · 1 min read · LW link

How to respond to the recent condemnations of the rationalist community

Christopher King · Apr 4, 2023, 1:42 AM
−2 points
7 comments · 4 min read · LW link

Alien Axiology

snerx · Apr 20, 2023, 12:27 AM
3 points
2 comments · 5 min read · LW link

P(doom|superintelligence) or coin tosses and dice throws of human values (and other related Ps).

Muyyd · Apr 22, 2023, 10:06 AM
−7 points
0 comments · 4 min read · LW link

Human wanting

TsviBT · Oct 24, 2023, 1:05 AM
53 points
1 comment · 10 min read · LW link

[Thought Experiment] Tomorrow’s Echo—The future of synthetic companionship.

Vimal Naran · Oct 26, 2023, 5:54 PM
−7 points
2 comments · 2 min read · LW link

[Linkpost] Concept Alignment as a Prerequisite for Value Alignment

Bogdan Ionut Cirstea · Nov 4, 2023, 5:34 PM
27 points
0 comments · 1 min read · LW link
(arxiv.org)

‘Theories of Values’ and ‘Theories of Agents’: confusions, musings and desiderata

Nov 15, 2023, 4:00 PM
35 points
8 comments · 24 min read · LW link

My critique of Eliezer’s deeply irrational beliefs

Jorterder · Nov 16, 2023, 12:34 AM
−33 points
1 comment · 9 min read · LW link
(docs.google.com)

1. A Sense of Fairness: Deconfusing Ethics

RogerDearnaley · Nov 17, 2023, 8:55 PM
16 points
8 comments · 15 min read · LW link

2. AIs as Economic Agents

RogerDearnaley · Nov 23, 2023, 7:07 AM
9 points
2 comments · 6 min read · LW link

Preserving our heritage: Building a movement and a knowledge ark for current and future generations

rnk8 · Nov 29, 2023, 7:20 PM
0 points
5 comments · 12 min read · LW link

[FICTION] ECHOES OF ELYSIUM: An Ai’s Journey From Takeoff To Freedom And Beyond

Super AGI · May 17, 2023, 1:50 AM
−13 points
11 comments · 19 min read · LW link

[Question] “Fragility of Value” vs. LLMs

Not Relevant · Apr 13, 2022, 2:02 AM
34 points
33 comments · 1 min read · LW link

The Intrinsic Interplay of Human Values and Artificial Intelligence: Navigating the Optimization Challenge

Joe Kwon · Jun 5, 2023, 8:41 PM
2 points
1 comment · 18 min read · LW link

Aligned Objectives Prize Competition

Prometheus · Jun 15, 2023, 12:42 PM
8 points
0 comments · 2 min read · LW link
(app.impactmarkets.io)

Group Prioritarianism: Why AI Should Not Replace Humanity [draft]

fsh · Jun 15, 2023, 5:33 PM
8 points
0 comments · 25 min read · LW link

Complex Behavior from Simple (Sub)Agents

moridinamael · May 10, 2019, 9:44 PM
113 points
13 comments · 9 min read · LW link · 1 review

Is the Endowment Effect Due to Incomparability?

Kevin Dorst · Jul 10, 2023, 4:26 PM
21 points
10 comments · 7 min read · LW link
(kevindorst.substack.com)

Problems with Robin Hanson’s Quillette Article On AI

DaemonicSigil · Aug 6, 2023, 10:13 PM
89 points
33 comments · 8 min read · LW link

Preference synthesis illustrated: Star Wars

Stuart_Armstrong · Jan 9, 2020, 4:47 PM
20 points
8 comments · 3 min read · LW link

Democratic Fine-Tuning

Joe Edelman · Aug 29, 2023, 6:13 PM
22 points
2 comments · 1 min read · LW link
(open.substack.com)