Aligned AI Proposals (tag)
Last edit: 20 Dec 2022 15:20 UTC by particlemania
A “Bitter Lesson” Approach to Aligning AGI and ASI · RogerDearnaley · 6 Jul 2024 1:23 UTC · 56 points · 39 comments · 24 min read
Goodbye, Shoggoth: The Stage, its Animatronics, & the Puppeteer – a New Metaphor · RogerDearnaley · 9 Jan 2024 20:42 UTC · 47 points · 8 comments · 36 min read
Requirements for a Basin of Attraction to Alignment · RogerDearnaley · 14 Feb 2024 7:10 UTC · 38 points · 10 comments · 31 min read
Alignment has a Basin of Attraction: Beyond the Orthogonality Thesis · RogerDearnaley · 1 Feb 2024 21:15 UTC · 13 points · 15 comments · 13 min read
Motivating Alignment of LLM-Powered Agents: Easy for AGI, Hard for ASI? · RogerDearnaley · 11 Jan 2024 12:56 UTC · 34 points · 4 comments · 39 min read
Striking Implications for Learning Theory, Interpretability — and Safety? · RogerDearnaley · 5 Jan 2024 8:46 UTC · 37 points · 4 comments · 2 min read
How to Control an LLM’s Behavior (why my P(DOOM) went down) · RogerDearnaley · 28 Nov 2023 19:56 UTC · 64 points · 30 comments · 11 min read
Interpreting the Learning of Deceit · RogerDearnaley · 18 Dec 2023 8:12 UTC · 30 points · 14 comments · 9 min read
We have promising alignment plans with low taxes · Seth Herd · 10 Nov 2023 18:51 UTC · 33 points · 9 comments · 5 min read
AI Alignment Metastrategy · Vanessa Kosoy · 31 Dec 2023 12:06 UTC · 117 points · 13 comments · 7 min read
Safety First: safety before full alignment. The deontic sufficiency hypothesis. · Chipmonk · 3 Jan 2024 17:55 UTC · 48 points · 3 comments · 3 min read
A Nonconstructive Existence Proof of Aligned Superintelligence · Roko · 12 Sep 2024 3:20 UTC · 0 points · 78 comments · 1 min read · (transhumanaxiology.substack.com)
How might we solve the alignment problem? (Part 1: Intro, summary, ontology) · Joe Carlsmith · 28 Oct 2024 21:57 UTC · 51 points · 5 comments · 32 min read
[Linkpost] Building Altruistic and Moral AI Agent with Brain-inspired Affective Empathy Mechanisms · Gunnar_Zarncke · 4 Nov 2024 10:15 UTC · 11 points · 0 comments · 1 min read · (arxiv.org)
Two paths to win the AGI transition · Nathan Helm-Burger · 6 Jul 2023 21:59 UTC · 11 points · 8 comments · 4 min read
Desiderata for an AI · Nathan Helm-Burger · 19 Jul 2023 16:18 UTC · 9 points · 0 comments · 4 min read
A list of core AI safety problems and how I hope to solve them · davidad · 26 Aug 2023 15:12 UTC · 165 points · 29 comments · 5 min read
an Evangelion dialogue explaining the QACI alignment plan · Tamsin Leake · 10 Jun 2023 3:28 UTC · 51 points · 16 comments · 43 min read · (carado.moe)
The (partial) fallacy of dumb superintelligence · Seth Herd · 18 Oct 2023 21:25 UTC · 36 points · 5 comments · 4 min read
Embedding Ethical Priors into AI Systems: A Bayesian Approach · Justausername · 3 Aug 2023 15:31 UTC · −5 points · 3 comments · 21 min read
Lifelogging for Alignment & Immortality · Dev.Errata · 17 Aug 2024 23:42 UTC · 13 points · 3 comments · 7 min read
Scalable Oversight and Weak-to-Strong Generalization: Compatible approaches to the same problem · Ansh Radhakrishnan, Buck, ryan_greenblatt and Fabien Roger · 16 Dec 2023 5:49 UTC · 73 points · 3 comments · 6 min read
aimless ace analyzes active amateur: a micro-aaaaalignment proposal · lemonhope · 21 Jul 2024 12:37 UTC · 10 points · 0 comments · 1 min read
Toward a Human Hybrid Language for Enhanced Human-Machine Communication: Addressing the AI Alignment Problem · Andndn Dheudnd · 14 Aug 2024 22:19 UTC · −6 points · 2 comments · 4 min read
AI Alignment via Slow Substrates: Early Empirical Results With StarCraft II · Lester Leong · 14 Oct 2024 4:05 UTC · 60 points · 9 comments · 12 min read
Reducing the risk of catastrophically misaligned AI by avoiding the Singleton scenario: the Manyton Variant · GravitasGradient · 6 Aug 2023 14:24 UTC · −6 points · 0 comments · 3 min read
[Question] Bostrom’s Solution · James Blackmon · 14 Aug 2023 17:09 UTC · 1 point · 0 comments · 1 min read
Against sacrificing AI transparency for generality gains · Ape in the coat · 7 May 2023 6:52 UTC · 4 points · 0 comments · 2 min read
Speculation on mapping the moral landscape for future AI Alignment · Sven Heinz (Welwordion) · 16 Apr 2023 13:43 UTC · 1 point · 0 comments · 1 min read
How to express this system for ethically aligned AGI as a Mathematical formula? · Oliver Siegel · 19 Apr 2023 20:13 UTC · −1 points · 0 comments · 1 min read
A Proposal for AI Alignment: Using Directly Opposing Models · Arne B · 27 Apr 2023 18:05 UTC · 0 points · 5 comments · 3 min read
AI Alignment: A Comprehensive Survey · Stephen McAleer · 1 Nov 2023 17:35 UTC · 15 points · 1 comment · 1 min read · (arxiv.org)
Is Interpretability All We Need? · RogerDearnaley · 14 Nov 2023 5:31 UTC · 1 point · 1 comment · 1 min read
Language Model Memorization, Copyright Law, and Conditional Pretraining Alignment · RogerDearnaley · 7 Dec 2023 6:14 UTC · 9 points · 0 comments · 11 min read
Annotated reply to Bengio’s “AI Scientists: Safe and Useful AI?” · Roman Leventov · 8 May 2023 21:26 UTC · 18 points · 2 comments · 7 min read · (yoshuabengio.org)
Proposal: Align Systems Earlier In Training · OneManyNone · 16 May 2023 16:24 UTC · 18 points · 0 comments · 11 min read
The Goal Misgeneralization Problem · Myspy · 18 May 2023 23:40 UTC · 1 point · 0 comments · 1 min read · (drive.google.com)
An LLM-based “exemplary actor” · Roman Leventov · 29 May 2023 11:12 UTC · 16 points · 0 comments · 12 min read
Aligning an H-JEPA agent via training on the outputs of an LLM-based “exemplary actor” · Roman Leventov · 29 May 2023 11:08 UTC · 12 points · 10 comments · 30 min read
Enhancing Corrigibility in AI Systems through Robust Feedback Loops · Justausername · 24 Aug 2023 3:53 UTC · 1 point · 0 comments · 6 min read
An Open Agency Architecture for Safe Transformative AI · davidad · 20 Dec 2022 13:04 UTC · 79 points · 22 comments · 4 min read
Supplementary Alignment Insights Through a Highly Controlled Shutdown Incentive · Justausername · 23 Jul 2023 16:08 UTC · 4 points · 1 comment · 3 min read
Autonomous Alignment Oversight Framework (AAOF) · Justausername · 25 Jul 2023 10:25 UTC · −9 points · 0 comments · 4 min read
Moral realism and AI alignment · Caspar Oesterheld · 3 Sep 2018 18:46 UTC · 13 points · 10 comments · 1 min read · (casparoesterheld.com)
Alignment in Thought Chains · Faust Nemesis · 4 Mar 2024 19:24 UTC · 1 point · 0 comments · 2 min read
Strong-Misalignment: Does Yudkowsky (or Christiano, or TurnTrout, or Wolfram, or…etc.) Have an Elevator Speech I’m Missing? · Benjamin Bourlier · 15 Mar 2024 23:17 UTC · −4 points · 3 comments · 16 min read
Proposal for an AI Safety Prize · sweenesm · 31 Jan 2024 18:35 UTC · 3 points · 0 comments · 2 min read
Update on Developing an Ethics Calculator to Align an AGI to · sweenesm · 12 Mar 2024 12:33 UTC · 4 points · 2 comments · 8 min read
Why entropy means you might not have to worry as much about superintelligent AI · Ron J · 23 May 2024 3:52 UTC · −26 points · 1 comment · 2 min read
How to safely use an optimizer · Simon Fischer · 28 Mar 2024 16:11 UTC · 47 points · 21 comments · 7 min read
Slowed ASI—a possible technical strategy for alignment · Lester Leong · 14 Jun 2024 0:57 UTC · 5 points · 2 comments · 3 min read