
Aligned AI Proposals

Last edit: 4 Jan 2025 22:05 UTC by Dakara

Aligned AI Proposals are proposals aimed at ensuring that artificial intelligence systems behave in accordance with human intentions (intent alignment) or human values (value alignment).

The main goal of these proposals is to ensure that AI systems will, all things considered, benefit humanity.

A “Bitter Lesson” Approach to Aligning AGI and ASI

RogerDearnaley · 6 Jul 2024 1:23 UTC
60 points
39 comments · 24 min read · LW link

Why Aligning an LLM is Hard, and How to Make it Easier

RogerDearnaley · 23 Jan 2025 6:44 UTC
30 points
3 comments · 4 min read · LW link

Striking Implications for Learning Theory, Interpretability — and Safety?

RogerDearnaley · 5 Jan 2024 8:46 UTC
37 points
4 comments · 2 min read · LW link

How to Control an LLM’s Behavior (why my P(DOOM) went down)

RogerDearnaley · 28 Nov 2023 19:56 UTC
64 points
30 comments · 11 min read · LW link

Goodbye, Shoggoth: The Stage, its Animatronics, & the Puppeteer – a New Metaphor

RogerDearnaley · 9 Jan 2024 20:42 UTC
47 points
8 comments · 36 min read · LW link

Motivating Alignment of LLM-Powered Agents: Easy for AGI, Hard for ASI?

RogerDearnaley · 11 Jan 2024 12:56 UTC
35 points
4 comments · 39 min read · LW link

Alignment has a Basin of Attraction: Beyond the Orthogonality Thesis

RogerDearnaley · 1 Feb 2024 21:15 UTC
15 points
15 comments · 13 min read · LW link

Requirements for a Basin of Attraction to Alignment

RogerDearnaley · 14 Feb 2024 7:10 UTC
40 points
12 comments · 31 min read · LW link

Interpreting the Learning of Deceit

RogerDearnaley · 18 Dec 2023 8:12 UTC
30 points
14 comments · 9 min read · LW link

A Nonconstructive Existence Proof of Aligned Superintelligence

Roko · 12 Sep 2024 3:20 UTC
0 points
80 comments · 1 min read · LW link
(transhumanaxiology.substack.com)

AI Alignment Metastrategy

Vanessa Kosoy · 31 Dec 2023 12:06 UTC
120 points
13 comments · 7 min read · LW link

A list of core AI safety problems and how I hope to solve them

davidad · 26 Aug 2023 15:12 UTC
165 points
29 comments · 5 min read · LW link

The (partial) fallacy of dumb superintelligence

Seth Herd · 18 Oct 2023 21:25 UTC
38 points
5 comments · 4 min read · LW link

We have promising alignment plans with low taxes

Seth Herd · 10 Nov 2023 18:51 UTC
44 points
9 comments · 5 min read · LW link

Safety First: safety before full alignment. The deontic sufficiency hypothesis.

Chipmonk · 3 Jan 2024 17:55 UTC
48 points
3 comments · 3 min read · LW link

How might we solve the alignment problem? (Part 1: Intro, summary, ontology)

Joe Carlsmith · 28 Oct 2024 21:57 UTC
54 points
5 comments · 32 min read · LW link

[Linkpost] Building Altruistic and Moral AI Agent with Brain-inspired Affective Empathy Mechanisms

Gunnar_Zarncke · 4 Nov 2024 10:15 UTC
13 points
0 comments · 1 min read · LW link
(arxiv.org)

Desiderata for an AI

Nathan Helm-Burger · 19 Jul 2023 16:18 UTC
9 points
0 comments · 4 min read · LW link

Two paths to win the AGI transition

Nathan Helm-Burger · 6 Jul 2023 21:59 UTC
11 points
8 comments · 4 min read · LW link

Proposal: Align Systems Earlier In Training

OneManyNone · 16 May 2023 16:24 UTC
18 points
0 comments · 11 min read · LW link

The Goal Misgeneralization Problem

Myspy · 18 May 2023 23:40 UTC
1 point
0 comments · 1 min read · LW link
(drive.google.com)

An LLM-based “exemplary actor”

Roman Leventov · 29 May 2023 11:12 UTC
16 points
0 comments · 12 min read · LW link

Aligning an H-JEPA agent via training on the outputs of an LLM-based “exemplary actor”

Roman Leventov · 29 May 2023 11:08 UTC
12 points
10 comments · 30 min read · LW link

Supplementary Alignment Insights Through a Highly Controlled Shutdown Incentive

Justausername · 23 Jul 2023 16:08 UTC
4 points
1 comment · 3 min read · LW link

Autonomous Alignment Oversight Framework (AAOF)

Justausername · 25 Jul 2023 10:25 UTC
−9 points
0 comments · 4 min read · LW link

Embedding Ethical Priors into AI Systems: A Bayesian Approach

Justausername · 3 Aug 2023 15:31 UTC
−5 points
3 comments · 21 min read · LW link

Reducing the risk of catastrophically misaligned AI by avoiding the Singleton scenario: the Manyton Variant

GravitasGradient · 6 Aug 2023 14:24 UTC
−6 points
0 comments · 3 min read · LW link

[Question] Bostrom’s Solution

James Blackmon · 14 Aug 2023 17:09 UTC
1 point
0 comments · 1 min read · LW link

Enhancing Corrigibility in AI Systems through Robust Feedback Loops

Justausername · 24 Aug 2023 3:53 UTC
1 point
0 comments · 6 min read · LW link

An Open Agency Architecture for Safe Transformative AI

davidad · 20 Dec 2022 13:04 UTC
80 points
22 comments · 4 min read · LW link

Scalable Oversight and Weak-to-Strong Generalization: Compatible approaches to the same problem

16 Dec 2023 5:49 UTC
76 points
4 comments · 6 min read · LW link · 1 review

Lifelogging for Alignment & Immortality

Dev.Errata · 17 Aug 2024 23:42 UTC
13 points
3 comments · 7 min read · LW link

Update on Developing an Ethics Calculator to Align an AGI to

sweenesm · 12 Mar 2024 12:33 UTC
4 points
2 comments · 8 min read · LW link

Alignment in Thought Chains

Faust Nemesis · 4 Mar 2024 19:24 UTC
1 point
0 comments · 2 min read · LW link

Moral realism and AI alignment

Caspar Oesterheld · 3 Sep 2018 18:46 UTC
13 points
10 comments · 1 min read · LW link
(casparoesterheld.com)

Proposal for an AI Safety Prize

sweenesm · 31 Jan 2024 18:35 UTC
3 points
0 comments · 2 min read · LW link

How to safely use an optimizer

Simon Fischer · 28 Mar 2024 16:11 UTC
47 points
21 comments · 7 min read · LW link

Slowed ASI—a possible technical strategy for alignment

Lester Leong · 14 Jun 2024 0:57 UTC
5 points
2 comments · 3 min read · LW link

aimless ace analyzes active amateur: a micro-aaaaalignment proposal

lemonhope · 21 Jul 2024 12:37 UTC
12 points
0 comments · 1 min read · LW link

Toward a Human Hybrid Language for Enhanced Human-Machine Communication: Addressing the AI Alignment Problem

Andndn Dheudnd · 14 Aug 2024 22:19 UTC
−4 points
2 comments · 4 min read · LW link

AI Alignment via Slow Substrates: Early Empirical Results With StarCraft II

Lester Leong · 14 Oct 2024 4:05 UTC
60 points
9 comments · 12 min read · LW link

The Road to Evil Is Paved with Good Objectives: Framework to Classify and Fix Misalignments.

Shivam · 30 Jan 2025 2:44 UTC
1 point
0 comments · 12 min read · LW link

Speculation on mapping the moral landscape for future Ai Alignment

Sven Heinz (Welwordion) · 16 Apr 2023 13:43 UTC
1 point
0 comments · 1 min read · LW link

A Proposal for AI Alignment: Using Directly Opposing Models

Arne B · 27 Apr 2023 18:05 UTC
0 points
5 comments · 3 min read · LW link

AI Alignment: A Comprehensive Survey

Stephen McAleer · 1 Nov 2023 17:35 UTC
20 points
1 comment · 1 min read · LW link
(arxiv.org)

Is Interpretability All We Need?

RogerDearnaley · 14 Nov 2023 5:31 UTC
1 point
1 comment · 1 min read · LW link

Language Model Memorization, Copyright Law, and Conditional Pretraining Alignment

RogerDearnaley · 7 Dec 2023 6:14 UTC
9 points
0 comments · 11 min read · LW link

Annotated reply to Bengio’s “AI Scientists: Safe and Useful AI?”

Roman Leventov · 8 May 2023 21:26 UTC
18 points
2 comments · 7 min read · LW link
(yoshuabengio.org)