Human-AI Safety (Tag)
Last edit: Jul 17, 2023, 11:19 PM by Wei Dai
Three AI Safety Related Ideas
Wei Dai · Dec 13, 2018, 9:32 PM · 69 points · 38 comments · 2 min read · LW link
Morality is Scary
Wei Dai · Dec 2, 2021, 6:35 AM · 228 points · 116 comments · 4 min read · LW link · 1 review
A broad basin of attraction around human values?
Wei Dai · Apr 12, 2022, 5:15 AM · 114 points · 18 comments · 2 min read · LW link
Two Neglected Problems in Human-AI Safety
Wei Dai · Dec 16, 2018, 10:13 PM · 102 points · 25 comments · 2 min read · LW link
SociaLLM: proposal for a language model design for personalised apps, social science, and AI safety research
Roman Leventov · Dec 19, 2023, 4:49 PM · 17 points · 5 comments · 3 min read · LW link
Let’s ask some of the largest LLMs for tips and ideas on how to take over the world
Super AGI · Feb 24, 2024, 8:35 PM · 1 point · 0 comments · 7 min read · LW link
[Question] Will OpenAI also require a “Super Red Team Agent” for its “Superalignment” Project?
Super AGI · Mar 30, 2024, 5:25 AM · 2 points · 2 comments · 1 min read · LW link
“Toward Safe Self-Evolving AI: Modular Memory and Post-Deployment Alignment”
Manasa Dwarapureddy · May 2, 2025, 5:02 PM · 1 point · 0 comments · 3 min read · LW link
Recursive Cognitive Refinement (RCR): A Self-Correcting Approach for LLM Hallucinations
mxTheo · Feb 22, 2025, 9:32 PM · 0 points · 0 comments · 2 min read · LW link
The Checklist: What Succeeding at AI Safety Will Involve
Sam Bowman · Sep 3, 2024, 6:18 PM · 142 points · 49 comments · 22 min read · LW link · (sleepinyourhat.github.io)
Human-AI Complementarity: A Goal for Amplified Oversight
rishubjain and Sophie Bridgers · Dec 24, 2024, 9:57 AM · 27 points · 4 comments · 1 min read · LW link · (deepmindsafetyresearch.medium.com)
Will AI and Humanity Go to War?
Simon Goldstein · Oct 1, 2024, 6:35 AM · 9 points · 4 comments · 6 min read · LW link
I Recommend More Training Rationales
Gianluca Calcagni · Dec 31, 2024, 2:06 PM · 2 points · 0 comments · 6 min read · LW link
Launching Applications for the Global AI Safety Fellowship 2025!
Aditya_SK · Nov 30, 2024, 2:02 PM · 11 points · 5 comments · 1 min read · LW link
A Universal Prompt as a Safeguard Against AI Threats
Zhaiyk Sultan · Mar 10, 2025, 2:28 AM · 1 point · 0 comments · 2 min read · LW link
A Safer Path to AGI? Considering the Self-to-Processing Route as an Alternative to Processing-to-Self
op · Apr 21, 2025, 1:09 PM · 1 point · 0 comments · 1 min read · LW link
Tetherware #1: The case for humanlike AI with free will
Jáchym Fibír · Jan 30, 2025, 10:58 AM · 5 points · 14 comments · 10 min read · LW link · (tetherware.substack.com)
Machine Unlearning in Large Language Models: A Comprehensive Survey with Empirical Insights from the Qwen 1.5 1.8B Model
Saketh Baddam · Feb 1, 2025, 9:26 PM · 9 points · 2 comments · 11 min read · LW link
AI Safety Oversights
Davey Morse · Feb 8, 2025, 6:15 AM · 3 points · 0 comments · 1 min read · LW link
Gradient Anatomy’s—Hallucination Robustness in Medical Q&A
DieSab · Feb 12, 2025, 7:16 PM · 2 points · 0 comments · 10 min read · LW link
OpenAI’s NSFW policy: user safety, harm reduction, and AI consent
8e9 · Feb 13, 2025, 1:59 PM · 4 points · 3 comments · 2 min read · LW link
Out of the Box
jesseduffield · Nov 13, 2023, 11:43 PM · 5 points · 1 comment · 7 min read · LW link
Apply to the Conceptual Boundaries Workshop for AI Safety
Chipmonk · Nov 27, 2023, 9:04 PM · 50 points · 0 comments · 3 min read · LW link
Public Opinion on AI Safety: AIMS 2023 and 2021 Summary
Jacy Reese Anthis, Janet Pauketat, and Ali · Sep 25, 2023, 6:55 PM · 3 points · 2 comments · 3 min read · LW link · (www.sentienceinstitute.org)
Title: IAM360: The Future of Human-AI Symbiosis — Can We Reach Investors?
Bruno Massena Massena · Apr 29, 2025, 7:02 PM · 1 point · 0 comments · 1 min read · LW link
Safety First: safety before full alignment. The deontic sufficiency hypothesis.
Chipmonk · Jan 3, 2024, 5:55 PM · 48 points · 3 comments · 3 min read · LW link
Gaia Network: An Illustrated Primer
Rafael Kaufmann Nedal and Roman Leventov · Jan 18, 2024, 6:23 PM · 3 points · 2 comments · 15 min read · LW link