
Human-AI Safety

Last edit: Jul 17, 2023, 11:19 PM by Wei Dai

Two Neglected Problems in Human-AI Safety

Wei Dai · Dec 16, 2018, 10:13 PM
102 points
25 comments · 2 min read · LW link

Three AI Safety Related Ideas

Wei Dai · Dec 13, 2018, 9:32 PM
69 points
38 comments · 2 min read · LW link

Morality is Scary

Wei Dai · Dec 2, 2021, 6:35 AM
221 points
116 comments · 4 min read · LW link · 1 review

A broad basin of attraction around human values?

Wei Dai · Apr 12, 2022, 5:15 AM
114 points
18 comments · 2 min read · LW link

SociaLLM: proposal for a language model design for personalised apps, social science, and AI safety research

Roman Leventov · Dec 19, 2023, 4:49 PM
17 points
5 comments · 3 min read · LW link

Let’s ask some of the largest LLMs for tips and ideas on how to take over the world

Super AGI · Feb 24, 2024, 8:35 PM
1 point
0 comments · 7 min read · LW link

[Question] Will OpenAI also require a “Super Red Team Agent” for its “Superalignment” Project?

Super AGI · Mar 30, 2024, 5:25 AM
2 points
2 comments · 1 min read · LW link

Recursive Cognitive Refinement (RCR): A Self-Correcting Approach for LLM Hallucinations

mxTheo · Feb 22, 2025, 9:32 PM
0 points
0 comments · 2 min read · LW link

The Checklist: What Succeeding at AI Safety Will Involve

Sam Bowman · Sep 3, 2024, 6:18 PM
149 points
49 comments · 22 min read · LW link
(sleepinyourhat.github.io)

Human-AI Complementarity: A Goal for Amplified Oversight

Dec 24, 2024, 9:57 AM
27 points
4 comments · 1 min read · LW link
(deepmindsafetyresearch.medium.com)

Will AI and Humanity Go to War?

Simon Goldstein · Oct 1, 2024, 6:35 AM
9 points
4 comments · 6 min read · LW link

I Recommend More Training Rationales

Gianluca Calcagni · Dec 31, 2024, 2:06 PM
2 points
0 comments · 6 min read · LW link

Launching Applications for the Global AI Safety Fellowship 2025!

Aditya_SK · Nov 30, 2024, 2:02 PM
11 points
5 comments · 1 min read · LW link

A Universal Prompt as a Safeguard Against AI Threats

Zhaiyk Sultan · Mar 10, 2025, 2:28 AM
1 point
0 comments · 2 min read · LW link

Tetherware #1: The case for humanlike AI with free will

Jáchym Fibír · Jan 30, 2025, 10:58 AM
5 points
11 comments · 10 min read · LW link
(tetherware.substack.com)

Machine Unlearning in Large Language Models: A Comprehensive Survey with Empirical Insights from the Qwen 1.5 1.8B Model

Saketh Baddam · Feb 1, 2025, 9:26 PM
9 points
2 comments · 11 min read · LW link

AI Safety Oversights

Davey Morse · Feb 8, 2025, 6:15 AM
3 points
0 comments · 1 min read · LW link

Gradient Anatomy’s—Hallucination Robustness in Medical Q&A

DieSab · Feb 12, 2025, 7:16 PM
2 points
0 comments · 10 min read · LW link

OpenAI’s NSFW policy: user safety, harm reduction, and AI consent

8e9 · Feb 13, 2025, 1:59 PM
4 points
3 comments · 2 min read · LW link

Out of the Box

jesseduffield · Nov 13, 2023, 11:43 PM
5 points
1 comment · 7 min read · LW link

Apply to the Conceptual Boundaries Workshop for AI Safety

Chipmonk · Nov 27, 2023, 9:04 PM
50 points
0 comments · 3 min read · LW link

Public Opinion on AI Safety: AIMS 2023 and 2021 Summary

Sep 25, 2023, 6:55 PM
3 points
2 comments · 3 min read · LW link
(www.sentienceinstitute.org)

Safety First: safety before full alignment. The deontic sufficiency hypothesis.

Chipmonk · Jan 3, 2024, 5:55 PM
48 points
3 comments · 3 min read · LW link

Gaia Network: An Illustrated Primer

Jan 18, 2024, 6:23 PM
3 points
2 comments · 15 min read · LW link