Marc Carauleanu

Karma: 642

AI Safety Researcher @AE Studio

Currently researching self-other overlap, a neglected prior for cooperation and honesty inspired by the cognitive neuroscience of altruism, in state-of-the-art ML models.

Previously SRF @ SERI '21, MLSS & Student Researcher @ CAIS '22, and LTFF grantee.

LinkedIn

Mistral Large 2 (123B) exhibits alignment faking

Mar 27, 2025, 3:39 PM
80 points
4 comments · 13 min read · LW link

Reducing LLM deception at scale with self-other overlap fine-tuning

Mar 13, 2025, 7:09 PM
154 points
40 comments · 6 min read · LW link

Self-Other Overlap: A Neglected Approach to AI Alignment

Jul 30, 2024, 4:22 PM
215 points
49 comments · 12 min read · LW link

The ‘Neglected Approaches’ Approach: AE Studio’s Alignment Agenda

Dec 18, 2023, 8:35 PM
175 points
22 comments · 12 min read · LW link · 1 review

Towards empathy in RL agents and beyond: Insights from cognitive science for AI Alignment

Marc Carauleanu · Apr 3, 2023, 7:59 PM
15 points
6 comments · 1 min read · LW link
(clipchamp.com)

Lessons learned and review of the AI Safety Nudge Competition

Marc Carauleanu · Jan 17, 2023, 5:13 PM
3 points
0 comments · LW link

Winners of the AI Safety Nudge Competition

Marc Carauleanu · Nov 15, 2022, 1:06 AM
4 points
0 comments · LW link

Announcing the AI Safety Nudge Competition to Help Beat Procrastination

Marc Carauleanu · Oct 1, 2022, 1:49 AM
10 points
0 comments · LW link

Should we rely on the speed prior for safety?

Marc Carauleanu · Dec 14, 2021, 8:45 PM
14 points
5 comments · 5 min read · LW link