Marc Carauleanu

Karma: 659

AI Safety Researcher @AE Studio

Currently researching self-other overlap — a neglected prior for cooperation and honesty inspired by the cognitive neuroscience of altruism — in state-of-the-art ML models.

Previously: SRF @ SERI '21, MLSS & Student Researcher @ CAIS '22, and LTFF grantee.

LinkedIn

Mistral Large 2 (123B) seems to exhibit alignment faking

27 Mar 2025 15:39 UTC
80 points
4 comments · 13 min read · LW link

Reducing LLM deception at scale with self-other overlap fine-tuning

13 Mar 2025 19:09 UTC
155 points
41 comments · 6 min read · LW link

Self-Other Overlap: A Neglected Approach to AI Alignment

30 Jul 2024 16:22 UTC
223 points
51 comments · 12 min read · LW link

The ‘Neglected Approaches’ Approach: AE Studio’s Alignment Agenda

18 Dec 2023 20:35 UTC
178 points
23 comments · 12 min read · LW link · 1 review

Towards empathy in RL agents and beyond: Insights from cognitive science for AI Alignment

Marc Carauleanu · 3 Apr 2023 19:59 UTC
15 points
6 comments · 1 min read · LW link
(clipchamp.com)

Lessons learned and review of the AI Safety Nudge Competition

Marc Carauleanu · 17 Jan 2023 17:13 UTC
3 points
0 comments · LW link

Winners of the AI Safety Nudge Competition

Marc Carauleanu · 15 Nov 2022 1:06 UTC
4 points
0 comments · LW link

Announcing the AI Safety Nudge Competition to Help Beat Procrastination

Marc Carauleanu · 1 Oct 2022 1:49 UTC
10 points
0 comments · LW link

Should we rely on the speed prior for safety?

Marc Carauleanu · 14 Dec 2021 20:45 UTC
14 points
5 comments · 5 min read · LW link