Mike Vaiana
Karma: 545
Mistral Large 2 (123B) exhibits alignment faking
Marc Carauleanu, Diogo de Lucena, Gunnar_Zarncke, Cameron Berg, Judd Rosenblatt, Mike Vaiana and AE Studio
Mar 27, 2025, 3:39 PM
80 points, 4 comments, 13 min read
Reducing LLM deception at scale with self-other overlap fine-tuning
Marc Carauleanu, Diogo de Lucena, Gunnar_Zarncke, Judd Rosenblatt, Cameron Berg, Mike Vaiana and AE Studio
Mar 13, 2025, 7:09 PM
155 points, 40 comments, 6 min read
Self-prediction acts as an emergent regularizer
Cameron Berg, Judd Rosenblatt, Mike Vaiana, Diogo de Lucena, florin_pop and AE Studio
Oct 23, 2024, 10:27 PM
91 points, 9 comments, 4 min read
Self-Other Overlap: A Neglected Approach to AI Alignment
Marc Carauleanu, Mike Vaiana, Judd Rosenblatt, Diogo de Lucena, Cameron Berg and AE Studio
Jul 30, 2024, 4:22 PM
215 points, 51 comments, 12 min read
Video Intro to Guaranteed Safe AI
Mike Vaiana, Diogo de Lucena and AE Studio
Jul 11, 2024, 5:53 PM
27 points, 0 comments, 1 min read
(youtu.be)
DIY RLHF: A simple implementation for hands on experience
Mike Vaiana and AE Studio
Jul 10, 2024, 12:07 PM
28 points, 0 comments, 6 min read