Fazl
Karma: 118

Best-of-N Jailbreaking
John Hughes, saraprice, Aengus Lynch, Rylan Schaeffer, Fazl, Henry Sleight, Ethan Perez and mrinank_sharma
Dec 14, 2024, 4:58 AM · 78 points · 5 comments · 2 min read · link (arxiv.org)

Visualizing neural network planning
Nevan Wichers, Victor Tao, Fazl and Riccardo Volpato
May 9, 2024, 6:40 AM · 4 points · 0 comments · 5 min read

Mechanistic Interpretability Workshop Happening at ICML 2024!
Neel Nanda, LawrenceC and Fazl
May 3, 2024, 1:18 AM · 48 points · 6 comments · 1 min read

Early Experiments in Reward Model Interpretation Using Sparse Autoencoders
lukemarks, Amirali Abdullah, Rauno Arike, Fazl and nothoughtsheadempty
Oct 3, 2023, 7:45 AM · 17 points · 0 comments · 5 min read

Automated Sandwiching & Quantifying Human-LLM Cooperation: ScaleOversight hackathon results
Esben Kran, Fazl, Sabrina Zaki, gabrielrecc and rz2383
Feb 23, 2023, 10:48 AM · 8 points · 0 comments · 6 min read