bilalchughtai
Karma: 922
My website is here.
Detecting Strategic Deception Using Linear Probes
Nicholas Goldowsky-Dill, bilalchughtai, StefanHex and Marius Hobbhahn
Feb 6, 2025, 3:46 PM · 102 points · 9 comments · 2 min read · (arxiv.org)
Paper: Open Problems in Mechanistic Interpretability
Lee Sharkey and bilalchughtai
Jan 29, 2025, 10:25 AM · 68 points · 0 comments · 1 min read · (arxiv.org)
Activation space interpretability may be doomed
bilalchughtai and Lucius Bushnaq
Jan 8, 2025, 12:49 PM · 147 points · 33 comments · 8 min read
Reasons for and against working on technical AI safety at a frontier AI lab
bilalchughtai
Jan 5, 2025, 2:49 PM · 100 points · 12 comments · 12 min read
Book Summary: Zero to One
bilalchughtai
Dec 29, 2024, 4:13 PM · 27 points · 2 comments · 8 min read
Remap your caps lock key
bilalchughtai
Dec 15, 2024, 2:03 PM · 80 points · 18 comments · 1 min read
You should consider applying to PhDs (soon!)
bilalchughtai
Nov 29, 2024, 8:33 PM · 114 points · 19 comments · 6 min read
bilalchughtai’s Shortform
bilalchughtai
Jul 29, 2024, 6:57 PM · 5 points · 10 comments
Understanding Positional Features in Layer 0 SAEs
bilalchughtai and Yeu-Tong Lau
Jul 29, 2024, 9:36 AM · 43 points · 0 comments · 5 min read
Unlearning via RMU is mostly shallow
Andy Arditi and bilalchughtai
Jul 23, 2024, 4:07 PM · 54 points · 4 comments · 6 min read
Transformer Circuit Faithfulness Metrics Are Not Robust
Joseph Miller, bilalchughtai and William_S
Jul 12, 2024, 3:47 AM · 104 points · 5 comments · 7 min read · (arxiv.org)
Me, Myself, and AI: the Situational Awareness Dataset (SAD) for LLMs
L Rudolf L, bilalchughtai, Jan Betley, kaivu, Jérémy Scheurer, Mikita Balesni, AlexMeinke, Owain_Evans and Marius Hobbhahn
Jul 8, 2024, 10:24 PM · 109 points · 37 comments · 5 min read