bilalchughtai

Karma: 909

My website is here.

Detecting Strategic Deception Using Linear Probes

Feb 6, 2025, 3:46 PM
100 points
9 comments · 2 min read
(arxiv.org)

Paper: Open Problems in Mechanistic Interpretability

Jan 29, 2025, 10:25 AM
68 points
0 comments · 1 min read
(arxiv.org)

Activation space interpretability may be doomed

Jan 8, 2025, 12:49 PM
145 points
32 comments · 8 min read

Reasons for and against working on technical AI safety at a frontier AI lab

bilalchughtai · Jan 5, 2025, 2:49 PM
97 points
12 comments · 12 min read

Book Summary: Zero to One

bilalchughtai · Dec 29, 2024, 4:13 PM
27 points
2 comments · 8 min read

Remap your caps lock key

bilalchughtai · Dec 15, 2024, 2:03 PM
80 points
18 comments · 1 min read