7vik

Karma: 337

I research intelligence and its emergence and expression in neural networks to ensure advanced AI is safe and beneficial.

Current interests: neural network interpretability, alignment/safety, unsupervised learning, and deep learning theory.

For more, check out my scholar profile and personal website.

Among Us: A Sandbox for Agentic Deception

Apr 5, 2025, 6:24 AM
102 points
4 comments · 7 min read

Auditing language models for hidden objectives

Mar 13, 2025, 7:18 PM
138 points
15 comments · 13 min read

Some lessons from the OpenAI-FrontierMath debacle

Jan 19, 2025, 9:09 PM
69 points
9 comments · 4 min read

Intricacies of Feature Geometry in Large Language Models

Dec 7, 2024, 6:10 PM
68 points
0 comments · 12 min read

The Geometry of Feelings and Nonsense in Large Language Models

Sep 27, 2024, 5:49 PM
61 points
10 comments · 4 min read