Amirali Abdullah

Karma: 32

Steering Language Models in Multiple Directions Simultaneously

May 2, 2025, 3:27 PM
18 points
0 comments · 7 min read

Backdoors have universal representations across large language models

Dec 6, 2024, 10:56 PM
16 points
0 comments · 16 min read

Early Experiments in Reward Model Interpretation Using Sparse Autoencoders

Oct 3, 2023, 7:45 AM
17 points
0 comments · 5 min read