Nandi

Karma: 227

Intricacies of Feature Geometry in Large Language Models

Dec 7, 2024, 6:10 PM
68 points
0 comments, 12 min read

The Geometry of Feelings and Nonsense in Large Language Models

Sep 27, 2024, 5:49 PM
61 points
10 comments, 4 min read

[Paper] Hidden in Plain Text: Emergence and Mitigation of Steganographic Collusion in LLMs

Sep 25, 2024, 2:52 PM
37 points
2 comments, 4 min read
(arxiv.org)

Robustness of Contrast-Consistent Search to Adversarial Prompting

Nov 1, 2023, 12:46 PM
18 points
1 comment, 7 min read

Machine Unlearning Evaluations as Interpretability Benchmarks

Oct 23, 2023, 4:33 PM
33 points
2 comments, 11 min read

Splitting Debate up into Two Subsystems

Nandi, Jul 3, 2020, 8:11 PM
13 points
5 comments, 4 min read

Acknowledging Human Preference Types to Support Value Learning

Nandi, Nov 13, 2018, 6:57 PM
34 points
4 comments, 9 min read