RSS

Nandi

Karma: 210

In­tri­ca­cies of Fea­ture Geom­e­try in Large Lan­guage Models

7 Dec 2024 18:10 UTC
59 points
0 comments12 min readLW link

The Geom­e­try of Feel­ings and Non­sense in Large Lan­guage Models

27 Sep 2024 17:49 UTC
58 points
10 comments4 min readLW link

[Paper] Hid­den in Plain Text: Emer­gence and Miti­ga­tion of Stegano­graphic Col­lu­sion in LLMs

25 Sep 2024 14:52 UTC
30 points
2 comments4 min readLW link
(arxiv.org)

Ro­bust­ness of Con­trast-Con­sis­tent Search to Ad­ver­sar­ial Prompting

1 Nov 2023 12:46 UTC
18 points
1 comment7 min readLW link

Ma­chine Un­learn­ing Eval­u­a­tions as In­ter­pretabil­ity Benchmarks

23 Oct 2023 16:33 UTC
33 points
2 comments11 min readLW link

Split­ting De­bate up into Two Subsystems

Nandi3 Jul 2020 20:11 UTC
13 points
5 comments4 min readLW link

Ac­knowl­edg­ing Hu­man Prefer­ence Types to Sup­port Value Learning

Nandi13 Nov 2018 18:57 UTC
34 points
4 comments9 min readLW link