
mikes

Karma: 214

Breaking Circuit Breakers

Jul 14, 2024, 6:57 PM
53 points
13 comments · 1 min read · LW link
(confirmlabs.org)

Fluent dreaming for language models (AI interpretability method)

Feb 6, 2024, 6:02 AM
46 points
5 comments · 1 min read · LW link
(arxiv.org)

Takeaways from the NeurIPS 2023 Trojan Detection Competition

Jan 13, 2024, 12:35 PM
20 points
2 comments · 1 min read · LW link
(confirmlabs.org)