
mikes

Karma: 213

Breaking Circuit Breakers

14 Jul 2024 18:57 UTC
53 points
13 comments · 1 min read · LW link
(confirmlabs.org)

Fluent dreaming for language models (AI interpretability method)

6 Feb 2024 6:02 UTC
45 points
5 comments · 1 min read · LW link
(arxiv.org)

Takeaways from the NeurIPS 2023 Trojan Detection Competition

mikes · 13 Jan 2024 12:35 UTC
20 points
2 comments · 1 min read · LW link
(confirmlabs.org)