RobertKirk

Karma: 319

PhD student at UCL DARK working on RL, OOD robustness, and safety. Interested in self-improvement.

A Sober Look at Steering Vectors for LLMs

Nov 23, 2024, 5:30 PM
38 points
0 comments, 5 min read

Speculative inferences about path dependence in LLM supervised fine-tuning from results on linear mode connectivity and model souping

Jul 20, 2023, 9:56 AM
39 points
2 comments, 5 min read

Causal confusion as an argument against the scaling hypothesis

Jun 20, 2022, 10:54 AM
86 points
30 comments, 15 min read

Sparsity and interpretability?

Jun 1, 2020, 1:25 PM
41 points
3 comments, 7 min read

How can Interpretability help Alignment?

May 23, 2020, 4:16 PM
37 points
3 comments, 9 min read

What is Interpretability?

Mar 17, 2020, 8:23 PM
39 points
1 comment, 11 min read