Adam Karvonen

Evaluating Sparse Autoencoders with Board Game Models

2 Aug 2024 19:50 UTC
Using an LLM perplexity filter to detect weight exfiltration

Adam Karvonen, 21 Jul 2024 18:18 UTC
OthelloGPT learned a bag of heuristics

2 Jul 2024 9:12 UTC
An Intuitive Explanation of Sparse Autoencoders for Mechanistic Interpretability of LLMs

Adam Karvonen, 25 Jun 2024 15:57 UTC
A Chess-GPT Linear Emergent World Representation

Adam Karvonen, 8 Feb 2024 4:25 UTC
