Adrià Garriga-alonso

Karma: 1,039

Crafting Polysemantic Transformer Benchmarks with Known Circuits

23 Aug 2024 22:03 UTC
10 points
0 comments · 25 min read · LW link

Pacing Outside the Box: RNNs Learn to Plan in Sokoban

25 Jul 2024 22:00 UTC
59 points
8 comments · 2 min read · LW link
(arxiv.org)

Compact Proofs of Model Performance via Mechanistic Interpretability

24 Jun 2024 19:27 UTC
95 points
3 comments · 8 min read · LW link
(arxiv.org)

Catastrophic Goodhart in RL with KL penalty

15 May 2024 0:58 UTC
62 points
10 comments · 7 min read · LW link

An evaluation of circuit evaluation metrics

15 Apr 2024 19:38 UTC
18 points
0 comments · 4 min read · LW link

Ophiology (or, how the Mamba architecture works)

9 Apr 2024 19:31 UTC
67 points
8 comments · 10 min read · LW link

Does literacy remove your ability to be a bard as good as Homer?

18 Jan 2024 3:43 UTC
51 points
19 comments · 3 min read · LW link