Hoagy

Karma: 936

Some additional SAE thoughts

Hoagy

13 Jan 2024 19:31 UTC

30 points

4 comments, 13 min read, LW link

Sparse Autoencoders Find Highly Interpretable Directions in Language Models

Logan Riggs, Hoagy, Aidan Ewart and Robert_AIZI

21 Sep 2023 15:30 UTC

158 points

8 comments, 5 min read, LW link

AutoInterpretation Finds Sparse Coding Beats Alternatives

Hoagy

17 Jul 2023 1:41 UTC

56 points

1 comment, 7 min read, LW link

[Replication] Conjecture’s Sparse Coding in Small Transformers

Hoagy and Logan Riggs

16 Jun 2023 18:02 UTC

52 points

0 comments, 5 min read, LW link

[Replication] Conjecture’s Sparse Coding in Toy Models

Hoagy and Logan Riggs

2 Jun 2023 17:34 UTC

24 points

0 comments, 1 min read, LW link

Universality and Hidden Information in Concept Bottleneck Models

Hoagy

5 Apr 2023 14:00 UTC

23 points

0 comments, 11 min read, LW link

Nokens: A potential method of investigating glitch tokens

Hoagy

15 Mar 2023 16:23 UTC

21 points

0 comments, 4 min read, LW link

Automating Consistency

Hoagy

17 Feb 2023 13:24 UTC

10 points

0 comments, 1 min read, LW link

Distilled Representations Research Agenda

Hoagy and mishajw

18 Oct 2022 20:59 UTC

15 points

2 comments, 8 min read, LW link

Remaking EfficientZero (as best I can)

Hoagy

4 Jul 2022 11:03 UTC

36 points

9 comments, 22 min read, LW link

Note-Taking without Hidden Messages

Hoagy

30 Apr 2022 11:15 UTC

17 points

2 comments, 4 min read, LW link

ELK Sub—Note-taking in internal rollouts

Hoagy

9 Mar 2022 17:23 UTC

6 points

0 comments, 5 min read, LW link

Automated Fact Checking: A Look at the Field

Hoagy

6 Oct 2021 23:52 UTC

12 points

0 comments, 8 min read, LW link

Hoagy’s Shortform

Hoagy

21 Sep 2020 22:00 UTC

3 points

12 comments, 1 min read, LW link

Safe Scrambling?

Hoagy

29 Aug 2020 14:31 UTC

3 points

1 comment, 2 min read, LW link

When do utility functions constrain?

Hoagy

23 Aug 2019 17:19 UTC

30 points

8 comments, 7 min read, LW link