Adam Jermyn

Tracing the Thoughts of a Large Language Model

Adam Jermyn

Mar 27, 2025, 5:20 PM

304 points

24 comments, 10 min read

(www.anthropic.com)

Auditing language models for hidden objectives

Sam Marks, Johannes Treutlein, dmz, Sam Bowman, Hoagy, Carson Denison, Kei, 7vik, Akbir Khan, Austin Meek, Euan Ong, Christopher Olah, Fabien Roger, jeanne_, Meg, Drake Thomas, Adam Jermyn, Monte M and evhub

Mar 13, 2025, 7:18 PM

141 points

15 comments, 13 min read

Conditioning Predictive Models: Open problems, Conclusion, and Appendix

evhub, Adam Jermyn, Johannes Treutlein, Rubi J. Hudson and kcwoolverton

Feb 10, 2023, 7:21 PM

36 points

3 comments, 11 min read

Conditioning Predictive Models: Deployment strategy

evhub, Adam Jermyn, Johannes Treutlein, Rubi J. Hudson and kcwoolverton

Feb 9, 2023, 8:59 PM

28 points

0 comments, 10 min read

Conditioning Predictive Models: Interactions with other approaches

evhub, Adam Jermyn, Johannes Treutlein, Rubi J. Hudson and kcwoolverton

Feb 8, 2023, 6:19 PM

32 points

2 comments, 11 min read

Conditioning Predictive Models: Making inner alignment as easy as possible

evhub, Adam Jermyn, Johannes Treutlein, Rubi J. Hudson and kcwoolverton

Feb 7, 2023, 8:04 PM

27 points

2 comments, 19 min read

Conditioning Predictive Models: The case for competitiveness

evhub, Adam Jermyn, Johannes Treutlein, Rubi J. Hudson and kcwoolverton

Feb 6, 2023, 8:08 PM

20 points

3 comments, 11 min read

Conditioning Predictive Models: Outer alignment via careful conditioning

evhub, Adam Jermyn, Johannes Treutlein, Rubi J. Hudson and kcwoolverton

Feb 2, 2023, 8:28 PM

72 points

15 comments, 57 min read

Conditioning Predictive Models: Large language models as predictors

evhub, Adam Jermyn, Johannes Treutlein, Rubi J. Hudson and kcwoolverton

Feb 2, 2023, 8:28 PM

88 points

4 comments, 13 min read

Underspecification of Oracle AI

Rubi J. Hudson, Adam Jermyn and Johannes Treutlein

Jan 15, 2023, 8:10 PM

30 points

12 comments, 19 min read

Multi-Component Learning and S-Curves

Adam Jermyn and Buck

Nov 30, 2022, 1:37 AM

63 points

24 comments, 7 min read

Engineering Monosemanticity in Toy Models

Adam Jermyn, evhub and Nicholas Schiefer

Nov 18, 2022, 1:43 AM

75 points

7 comments, 3 min read

(arxiv.org)

Toy Models and Tegum Products

Adam Jermyn

Nov 4, 2022, 6:51 PM

28 points

7 comments, 5 min read

Humans do acausal coordination all the time

Adam Jermyn

Nov 2, 2022, 2:40 PM

57 points

35 comments, 3 min read

Polysemanticity and Capacity in Neural Networks

Buck, Adam Jermyn and Kshitij Sachan

Oct 7, 2022, 5:51 PM

87 points

14 comments, 3 min read

Smoke without fire is scary

Adam Jermyn

Oct 4, 2022, 9:08 PM UTC

52 points

22 comments, 4 min read

It matters when the first sharp left turn happens

Adam Jermyn

Sep 29, 2022, 8:12 PM UTC

45 points

9 comments, 4 min read

Brief Notes on Transformers

Adam Jermyn

Sep 26, 2022, 2:46 PM UTC

48 points

3 comments, 2 min read

Conditioning, Prompts, and Fine-Tuning

Adam Jermyn

Aug 17, 2022, 8:52 PM UTC

38 points

9 comments, 4 min read

Conditioning Generative Models with Restrictions

Adam Jermyn

Jul 21, 2022, 8:33 PM UTC

18 points

4 comments, 8 min read