Adam Jermyn
Karma: 1,684
Tracing the Thoughts of a Large Language Model
Adam Jermyn
Mar 27, 2025, 5:20 PM
304 points, 24 comments, 10 min read
(www.anthropic.com)

Auditing language models for hidden objectives
Sam Marks, Johannes Treutlein, dmz, Sam Bowman, Hoagy, Carson Denison, Kei, 7vik, Akbir Khan, Austin Meek, Euan Ong, Christopher Olah, Fabien Roger, jeanne_, Meg, Drake Thomas, Adam Jermyn, Monte M and evhub
Mar 13, 2025, 7:18 PM
141 points, 15 comments, 13 min read

Conditioning Predictive Models: Open problems, Conclusion, and Appendix
evhub, Adam Jermyn, Johannes Treutlein, Rubi J. Hudson and kcwoolverton
Feb 10, 2023, 7:21 PM
36 points, 3 comments, 11 min read

Conditioning Predictive Models: Deployment strategy
evhub, Adam Jermyn, Johannes Treutlein, Rubi J. Hudson and kcwoolverton
Feb 9, 2023, 8:59 PM
28 points, 0 comments, 10 min read

Conditioning Predictive Models: Interactions with other approaches
evhub, Adam Jermyn, Johannes Treutlein, Rubi J. Hudson and kcwoolverton
Feb 8, 2023, 6:19 PM
32 points, 2 comments, 11 min read

Conditioning Predictive Models: Making inner alignment as easy as possible
evhub, Adam Jermyn, Johannes Treutlein, Rubi J. Hudson and kcwoolverton
Feb 7, 2023, 8:04 PM
27 points, 2 comments, 19 min read

Conditioning Predictive Models: The case for competitiveness
evhub, Adam Jermyn, Johannes Treutlein, Rubi J. Hudson and kcwoolverton
Feb 6, 2023, 8:08 PM
20 points, 3 comments, 11 min read

Conditioning Predictive Models: Outer alignment via careful conditioning
evhub, Adam Jermyn, Johannes Treutlein, Rubi J. Hudson and kcwoolverton
Feb 2, 2023, 8:28 PM
72 points, 15 comments, 57 min read

Conditioning Predictive Models: Large language models as predictors
evhub, Adam Jermyn, Johannes Treutlein, Rubi J. Hudson and kcwoolverton
Feb 2, 2023, 8:28 PM
88 points, 4 comments, 13 min read

Underspecification of Oracle AI
Rubi J. Hudson, Adam Jermyn and Johannes Treutlein
Jan 15, 2023, 8:10 PM
30 points, 12 comments, 19 min read

Multi-Component Learning and S-Curves
Adam Jermyn and Buck
Nov 30, 2022, 1:37 AM
63 points, 24 comments, 7 min read

Engineering Monosemanticity in Toy Models
Adam Jermyn, evhub and Nicholas Schiefer
Nov 18, 2022, 1:43 AM
75 points, 7 comments, 3 min read
(arxiv.org)

Toy Models and Tegum Products
Adam Jermyn
Nov 4, 2022, 6:51 PM
28 points, 7 comments, 5 min read

Humans do acausal coordination all the time
Adam Jermyn
Nov 2, 2022, 2:40 PM
57 points, 35 comments, 3 min read

Polysemanticity and Capacity in Neural Networks
Buck, Adam Jermyn and Kshitij Sachan
Oct 7, 2022, 5:51 PM
87 points, 14 comments, 3 min read

Smoke without fire is scary
Adam Jermyn
4 Oct 2022 21:08 UTC
52 points, 22 comments, 4 min read

It matters when the first sharp left turn happens
Adam Jermyn
29 Sep 2022 20:12 UTC
45 points, 9 comments, 4 min read

Brief Notes on Transformers
Adam Jermyn
26 Sep 2022 14:46 UTC
48 points, 3 comments, 2 min read

Conditioning, Prompts, and Fine-Tuning
Adam Jermyn
17 Aug 2022 20:52 UTC
38 points, 9 comments, 4 min read

Conditioning Generative Models with Restrictions
Adam Jermyn
21 Jul 2022 20:33 UTC
18 points, 4 comments, 8 min read