I realized I hadn’t given feedback on the actual results of the recommendation algorithm. Rating the recommendations I’ve gotten (from −10 to 10, 10 is best):
My experience using financial commitments to overcome akrasia: 3
An Introduction to AI Sandbagging: 3
Improving Dictionary Learning with Gated Sparse Autoencoders: 2
[April Fools’ Day] Introducing Open Asteroid Impact: −6
LLMs seem (relatively) safe: −3
The first future and the best future: −2
Examples of Highly Counterfactual Discoveries?: 5
“Why I Write” by George Orwell (1946): −3
My Clients, The Liars: −4
‘Empiricism!’ as Anti-Epistemology: −2
Toward a Broader Conception of Adverse Selection: 4
Ambitious Altruistic Software Engineering Efforts: Opportunities and Benefits: 6
I’d be interested in a comparison with the Latest tab.
Transformers Represent Belief State Geometry in their Residual Stream: 6
D&D.Sci: −5
Open Thread Spring 2024: 3
Introducing AI Lab Watch: −3
An explanation of evil in an organized world: −3
Mechanistically Eliciting Latent Behaviors in Language Models: 3
Shane Legg’s necessary properties for every AGI Safety plan: −1
LessWrong Community Weekend 2024, open for applications: −6
Ironing Out the Squiggles: 5
ACX Covid Origins Post convinced readers: −7
Why I’m doing PauseAI: −2
Manifund Q1 Retro: Learnings from impact certs: −1
Questions for labs: −3
Refusal in LLMs is mediated by a single direction: 5
Take SCIFs, it’s dangerous to go alone: 4