NickGabs
Karma: 380
Steering Llama-2 with contrastive activation additions
Nina Panickssery, Wuschel Schulz, NickGabs, Meg, evhub and TurnTrout · 2 Jan 2024 0:47 UTC · 124 points · 29 comments · 8 min read · (arxiv.org)
Science of Deep Learning more tractably addresses the Sharp Left Turn than Agent Foundations
NickGabs · 19 Sep 2023 22:06 UTC · 23 points · 2 comments · 6 min read
An upcoming US Supreme Court case may impede AI governance efforts
NickGabs · 16 Jul 2023 23:51 UTC · 57 points · 17 comments · 2 min read
Empirical Evidence Against “The Longest Training Run”
NickGabs · 6 Jul 2023 18:32 UTC · 24 points · 0 comments · 14 min read
Proposal: labs should precommit to pausing if an AI argues for itself to be improved
NickGabs · 2 Jun 2023 22:31 UTC · 3 points · 3 comments · 4 min read
AI Doom Is Not (Only) Disjunctive
NickGabs · 30 Mar 2023 1:42 UTC · 12 points · 0 comments · 5 min read
We Need Holistic AI Macrostrategy
NickGabs · 15 Jan 2023 2:13 UTC · 39 points · 4 comments · 8 min read
Takeoff speeds, the chimps analogy, and the Cultural Intelligence Hypothesis
NickGabs · 2 Dec 2022 19:14 UTC · 16 points · 2 comments · 4 min read
Miscellaneous First-Pass Alignment Thoughts
NickGabs · 21 Nov 2022 21:23 UTC · 12 points · 4 comments · 10 min read
Distillation of “How Likely Is Deceptive Alignment?”
NickGabs · 18 Nov 2022 16:31 UTC · 24 points · 4 comments · 10 min read