An Inside View of AI Alignment

I started to take AI Alignment seriously around early 2020. I’d been interested in AI, and machine learning in particular, since 2014 or so, taking several online ML courses in high school and implementing some simple models for various projects. I leaned into the same niche in college, taking classes in NLP, Computer Vision, and Deep Learning to learn more of the underlying theory and modern applications of AI, with a continued emphasis on ML. I was very optimistic about AI capabilities then (and still am), and if you’d asked me about AI alignment or safety as late as my sophomore year of college (2018-2019), I probably would have quoted Steven Pinker or Andrew Ng at you.

Somewhere in the process of reading The Sequences, portions of the AI Foom Debate, and texts like Superintelligence and Human Compatible, I changed my mind. Some 80,000 Hours podcast episodes were no doubt influential as well, particularly the episodes with Paul Christiano. By late 2020, I probably took AI risk as seriously as I do today: I believed it to be one of the world’s most pressing problems (perhaps the most pressing) and was interested in learning more about it. At this point I binged most of the sequences on the Alignment Forum, learning about proposals and concepts like IDA, Debate, Recursive Reward Modeling, Embedded Agency, Attainable Utility Preservation, and CIRL. Throughout 2021 I continued to keep a finger on the pulse of the field. I got a large amount of value out of the Late 2021 MIRI Conversations in particular, which shifted me away from substantial optimism about prosaic alignment methods, slower takeoff speeds, longer timelines, and a generally “Christiano-ish” view of the field, and more towards a “Yudkowsky-ish” position.

I had a vague sense that AI safety would eventually be the problem I wanted to work on in my life, but going through the EA Cambridge AGI Safety Fundamentals Course made it clear that I could productively contribute to AI safety work now or in the near future. This sequence is an attempt to explicate my current model, or “inside view”, of the field. These viewpoints have developed over several years and are no doubt influenced by my path into and through AI safety research. For example, I tend to take aligning modern ML models extremely seriously, perhaps more seriously than is warranted, because I have much more experience with ML than with other AI paradigms.

I’m writing with the express goal of having my beliefs critiqued and scrutinized: there’s a lot I don’t know and no doubt a lot that I misunderstand. I plan to write on a wide variety of topics: the views of various researchers, my understanding of and confidence in specific alignment proposals, timelines, takeoff speeds, the scaling hypothesis, interpretability, etc. I also don’t have a fixed schedule or planned order for publishing different pieces of the model.

Without further ado, the posts that follow comprise Ansh’s (current) Inside View of AI Alignment.
