Take 5: Another problem for natural abstractions is laziness.

As a writing exercise, I’m writing an AI Alignment Hot Take Advent Calendar—one new hot take, written every day for 25 days. Or until I run out of hot takes.

Soundtrack.

Natural abstractions are patterns in the environment that are so convenient and so useful that most right-thinking agents will learn to take advantage of them. But what if humans and modern ML are too lazy to be right-thinking?

One way of framing this point is in terms of gradient starvation. The reason neural networks don’t explore all possible abstractions (aside from the expense) is that once they find the first way of solving a problem, they don’t really have an incentive to find a second way—it doesn’t give them a higher score, so they don’t. When gradient starvation is strong, it means the loss landscape has a lot of local minima that the agent can roll into, that aren’t easily connected to the global minimum, and so what abstractions the network ends up using will depend strongly on the initial conditions.
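To make the "roll into whichever minimum is nearby" picture concrete, here is a minimal sketch. It is a toy one-dimensional loss, not gradient starvation in an actual network: the function `loss`, the tilt of 0.3, the learning rate, and the starting points are all made up for illustration. Plain gradient descent started on either side of the central bump settles into a different basin, and only one of them is the global minimum.

```python
# Toy illustration (all numbers are assumptions chosen for the example):
# a tilted double-well loss with a shallow minimum near x = +1
# and a deeper (global) minimum near x = -1.

def loss(x):
    return (x**2 - 1) ** 2 + 0.3 * x

def grad(x):
    # Derivative of loss with respect to x.
    return 4 * x * (x**2 - 1) + 0.3

def descend(x0, lr=0.05, steps=500):
    """Plain gradient descent: purely local search, no exploration."""
    x = x0
    for _ in range(steps):
        x -= lr * grad(x)
    return x

left = descend(-0.5)   # starts left of the bump
right = descend(0.5)   # starts right of the bump

print(f"start -0.5 -> x = {left:.3f}, loss = {loss(left):.3f}")
print(f"start +0.5 -> x = {right:.3f}, loss = {loss(right):.3f}")
```

The run that starts at +0.5 has no local incentive to climb back over the bump, so it stays in the worse basin. Which "solution" you get is decided by the initial condition, not by which one is best.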

Regularization and exploration can help ameliorate this problem, but often come with catastrophic forgetting—if a neural net finds a strictly better way to solve the problem it’s faced with, it might forget all about the previous way. When we imagine a right-thinking agent that learns natural abstractions, we often imagine something that’s intrinsically motivated to learn lots of different ways of solving a problem, and that doesn’t erase its memory of interesting methods just because they’re not on the Pareto frontier.

So that’s what I mean by “lazy”/“not lazy” here. Neural networks, or humans, are lazy if they’re parochial in solution-space, doing local search in a way that sees them get stuck in what a less-lazy optimizer might consider to be ruts.

It’s important to note that laziness is not an unambiguously bad property. First, it’s usually more efficient. Second, maybe we don’t want our neural net to actually search through the weird and adversarial parts of parameter-space, and local search prevents it from doing so. Alex Turner et al. have recently been making arguments like this fairly forcefully. Still, we don’t want maximal laziness, especially not if we want to find natural abstractions like the various meanings of “human values.”

I might be attacking a strawman or a bailey here, I’m not totally sure. I’ve been using “natural abstraction” here as if it just means an abstraction that would be useful for a wide variety of agents to have in their toolbox. But we might also use “natural abstractions” to denote the vital abstractions, those that aren’t merely nice to have, but that you literally can’t complete certain tasks without using. In that second sense, neural networks are always highly incentivized to learn relevant natural abstractions, and you can easily tell when they do so by measuring their loss.

But as per yesterday, there are often multiple similarly powerful ways to model the world, particularly when modeling humans and human values. There might be hard-core vital abstractions for various human-interaction tasks, but I suspect they’re abstractions like “discrete object,” not anything nearly so far into the leaves of the tree as “human values.” And when I see informal speculation about natural abstractions, it usually strikes me as being about the less strict “useful for most agents” abstractions.

Ultimately, I expect laziness to cause both artificial neural nets and humans to miss out on some sizeable fraction of abstractions that most agents would find useful. What to do? There are options:

Build an AI that isn’t lazy. But laziness is useful, and anyhow maybe we don’t want an AI to explore all the extrema. So build an AI that’s less lazy in a controlled way. Requires research on AI architectures, and might have to sneakily borrow from the other options to specify the shape of the remaining laziness.
Redefine “right-thinking” to involve a human-like local search. This moves you closer to shard theory or other even-more-anthropomorphic alignment schemes. This may give up some nice properties of universally-natural abstractions, but you do keep working on basically the same technology for picking out abstractions learned by an AI. Requires a really good picture of what “human-like local search” means.
Use information about humans in the optimization process itself. This might look like Stuart’s picture of concept extrapolation, or maybe it would look like a self-reflective AI that tries to direct its own learning process. Really gives up on neutrality, and instead tries to be efficient by learning the concepts humans want learned. Requires research on architectures and human-computer interaction.
