I ended up talking to Ian mostly about the difficulties of simulating/predicting alignment researchers because I expected that to be the topic with the most important gaps in coverage from the other people he talked to. Now that I’ve read this post, there’s a different class of “automating alignment research” proposals which I want to talk about more. The general pattern is: study the process of science, figure out some key bottleneck thoroughly enough to make it legible, and then automate that bottleneck.
Example 1: Measurement Devices
I started to explain this and it turned into a whole post. Short version: any measurement device automates a part of the research process (the part where a human observes something). But a “thermometer” which automatically mimicked a human sticking their finger in something and reporting its hotness in natural language would be a lot less useful than the thermometers which we actually use. Most of the value comes, not from the automation, but from noticing some robust pattern in the world: the fact that we can use a single number (“temperature”) to reproducibly and precisely predict a broad class of interactions (e.g. which of two things will get hotter/colder when the two things are put in contact).
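To make the “single summary variable” point concrete, here’s a minimal toy sketch (the Blob class, the constants, and the update rule are all invented for illustration, not a physics engine): whatever two objects are made of, the sign of the temperature difference alone predicts which one warms up on contact.

```python
# Toy sketch: temperature as a one-number summary of a complicated object.
# Everything here (class names, constants, update rule) is illustrative.

class Blob:
    def __init__(self, temperature, heat_capacity):
        self.T = temperature     # the one-number summary
        self.C = heat_capacity   # stand-in for "everything else about the object"

def contact_step(a, b, dt=0.01, k=1.0):
    """One step of heat exchange; the *direction* depends only on a.T - b.T."""
    q = k * (a.T - b.T) * dt     # heat flows from the hotter blob to the colder
    a.T -= q / a.C
    b.T += q / b.C

a = Blob(temperature=350.0, heat_capacity=2.0)   # small hot object
b = Blob(temperature=300.0, heat_capacity=10.0)  # big cold object
for _ in range(10_000):
    contact_step(a, b)
print(round(a.T, 2), round(b.T, 2))              # both settle at ~308.33
```

The point of the toy: the prediction “a cools, b warms” never consults heat_capacity or anything else about the objects, only the two temperatures. That’s the robust pattern which makes the number worth measuring.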
Example 2: “AI Feynman”
AI Feynman is a project out of Max Tegmark’s lab. The main interesting part is that it automatically notices certain kinds of structure, like locality or additivity, in data. It’s the sort of thing where you could feed it measurement-streams from hundreds of sensors in a piston, and it might rediscover the Ideal Gas Law, or at least figure out that pressure, volume, and temperature summarize all the key information.
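AI Feynman itself combines neural-network fits with a battery of symbolic tests, so the snippet below is not its method, just a toy in the same spirit (the constants and noise model are my own assumptions): generate noisy piston-style sensor readings, then recover the power-law structure relating P, V, and T by fitting in log space.

```python
import numpy as np

rng = np.random.default_rng(0)

# Fake piston sensor logs, generated so that P*V = n*R*T holds up to
# 1% multiplicative sensor noise. All constants are illustrative.
nR = 8.314                                   # one mole's worth of n*R
T = rng.uniform(250.0, 400.0, size=500)      # kelvin
V = rng.uniform(0.01, 0.05, size=500)        # cubic meters
P = nR * T / V * (1.0 + 0.01 * rng.standard_normal(500))

# Crude structure search: fit log P = a*log V + b*log T + c and read off
# the exponents. Integer-ish a and b suggest a clean power law.
X = np.column_stack([np.log(V), np.log(T), np.ones_like(T)])
(a, b, c), *_ = np.linalg.lstsq(X, np.log(P), rcond=None)
print(f"P ~ V^{a:.2f} * T^{b:.2f}")          # roughly V^-1.00 * T^1.00
```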
I don’t necessarily think AI Feynman itself is going to revolutionize anything, but I could imagine something like it being a big deal. The key is to automate the step of science where we look at some very-high-dimensional real-world stuff interacting, and back out the relatively-low-dimensional parameters which actually matter for the relatively-long-range interactions.
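As a cartoon of that step (the setup below is entirely synthetic; in the real version the structure is nonlinear and you don’t know in advance that it’s there): two hundred sensor channels which are all noisy linear readouts of just three hidden quantities, where the singular-value spectrum reveals how many summary variables the data actually supports.

```python
import numpy as np

rng = np.random.default_rng(1)

# 200 sensor channels, all (noisy) linear readouts of just 3 hidden
# quantities: a stand-in for "hundreds of sensors on a piston whose
# behavior is summarized by pressure, volume, and temperature".
hidden = rng.standard_normal((1000, 3))      # the true low-dimensional state
readout = rng.standard_normal((3, 200))      # how each sensor mixes it
data = hidden @ readout + 0.05 * rng.standard_normal((1000, 200))

# PCA via SVD: the singular-value spectrum shows how many summary
# variables the data supports.
centered = data - data.mean(axis=0)
s = np.linalg.svd(centered, compute_uv=False)
print(np.round(s[:6] / s[0], 3))             # three sizeable values, then a cliff
```

PCA only finds linear, global structure, which is exactly why it’s a cartoon: the step worth automating is noticing nonlinear and local structure, which is what would make an AI Feynman-style tool more than a fancy PCA.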
Example 3: Automated Inter-Researcher Interfaces
Occasionally I’ve heard proposals to automate distillation, or rubber ducking, or even writing up new research. In principle, I think there’s a lot of potential there. In practice, I think the vast majority of proposals start from e.g. “Here’s some neat ML tech, how could we apply it to distillation?” rather than “What are the main ways good distillations provide value, and how can we reduce the cost of providing that value?”. I’m much more optimistic about people starting with the nail than the hammer. Going one step further: don’t just start with a story about how good distillations provide value, go find some actual distillations which are providing lots of value and study those; consider what their key load-bearing features are. Maybe try writing a few yourself, to better understand which steps are difficult, and to double-check whether the key load-bearing features you identified are indeed sufficient to generate a high-value distillation. Do all that before asking about the hammer.
The General Pattern
The general pattern in these: deeply understand a particular bottleneck to the scientific process. Once we understand it deeply and legibly enough, automation should be straightforward. The main failure mode, in all cases, is to jump into automating without really understanding what it is that we’re automating.
Science and General Intelligence
Now, this sort of strategy is not easy. Lots of people have tried to make the scientific process more legible over the past couple centuries, and most of them have done a pretty shit job. (Karl Popper gets a special callout for doing a completely shit job, but being a sufficiently successful salesperson that his shitty model of “the scientific method” was basically what my science teachers taught me in middle school.)
But the study of science overlaps particularly well with the study of general intelligence; insights in one usually correspond directly to insights in the other. For instance, compare these two questions:
How do scientists notice the few summary variables which matter for physical laws (e.g. pressure, volume, and temperature of a gas) when faced with a very-high-dimensional real-world system?
How do generally intelligent systems figure out which few variables to remember, pay attention to, etc., when faced with very-high-dimensional sensor data?
Or:
How do scientists narrow down the exponentially large search space of possible physical laws?
How do generally intelligent systems narrow down the exponentially large search space of possible world models?
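For flavor, here’s one textbook-style partial ingredient which shows up in answers to both questions, namely a simplicity prior: score candidate laws by misfit plus complexity, description-length style, so that simple hypotheses dominate the search. The candidate list and complexity counts below are arbitrary assumptions of mine, purely for illustration, not a claim about how scientists or minds actually do it.

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.uniform(1.0, 10.0, size=100)
y = 3.0 * x**2 + 0.5 * rng.standard_normal(100)   # hidden "true law": y = 3x^2

# A tiny hypothesis space: each candidate is a feature plus a crude,
# hand-assigned complexity count. Both are arbitrary illustration choices.
candidates = {
    "a*x":      (lambda t: t,         1),
    "a*x^2":    (lambda t: t**2,      2),
    "a*x^3":    (lambda t: t**3,      2),
    "a*log(x)": (lambda t: np.log(t), 2),
}

def description_length(feature, complexity):
    """MDL-flavored score: cost of encoding the residuals plus the formula."""
    f = feature(x)
    a = (f @ y) / (f @ f)                # least-squares scale for this law
    mse = np.mean((y - a * f) ** 2)
    return 0.5 * len(x) * np.log(mse) + complexity

best = min(candidates, key=lambda name: description_length(*candidates[name]))
print(best)                              # "a*x^2": fits well *and* stays simple
```

In a real search the space is far too large to enumerate; the scoring rule is the part that generalizes, since it tells any search procedure which corner of the space deserves effort first.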
Because of this correspondence, I expect that insights into general intelligence will produce corresponding insights into how to do science better. Indeed, insofar as research into general intelligence doesn’t produce insights into how to do science better, it’s probably on the wrong track.