General v. Specific Planning
Epistemic Status: Everyone already knows about it?
I’ve been thinking a bit about two different ways of pursuing a goal.
I haven’t come up with catchy jargon for them, and I don’t know of any existing catchy jargon for them either. “General v. specific planning” is a pretty bad name, but for the purposes of this post I’ll stick with it.
I know they’ve been discussed here in one form or another, probably many times, but I don’t think they’ve really been explicitly contrasted. I thought doing that might be useful.
Here are some suggestive, if imperfect, contrasts illustrating what I mean.
Examples
How I try to win Chess versus how Chess grandmasters and AlphaZero try to win Chess.
I tend to play Chess by trying to get an advantage in valuable pieces, because that is generally useful to me. I then try to use these pieces to eventually obtain checkmate.
On the other hand, AlphaZero seems to play to obtain a specific, although gradually accumulated, positional advantage that ultimately results in a resounding victory. It is happy to sacrifice “generally useful” material to get this.
This isn’t simply a matter of using different techniques to reach the same end. It has more to do with my inability to identify strong positions and picture a game very far into the future.
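To make the contrast concrete, here is a minimal, self-contained sketch in Python. Everything in it is invented for illustration (the toy position, the central-control bonus, even treating “position” as a hand-written bonus at all); AlphaZero’s actual evaluation is a learned network, not a formula like this. The point is only that a positional score can rank a material deficit above a material surplus.

```python
# Toy contrast between a "generally useful material" evaluation and a
# position-specific one. Entirely hypothetical: AlphaZero's real
# evaluation is learned from self-play and looks nothing like this.

PIECE_VALUES = {"P": 1, "N": 3, "B": 3, "R": 5, "Q": 9, "K": 0}

def material_eval(pieces):
    """My style: count up generally useful material, positive for White.

    `pieces` is a list of (letter, color, square) tuples, e.g.
    ("Q", "white", "d1").
    """
    score = 0
    for letter, color, _square in pieces:
        value = PIECE_VALUES[letter]
        score += value if color == "white" else -value
    return score

CENTER = {"d4", "d5", "e4", "e5"}

def positional_eval(pieces):
    """A crude stand-in for positional play: reward central control.

    A serious positional evaluation would also weigh king safety, pawn
    structure, piece activity, and so on; the point here is only that
    such an evaluation can rank a material deficit above a surplus.
    """
    score = 0.0
    for _letter, color, square in pieces:
        bonus = 0.5 if square in CENTER else 0.0
        score += bonus if color == "white" else -bonus
    return score

# A position-oriented player will happily trade material_eval points for
# positional_eval points; a material-oriented player rarely will.
position = [("N", "white", "e5"), ("P", "white", "d4"),
            ("R", "black", "a8"), ("P", "black", "h7")]
print(material_eval(position))    # -2: Black is up the exchange...
print(positional_eval(position))  # +1.0: ...but White dominates the center.
```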
Peter Thiel’s “indefinite optimism” about career success versus “definite optimism” about career success.
According to this schema, the typical indefinite optimist’s life path consists in acquiring instrumentally useful things, such as education, status, or money, without committing to a definite course of action. The stereotypical career for such a person is finance or consulting or “business.” Their success is supposed to follow from the pursuit of optionality.
The definite optimist’s life path, on the other hand, is more likely to consist in researching and shooting for a single, particular course of action. The stereotypical career for such a person is inventor or entrepreneur. Their success is supposed to follow from giving up a great deal of optionality.
The indefinite optimist accumulates generally useful resources, while the definite optimist will give up many “generally useful” things in order to reach a single goal.
OpenAI’s strategy versus DeepMind’s strategy for handling AGI successfully.
OpenAI seems to be trying to position itself to influence policy, influence researchers at other institutions, and perhaps (although probably not) develop AGI itself, and thereby ensure safe AGI. Similarly, they recently restructured themselves as a “capped-profit” company to raise more money.
On the other hand, DeepMind’s strategy seems simply to be to try to develop AGI itself. They have other efforts, but these seem noncentral. (I might be entirely wrong about this, though.)
Stereotypically conservative “white” advice for achieving happiness in life versus stereotypically adventurous “red” advice for the same.
White advice for life would be to get a good education, to save money, etc., while expecting these to be instrumentally useful for a relatively undefined happiness. You plan to get the means for happiness, not happiness itself. This turns out badly when the presumed means pin you down and prevent you from getting what you really wanted.
Red advice, on the other hand, is more about picturing what you want in your inmost core and then pursuing that kind of happiness directly. You plan to get happiness itself, not the penumbral material for happiness. This turns out badly when what you thought would make you happy turns out not to make you happy, and by then you lack general mobility.
The former is about pursuing things that are generally useful for happiness; the latter is about trying to imagine what happiness is for you and pursuing it directly.
(I feel like this is giving short shrift to both perfected white and perfected red ideals, but I think it gets across what I mean.)
The kind of planning for success that frequently occurs in startups versus the kind that tends to occur in established companies.
Startups are known for sacrificing instrumentally useful goods (usually money) to carry out a longer-term plan, while established companies are known for carrying out short-term plans to gain instrumentally useful goods, and in many cases for ignoring long-term risks for short-term gains.
Sketches of Definitions
If we want to stop giving examples and start talking about featherless bipeds, there are a few different ways to describe the examples above.
One way is to locate the distinction in the kind of resource pursued, treating the two as alternative techniques for obtaining a goal.
On one hand, you can pursue resources that are generally useful along multiple hypothetical paths towards your goal.
On the other hand, you can pursue a resource that is mostly useful along a single path towards your goal.
But you can also cast these as different phases in pursuit of a goal, and say that a great deal of goal-seeking behavior goes through both.
When you start pursuing a goal, you often seek out generally useful things.
Demis Hassabis (apparently) pursued money and management experience before starting DeepMind. AlphaZero pursues general positional advantage towards the start of a Chess game. Anyone founding a startup might pursue general knowledge about the startup’s domain before beginning.
And then at some point while pursuing the goal, you cease seeking generally useful things and begin turning the things you’ve accumulated to your specific advantage.
Demis Hassabis used his resources to get funding from Peter Thiel, and later to get DeepMind acquired by Google. AlphaZero pursues a particular strategy for checkmate (?). Anyone founding a startup stops pursuing general knowledge to actually, you know, start the startup.
There’s obviously a continuum here.
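To pin down both framings, here is a hedged sketch in Python, with made-up plans and numbers. It scores a resource by how many hypothetical plans it shows up in (the “kind of resource” framing), and lets a specificity knob shift weight from that general score toward a single chosen plan (the “phases” framing); the knob is the continuum.

```python
# Hypothetical sketch of the two framings; the plans and numbers are
# invented for illustration, not drawn from any real planner.
from typing import List, Set

# Each hypothetical plan toward the goal is the set of resources it needs.
PLANS: List[Set[str]] = [
    {"money", "math", "domain_knowledge"},
    {"money", "connections"},
    {"math", "compute", "domain_knowledge"},
]

def generality(resource: str, plans: List[Set[str]]) -> float:
    """The "kind of resource" framing: the fraction of hypothetical
    plans in which this resource is useful."""
    return sum(resource in plan for plan in plans) / len(plans)

def value(resource: str, chosen_plan: Set[str], specificity: float) -> float:
    """The "phases" framing: early on (specificity near 0) weigh
    resources by generality; later (specificity near 1) weigh only
    what the single chosen plan needs."""
    specific = 1.0 if resource in chosen_plan else 0.0
    return (1 - specificity) * generality(resource, PLANS) + specificity * specific

chosen = PLANS[2]  # commit to the third plan
resources = sorted({r for plan in PLANS for r in plan})
for s in (0.0, 0.5, 1.0):
    ranking = sorted(resources, key=lambda r: -value(r, chosen, s))
    print(f"specificity={s}: {ranking}")
# At specificity 0.0, resources useful in many plans (money, math,
# domain_knowledge) rank highest; at 1.0, only the chosen plan's
# resources do.
```

Nothing here is a real planner; it just makes “general early, specific late” concrete enough to see what’s being traded off as the knob moves.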
For the sake of clarity, I take this distinction to be quite different from the distinction between trying to try and actually trying.
There are certainly possible agents in possible worlds who, when trying to do their absolute best, must try to attain intermediate, generally instrumentally useful goals. If I were to try to play Chess like AlphaZero, I would do even worse than I actually do now.
Misc Notes
Generally, I’ve noticed a great deal of psychological resistance in myself to moving from the general to the specific.
General strategies feel safer (even if they aren’t), both because they offer more visible options and because there’s a smaller social burden in following them. No one will fault you for pursuing money, influence, etc. Many people will think you’re foolish for betting your life on a particular path.
On the other hand, the more specific strategy can be more fun.
It can be fun because specific planning is more mentally interesting than sort of punting off a strategy to the future by thinking “well, once we’ve gotten more influence / money / status, then we’ll pursue X directly.”
Tighter constraints can be more interesting to work with. Solving problems under tighter constraints also exercises your mind more, perhaps.
(Rather as an aside, I think many people exercise only general planning about their lives while exercising only specific planning about their work, which could lead to a little life boredom. I could be wrong.)
The more general feels more epistemically humble (potentially in a bad way) while the more specific feels more epistemically interesting (also potentially in a bad way).
Each strategy is sometimes correct.
Sometimes you just need to study more math, and sometimes you should actually propose a solution to that RL problem.
And really this whole framework might be fake.