Okay, but isn’t “playing to your outs” encouraging people to do crazy or violent things?
I think this is the largest downside to this framing (though I’d include simply “counter-productive” along with the others), and comes out of the MtG-vs-reality disparity: in MtG you know your outs; in reality, you do not. (I know you mentioned this, but it needs more shake-you-by-the-lapels-and-scream-it-in-your-face emphasis.)
This framing needs to come with prominent warnings along the lines of:
Do NOT assume that what you think is an out is certainly an out.
Do NOT assume that the potential outs you’re aware of are a significant proportion of all outs.
The issue with crazy/violent plans is not only that they’re usually dumb/unlikely-to-work, but that even when they would be sensible given particular implicit assumptions, those assumptions may not hold.
People don’t know what they don’t know. Playing to the-outs-I-can-think-of will feel like “playing to my outs”—while I may be witlessly screwing up a load of outs I didn’t notice.
I guess this is why Eliezer went with the dignity framing: it suggests a default of caution, and perhaps deliberation. Playing to your outs suggests throwing caution to the wind (this is often how it works when you know your outs).
For anyone adopting this framing, I suggest an exercise where you first imagine having someone repeatedly scream YOU DO NOT KNOW ALL YOUR OUTS!! YOU WILL NEVER KNOW ALL YOUR OUTS!! at you for long enough that you know it in your bones.
100% this.
In poker or MtG you can often actually figure out your outs.
Even in games with just a bit more complexity (yes, in certain ways MtG is very simple) it can be very, very hard to know what your outs really are. And if one of your outs is “actually the next few turns will not be as devastating as I think they will be”, then almost all versions of playing to your outs will be worse than playing for generic-looking EV.
Same with life. Don’t play to your outs until and unless you’re extremely sure you’ve modeled the world well enough to know what ALL of your outs ARE, and aren’t. And that boils down to, basically, don’t play to your outs. OP hinted at this with “find out what your outs are” but when the advice boils down to “don’t play to your outs, gain knowledge about the game state” I would say the framing should be discarded.
I agree finding your outs is very hard, but I don’t think this is actually a different challenge than increasing “dignity”. If you don’t have a map to victory, then you probably lose. I expect that in most worlds where we win, some people figured out some outs and played to them.
I currently don’t know of any outs. But I think I know some things that outs might require and am working on those, while hoping someone comes up with some good outs—and occasionally taking a stab at them myself.
I think the main problem is the first point and not the second point:
Do NOT assume that what you think is an out is certainly an out.
Do NOT assume that the potential outs you’re aware of are a significant proportion of all outs.
The current problem, if Eliezer is right, is basically that we have 0 outs. Not that the ones we have might be less promising than other ones. And he’s criticising people for thinking their plans are outs when they’re actually not.
Well, I think that’s a real problem, but I worry Eliezer’s frame will generally discourage people from even trying to come up with good plans at all. That’s why I emphasize outs.
Oh sure—I don’t mean to imply there’s no upside in this framing, or that I don’t see a downside in Eliezer’s.
However, whether you know of outs depends on what you see as an out. E.g. buying much more time to come up with a solution could be seen as an out by some people. It’s easy to imagine many bad plans to do that, with potentially hugely negative side-effects.
Some of those bad plans would look rational, conditional on an assumption that there was no other way to avoid losing the future. Of course making such an assumption is poor reasoning, but the trouble is that it happens implicitly: nobody needs to say to themselves “...and here I assume that no-one on earth has come up with, or will come up with, approaches I’ve missed”; they only need to fail to ask themselves the right questions.
Conditional on being very clear on not knowing the outs, I think this framing may well be a good one for many people—but I’m serious about the mental exercise.
A similar principle I have about this situation is: Don’t get too clever.
Don’t do anything questionable or too complicated. If you do, you’re just as likely to cause harm as to cause good. The psychological warfare campaign you’ve envisioned against OpenAI is going to backfire on you and undermine your team.
Keep it simple. Promote alignment research. Persuade your friends. Volunteer on one of the many relevant projects.