This is a big crux on how to view the world. Does it take irrationality to do hard things? Eliezer Yudkowsky explicitly says no. In his view, rationality is systematized winning. If you think you need to be ‘irrational’ to do a hard thing, either that hard thing is not actually worth doing, or your sense of what is rational is confused.
But this has always been circular. Is the thing Ilya is doing going to systematically win? Well, it’s worked out pretty well for him so far. By this standard, maybe focusing on calibration is the real irrationality.
I also think that “fooling oneself or having false beliefs” is a mischaracterization of the alternative to classic “rationality”, or maybe a type error. Consider growth mindset: it’s not really a specific belief, more like an attitude; and specifically, an attitude from which focusing on “what’s the probability that I succeed” is the wrong type of question to ask. I’ll say more about this later in my sequence on meta-rationality.
What is bizarre is the idea that most of the ML community ‘doesn’t think misalignment will be a problem.’
I should have been clearer, I meant “existential problem”, since I assumed that was what Connor was referring to by “not going to work”. I think that, with that addition, the statement is correct. I also still think that Connor’s original statement is so wildly false that it’s a clear signal of operating via mood affiliation.
This is the opposite of their perspective, which is that ‘good enough’ alignment for the human-level is all you need. That seems very wrong to me. You would have to think you can somehow ‘recover’ the lost alignment later in the process.
I mostly think this ontology is wrong, but attempting to phrase a response within it: as long as you can extract useful intellectual work out of a system, you can “recover” lost alignment. Misaligned models are not going to be lying to us about everything, they are going to be lying to us about specific things which it’s difficult for us to verify. And a misaligned human-level model simply won’t have a very easy time lying to us, or coordinating lies across many copies of itself.
On circularity and what wins, the crux to me in spots like this is whether you do better by actually fooling yourself and actually assuming you can solve the problem, or whether you want to take a certain attitude like ‘I am going to attack this problem like it is solvable,’ while not forgetting, in case it matters elsewhere, that you don’t actually know that - which in some cases (I think this is one of them) matters a lot, and in others not as much. I think we agree that you want to at least do that second one in many situations, given typical human limitations.
My current belief is that fooling oneself for real is at most second-best as a solution, unless you are being punished/rewarded via interpretability.
On circularity and what wins, the crux to me in spots like this is whether you do better by actually fooling yourself and actually assuming you can solve the problem
As per my comment I think “fooling yourself” is the wrong ontology here, it’s more like “devote x% of your time to thinking about what happens if you fail” where x is very small. (Analogously, someone with strong growth mindset might only rarely consider what happens if they can’t ever do better than they’re currently doing—but wouldn’t necessarily deny that it’s a possibility.)
Or another analogy: what percentage of their time should a startup founder spend thinking about whether or not to shut down their company? At the beginning, almost zero. (They should plausibly spend a lot of time figuring out whether to pivot or not, but I expect Ilya also does that.)
That is such an interesting example because if I had to name my biggest mistake (of which there were many) when founding MetaMed, it was failing to think enough about whether to shut down the company, and doing what I could to keep it going rather than letting things gracefully fail (or, if possible, taking what I could get). We did think a bunch about various pivots.
Your proposed ontology is strange to me, but I suppose one could say that one can hold such things as ‘I don’t know and don’t have a guess’ if it need not impact one’s behavior.
Whether or not it makes sense for Ilya to think about what happens if he fails is a good question. In some ways it seems very important for him to be aware he might fail and to ensure that such failure is graceful if it happens. In others, it’s fine to leave that to the future or someone else. I do want him aware enough to check for the difference.
With growth mindset, I try to cultivate it a bunch, but also it’s important to recognize where growth is too expensive to make sense or actually impossible—for me, for example, learning to give up trying to learn foreign languages.
To further clarify the statement, do you mean ‘most ML researchers do not expect to die’ or do you mean ‘most ML researchers do not think there is an existential risk here at all?’ Or something in between? The first is clearly true, I thought the second was false in general at this point.
(I do agree that Connor’s statement was misleading and worded poorly, and that he is often highly mood affiliated; I can certainly sympathize with being frustrated there.)
When Connor says “won’t work”, I infer that to mean “will, if implemented as the main alignment plan, lead to existential catastrophe with high probability”. And then my claim is that most ML researchers don’t think there’s a high probability of existential catastrophe from misaligned AGI at all, so it’s very implausible that they think there’s a high probability conditional on this being the alignment plan used.
(This does depend on what you count as “high” but I’m assuming that if this plan dropped the risk down to 5% or 1% or whatever the median ML researcher thinks it is, then Connor would be deeply impressed.)
Thanks, that’s exactly what I needed to know, and makes perfect sense.
I don’t think it’s quite as implausible to think both (1) this probably won’t work as stated and (2) we will almost certainly be fine, if you think those involved will notice this and pivot. Yann LeCun for example seems to think a version of this? That we will be fine, despite thinking current model and technique paths won’t work, because we will therefore move away from such paths.
While I generally agree with you, I don’t think growth mindset actually works, or at least I think it’s wildly misleading about what it can do.
Re the point that talking about the probability of success is the wrong question to ask, I think the key question here is how underdetermined the probability of success is, or how much it’s conditional on your own actions.
That’s a great point, but aren’t you saying the same thing in disguise?
To me you’re both saying « The map is not the territory ».
If I map a question about nutrition using a frame known to be useful for thermodynamics, I’ll make mistakes even if I’m rational (because I’d fail to ask the right questions). But « asking the right question » is something I count as « my own actions, potentially », so you could totally reword that as « success is conditional on my own decision to stop reaching for the thermodynamic frame for thinking about nutrition ».
Also I’d say that what you could call « growth mindset » (like anything for mental health, especially placebos) can sometimes help you. And by « you » I mean « me », of course. 😉
In general, I think that a wrong frame can only slow you down, not stop you. The catch is that the slowdown can be arbitrarily bad, which is the biggest problem here.
You can use a logic/math frame to answer a whole lot of questions, but the general case is exponentially slow the more variables you add, and that’s in a bounded logic/math frame, with relatively simplistic logics. Any more complexity and we immediately run into ever more intractability, and this goes on for a long time.
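As a toy illustration of the blow-up being described here (my own sketch, not the commenter’s example): brute-force reasoning over a propositional formula means checking every assignment, and the number of assignments doubles with each variable you add.

```python
import itertools
import time

def count_models(clauses, n_vars):
    """Count satisfying assignments of a CNF formula by brute-force enumeration.
    Clauses are lists of non-zero integers; literal k means variable k is true,
    literal -k means variable k is false."""
    count = 0
    for assignment in itertools.product([False, True], repeat=n_vars):
        if all(any(assignment[abs(lit) - 1] == (lit > 0) for lit in clause)
               for clause in clauses):
            count += 1
    return count

# A chain of implications x1 -> x2 -> ... -> xn, written in CNF as (-xi or x(i+1)).
for n in range(10, 19, 2):
    clauses = [[-i, i + 1] for i in range(1, n)]
    start = time.perf_counter()
    models = count_models(clauses, n)
    print(f"{n:2d} variables: {models:6d} models, {time.perf_counter() - start:.3f}s")
# Runtime roughly quadruples every two added variables: there are 2**n assignments to check.
```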
I’d say the more appropriate saying is that the map is equivalent to the territory, but computing the map is in the general case completely hopeless, so in practice our maps will fail to match the logical/mathematical territory.
I agree that for many problems in practice the probability of success depends somewhat on you, and that the probability of success is underdetermined, so it’s not totally useful to ask the question “what is the probability of success”.
On growth mindset, I mostly agree with it if we are considering any possible changes, but in practice it’s usually taken to mean the belief that one can reliably improve oneself with sheer willpower, and I’m way more skeptical that this actually works, for a combination of reasons. It would be nice for that to be true, and I’d say that a little bit of it could survive, but unfortunately I don’t think it actually works for most problems. Of course, most is not all, and the fact that the world is probably heavy-tailed goes some way toward restoring control, but I still don’t think growth mindset as defined is very right.
I’d say the largest issue I have with rationality is that it generally ignores bounds on agents, and in general gives little thought to what happens when agents are bounded in their ability to think or act. It’s perhaps the biggest reason why I suspect a lot of paradoxes of rationality come about.
in practice our maps will fail to match the logical/mathematical territory.
It’s perhaps the biggest reason why I suspect a lot of paradoxes of rationality come about.
That’s an interesting hypothesis. Let’s see if that works for [this problem](https://en.m.wikipedia.org/wiki/Bertrand_paradox_(probability)). Would you say Jaynes is the only one who managed to match the logical/mathematical territory? Or would you say he completely misses the point because his frame puts too much weight on « There must be one unique answer that is better than any other answer »? How would you try to reason with two Bayesians who take opposite positions on this mathematical question?
a wrong frame can only slow you down, not stop you. The catch is that the slowdown can be arbitrarily bad
This vocabulary feels misleading, like saying: We can break RSA with a fast algorithm. The catch is that it’s slow for some instances.
the general case is exponentially slow the more variables you add
This proves too much, like the [no free lunch theorem](https://www.researchgate.net/publication/228671734_Toward_a_justification_of_meta-learning_Is_the_no_free_lunch_theorem_a_show-stopper). The catch is exactly the same: we don’t care about the general case. All we care about is the very small number of cases that can arise in practice.

(as a concrete application for permutation testing: if you randomize condition, fine, if you randomize pixels, not fine… because the latter is the general case while the former is the special case)
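For readers who haven’t seen the reference, here is a minimal permutation-test sketch (my own illustration, with made-up numbers) where what gets shuffled is the condition labels, the thing that actually is exchangeable under the null hypothesis, rather than the raw pixels:

```python
import random

# Made-up measurements under two conditions, for illustration only.
control   = [4.1, 3.8, 4.5, 4.0, 3.9, 4.2, 4.4, 3.7]
treatment = [4.6, 4.9, 4.3, 5.1, 4.8, 4.7, 5.0, 4.4]

def mean(xs):
    return sum(xs) / len(xs)

observed = mean(treatment) - mean(control)

pooled = control + treatment
n_treat = len(treatment)
n_perms = 20_000
extreme = 0

for _ in range(n_perms):
    # Shuffle the pooled data, i.e. randomly reassign the "condition" labels.
    random.shuffle(pooled)
    diff = mean(pooled[:n_treat]) - mean(pooled[n_treat:])
    if diff >= observed:
        extreme += 1

print(f"observed difference in means: {observed:.2f}")
print(f"one-sided permutation p-value: {extreme / n_perms:.4f}")
```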
though it can’t be too strong, or else we’d be able to do anything we couldn’t do today.
I don’t get this sentence.
I don’t think growth mindset [as the belief that one can reliably improve oneself with sheer willpower] is very right.

That sounds reasonable, conditional on interpreting sheer willpower as magical thinking rather than [cultivating agency](https://www.lesswrong.com/posts/vL8A62CNK6hLMRp74/agency-begets-agency).
That’s an interesting hypothesis. Let’s see if that works for [this problem](https://en.m.wikipedia.org/wiki/Bertrand_paradox_(probability)). Would you say Jaynes is the only one who managed to match the logical/mathematical territory? Or would you say he completely misses the point because his frame puts too much weight on « There must be one unique answer that is better than any other answer »? How would you try to reason with two Bayesians who take opposite positions on this mathematical question?
Basically, you sort of mentioned it yourself: There is no unique answer to the question, so the question as given underdetermines the answer. There is more than one solution, and that’s fine. This means that the mapping from question to answer is not one-to-one, so some choices must be made here.
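To make the underdetermination concrete, here is a minimal Monte Carlo sketch (my own illustration, not from either commenter) of the three classic chord-sampling conventions from the linked Wikipedia article. Each is a coherent reading of “pick a random chord,” and they settle near 1/3, 1/2, and 1/4 respectively:

```python
import math
import random

R = 1.0
SIDE = math.sqrt(3) * R   # side of the equilateral triangle inscribed in the circle
N = 200_000

def random_endpoints():
    """Pick two independent uniform points on the circle; return the chord length."""
    a, b = random.uniform(0, 2 * math.pi), random.uniform(0, 2 * math.pi)
    return 2 * R * abs(math.sin((a - b) / 2))

def random_radial_point():
    """Pick a radius uniformly, then a point on it uniformly; the chord is perpendicular there."""
    d = random.uniform(0, R)
    return 2 * math.sqrt(R * R - d * d)

def random_midpoint():
    """Pick a uniform point in the disk and treat it as the chord's midpoint."""
    while True:  # rejection sampling for a uniform point in the disk
        x, y = random.uniform(-R, R), random.uniform(-R, R)
        if x * x + y * y <= R * R:
            return 2 * math.sqrt(R * R - (x * x + y * y))

for name, sampler in [("random endpoints", random_endpoints),
                      ("random radial point", random_radial_point),
                      ("random midpoint", random_midpoint)]:
    hits = sum(sampler() > SIDE for _ in range(N))
    print(f"P(chord > triangle side) via {name}: {hits / N:.3f}")
# Prints approximately 0.333, 0.500, and 0.250: three answers to the "same" question.
```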
This vocabulary feels misleading, like saying: We can break RSA with a fast algorithm. The catch is that it’s slow for some instances.
This is indeed the problem. I never claimed it would take a reasonable amount of time, and that’s arguably the biggest issue here: Bounded rationality is important, much more important than we realize, because there is limited time and memory/resources to dedicate to problems.
(as a concrete application for permutation testing: if you randomize condition, fine, if you randomize pixels, not fine… because the latter is the general case while the former is the special case)
The point here is I was trying to answer the question of why there’s no universal frame, or at least why logic/mathematics isn’t a useful universal frame, and the results are important here in this context.
I don’t get this sentence.
I didn’t phrase that well, and I want to either edit or remove that sentence.
Okay, my biggest disagreement with stuff like growth mindset is that I believe a lot of your outcomes are due to luck/chance events swinging in your favor. Heavy tails sort of restore some control, since a single action can have a large impact, so even a little control multiplies, but a key claim I’m making is that a lot of your outcomes are due to luck/chance, and the stuff that isn’t luck probably isn’t stuff you control yet, and that we post-hoc a merit/growth based story even when in reality luck did a lot of the work.
The point here is I was trying to answer the question of why there’s no universal frame, or at least why logic/mathematics isn’t a useful universal frame, and the results are important here in this context.
Great point!
There is no unique answer to the question, so the question as given underdetermines the answer.
That’s how I feel about most interesting questions.
Bounded rationality is important, much more important than we realize
Do you feel IP=PSPACE is relevant to this?
a key claim I’m making is that a lot of your outcomes are due to luck/chance, and the stuff that isn’t luck probably isn’t stuff you control yet, and that we post-hoc a merit/growth based story
Sure. And there’s some luck and things I don’t control in who I had children with. Should I feel less grateful because someone else could have done the same?
Sure. And there’s some luck and things I don’t control in who I had children with. Should I feel less grateful because someone else could have done the same?
No. It has a lot of other implications, just not this one.
Do you feel IP=PSPACE is relevant to this?
Yes, but in general computational complexity/bounded computation matter a lot more than people think.
That’s how I feel about most interesting questions.

I definitely sympathize with this view.