Perhaps deconfusion and formalization are not identical, but I’m partial to the notion that if you’ve truly deconfused something (meaning you, personally, are no longer confused about that thing), it should not take much further effort to formalize the thing in question. (Examples of this include TurnTrout’s sequence on power-seeking, John Wentworth’s sequence on abstraction, and Scott Garrabrant’s sequence on Cartesian frames.)
So, although the path to deconfusing some concept may not involve formalizing that concept, being able to formalize the concept is necessary: if you find yourself (for some reason or other) thinking that you’ve deconfused something, but are nonetheless unable to produce a formalization of it, that’s a warning sign that you may not actually be less confused.
Perhaps deconfusion and formalization are not identical, but I’m partial to the notion that if you’ve truly deconfused something (meaning you, personally, are no longer confused about that thing), it should not take much further effort to formalize the thing in question.
So I have the perspective that deconfusion requires an application. And this application in turn constrains what counts as a successful deconfusion. There are definitely applications for which successful deconfusion requires a formalization: e.g. if you want to implement the concept or use it in a program. But I think it’s important to point out that some applications (maybe most?) don’t require that level of formalization.
(Examples of this include TurnTrout’s sequence on power-seeking, John Wentworth’s sequence on abstraction, and Scott Garrabrant’s sequence on Cartesian frames.)
What’s interesting about these examples is that only the abstraction case is considered to have completely deconfused the concept it looked at. I personally think that Alex’s power-seeking formalization is very promising and captures important aspects of the process, but it’s not yet at the point where we can unambiguously apply it to all convergent subgoals, for example. Similarly, Cartesian frames sound promising, but it’s not clear that they actually completely deconfuse the concept they focus on.
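(For readers who haven’t followed that sequence: the rough shape of the power-seeking formalization, paraphrased loosely from memory and glossing over the exact normalization and choice of reward-function distribution, is that the POWER of a state is its expected optimal value under a distribution over reward functions,

$$\mathrm{POWER}_{\mathcal{D}}(s) \;\propto\; \mathbb{E}_{R \sim \mathcal{D}}\!\left[ V^{*}_{R}(s) \right],$$

with the power-seeking results then saying, roughly, that for most reward functions drawn from such a distribution, optimal policies tend to steer toward high-POWER states.)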
And that’s exactly what I want to point out: although from the outside these works look more relevant/important/serious because they are formal, the actual value comes from the back-and-forth with, and grounding in, the weird informal intuitions, a fact that I think John, Alex and Scott would agree with.
So, although the path to deconfusing some concept may not involve formalizing that concept, being able to formalize the concept is necessary: if you find yourself (for some reason or other) thinking that you’ve deconfused something, but are nonetheless unable to produce a formalization of it, that’s a warning sign that you may not actually be less confused.
But that doesn’t apply to so many of the valuable concepts and ideas we’ve come up with in alignment, like deception, myopia, HCH, universality, the orthogonality thesis, convergent subgoals, and so on. Once again, that necessary condition isn’t a condition for value or usefulness, and that point seems largely overlooked around here. If you can formalize without losing the grounding, by all means do so, because that’s a step towards more concrete concepts. Yet not being able to formally ground everything you think about doesn’t mean you’re not making progress in deconfusing those concepts, nor that they can’t be useful at that less-than-perfectly-formal stage.
I think it’s becoming less clear to me what you mean by deconfusion. In particular, I don’t know what to make of the following claims:
So I have the perspective that deconfusion requires an application. And this application in turn constrains what counts as a successful deconfusion. [...]
What’s interesting about these examples is that only the abstraction case is considered to have completely deconfused the concept it looked at. [...]
Similarly, Cartesian frames sound promising but it’s not clear that they actually completely deconfuse the concept they focus on. [...]
Yet not being able to formally ground everything you think about doesn’t mean you’re not making progress in deconfusing those concepts, nor that they can’t be useful at that less-than-perfectly-formal stage.
I don’t [presently] think these claims dovetail with my understanding of deconfusion. My [present] understanding of deconfusion is that (loosely speaking) it’s a process for taking ideas from the [fuzzy, intuitive, possibly ill-defined] sub-cluster and moving them to the [concrete, grounded, well-specified] sub-cluster.
I don’t think this process, as I described it, entails having an application in mind. (Perhaps I’m also misunderstanding what you mean by application!) It seems to me that, although many attempts at deconfusion-style alignment research (such as the three examples I gave in my previous comment) might ultimately be said to have been motivated by the “application” of aligning superhuman agents, in practice they happened more because somebody noticed that whenever some word/phrase/cluster-of-related-words-and-phrases came up in conversation, people would talk about it in conflicting ways, use/abuse contradictory intuitions while talking about it, and just in general (to borrow Nate’s words) “continuously accidentally spout nonsense”.
But perhaps from your perspective, that kind of thing also counts as an application, e.g. the application of “making us able to talk about the thing we actually care about”. If so, then:
- I agree that it’s possible to make progress towards this goal without performing steps that look like formalization. (I would characterize this as the “philosophy” part of Luke’s post about going from philosophy to math to engineering.)
- Conversely, I also agree that it’s possible to perform formalization in a way that doesn’t perfectly capture the essence of “the thing we want to talk about”, or perhaps doesn’t usefully capture it in any sense at all; if one wanted to use unkind words, one could describe the former category as “premature [formalization]”, and the latter category as “unnecessary [formalization]”. (Separately, I also see you as claiming that TurnTrout’s work on power-seeking and Scott’s work on Cartesian frames fall somewhere in the “premature” category, but this may simply be me putting words in your mouth.)
And perhaps your contention is that there’s too much research being done currently that falls under the second bullet point; or, alternatively, that too many people are pursuing research that falls (or may fall) under the second bullet point, in a way that they (counterfactually) wouldn’t if there were less (implicit) prestige attached to formal research.
If this (or something like it) is your claim, then I don’t think I necessarily disagree; in fact, it’s probably fair to say you’re in a better position to judge than I am, being “closer to the ground”. But I also don’t think this precludes my initial position from being valid, where—having laid the groundwork in the previous two bullet points—I can now characterize my initial position as [establishing the existence of] a bullet point number 3:
- A successful, complete deconfusion of a concept will, almost by definition, admit of a natural formalization; if one then goes to the further step of producing such a formalism, it will be evident that the essence of the original concept is present in said formalism.
(Or, to borrow Eliezer’s words this time, “Do you understand [the concept/property/attribute] well enough to write a computer program that has it?”)
And yes, in a certain sense perhaps there might be no point to writing a computer program with the [concept/property/attribute] in question, because such a computer program wouldn’t do anything useful. But in another sense, there is a point: the point isn’t to produce a useful computer program, but to check whether your understanding has actually reached the level you think it has. If one further takes the position (as I do) that such checks are useful and necessary, then [replacing “writing a computer program” with “producing a formalism”] I claim that many productive lines of deconfusion research will in fact produce formalisms that look “premature” or even “unnecessary”, as a part of the process of checking the researchers’ understanding.
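To make that check concrete, here is a deliberately trivial sketch (a hypothetical toy of my own, not a formalization anyone in this thread has actually proposed): if someone claimed to have deconfused “myopia” as “choosing actions by immediate reward alone, ignoring later consequences”, that claim can be cashed out in a few lines of Python and inspected for whether it behaves the way the intuition says it should.

```python
# Hypothetical toy, purely illustrative: cash out "myopia" as
# "maximize immediate reward only" and check that it comes apart
# from ordinary discounted planning in the expected way.

# A two-step decision problem: action "a" pays 1 now and 0 later;
# action "b" pays 0 now and 10 later.
IMMEDIATE = {"a": 1.0, "b": 0.0}
DELAYED = {"a": 0.0, "b": 10.0}

def myopic_choice(actions):
    """A 'myopic' agent, under this toy formalization: best immediate reward."""
    return max(actions, key=lambda act: IMMEDIATE[act])

def farsighted_choice(actions, discount=0.9):
    """An ordinary discounted planner, for contrast."""
    return max(actions, key=lambda act: IMMEDIATE[act] + discount * DELAYED[act])

print("myopic agent picks:", myopic_choice(["a", "b"]))          # -> "a"
print("far-sighted agent picks:", farsighted_choice(["a", "b"]))  # -> "b"
```

The toy is obviously far too impoverished to count as deconfusing myopia, and that is rather the point: trying to write even this much down is what surfaces the questions the intuition glosses over (myopic with respect to what horizon? whose reward? what about effects on later episodes?).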
I think that about sums up [the part of] the disagreement [that I currently know how to verbalize]. I’m curious to see whether you agree this is a valid summary; let me know if (abstractly) you think I’ve been using the term “deconfusion” differently from you, or (concretely) if you disagree with anything I said about “my” version of deconfusion.
I think this is an excellent summary, and I agree with almost all of it. My only claim is that it’s easy to think that deconfusion is only useful when it results in formalization (when it is complete in your sense), but that isn’t actually true, especially at the point where we are in the field. And I’m simply pointing out that for some concrete applications (reasons for which we want to be able to use a concept without spouting nonsense), going all the way to this complete formalization isn’t necessary.
But yeah, if you have totally deconfused a concept, you should be able to write it down as a mathematical model/program.