Strong upvote, you’re pointing at something very important here. I don’t think I’m defending epistemic modesty, I think I’m defending epistemic rigour, of the sort that’s valuable even if you’re the only person in the world.
I suspect Richard isn’t actually operating from a frame where he can produce the thing I asked for in the previous paragraphs (a strong model of where expected utility is likely to fail, a strong model of how a lack of “successful advance predictions”/”wide applications” corresponds to those likely failure modes, etc).
Yes, this is correct. In my frame, getting to a theory that’s wrong is actually the hardest part—most theories aimed at unifying phenomena from a range of different domains (aka attempted “deep fundamental theories”) are not even wrong (e.g. incoherent, underspecified, ambiguous). Perhaps they can better be understood as evocative metaphors, or intuitions pointing in a given direction, than “theories” in the standard scientific sense.
Expected utility is a well-defined theory in very limited artificial domains. When applied to the rest of the world, the big question is whether it’s actually a theory in any meaningful sense, as opposed to just a set of vague intuitions about how a formalism from a particular artificial domain generalises. (As an aside, I think of FDT as being roughly in the same category: well-defined in Newcomb’s problem and with exact duplicates, but reliant on vague intuitions to generalise to anything else.)
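As a concrete illustration of what “well-defined in very limited artificial domains” looks like (a toy sketch with made-up numbers, not anything from the debate): once you fix a finite set of actions, outcomes, probabilities and a utility function, expected utility maximisation is a fully specified procedure. The question above is whether anything like this structure survives outside such artificial setups.

```python
# Toy lottery domain (illustrative numbers only): expected utility is fully
# specified here because every ingredient -- actions, outcomes, probabilities,
# utilities -- is given explicitly.

actions = {
    # action -> {outcome: probability}
    "safe_bet":  {"win_small": 1.0},
    "risky_bet": {"win_big": 0.3, "lose": 0.7},
}

utility = {"win_small": 10.0, "win_big": 50.0, "lose": -5.0}

def expected_utility(lottery):
    """Sum of probability-weighted utilities for one action's lottery."""
    return sum(p * utility[outcome] for outcome, p in lottery.items())

for action, lottery in actions.items():
    print(action, expected_utility(lottery))

best = max(actions, key=lambda a: expected_utility(actions[a]))
print("EU-maximising action:", best)  # risky_bet (EU 11.5 vs 10.0)
```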
So my default reaction to being asked how expected utility theory is wrong about AI feels like the same way I’d react if asked how the theory of fluid dynamics is wrong about the economy. I mean, money flows, right? And the economy can be more or less turbulent… Now, this is an exaggerated analogy, because I do think that there’s something very important about consequentialism as an abstraction. But I’d like Yudkowsky to tell me what that is in a way which someone couldn’t do if they were trying to sell me on an evocative metaphor about how a technical theory should be applied outside its usual domain—and advance predictions are one of the best ways to verify that.
A more realistic example: cultural evolution. Clearly there’s a real phenomenon there, one which is crucial to human history. But calling cultural evolution a type of “evolution” is more like an evocative metaphor than a fundamental truth which we should expect to hold up in very novel circumstances (like worlds where AIs are shaping culture).
I also wrote about this intuition (using the example of the “health points” abstraction) in this comment.
I think some of your confusion may be that you’re putting “probability theory” and “Newtonian gravity” into the same bucket. You’ve been raised to believe that powerful theories ought to meet certain standards, like successful bold advance experimental predictions, such as Newtonian gravity made about the existence of Neptune (quite a while after the theory was first put forth, though). “Probability theory” also sounds like a powerful theory, and the people around you believe it, so you think you ought to be able to produce a powerful advance prediction it made; but it is for some reason hard to come up with an example like the discovery of Neptune, so you cast about a bit and think of the central limit theorem. That theorem is widely used and praised, so it’s “powerful”, and it wasn’t invented before probability theory, so it’s “advance”, right? So we can go on putting probability theory in the same bucket as Newtonian gravity?
They’re actually just very different kinds of ideas, ontologically speaking, and the standards to which we hold them are properly different ones. It seems like the sort of thing that would take a subsequence I don’t have time to write, expanding beyond the underlying obvious ontological difference between validities and empirical-truths, to cover the way in which “How do we trust this, when” differs between “I have the following new empirical theory about the underlying model of gravity” and “I think that the logical notion of ‘arithmetic’ is a good tool to use to organize our current understanding of this little-observed phenomenon, and it appears within making the following empirical predictions...” But at least step one could be saying, “Wait, do these two kinds of ideas actually go into the same bucket at all?”
In particular it seems to me that you want properly to be asking “How do we know this empirical thing ends up looking like it’s close to the abstraction?” and not “Can you show me that this abstraction is a very powerful one?” Like, imagine that instead of asking Newton about planetary movements and how we know that the particular bits of calculus he used were empirically true about the planets in particular, you instead started asking Newton for proof that calculus is a very powerful piece of mathematics worthy to predict the planets themselves—but in a way where you wanted to see some highly valuable material object that calculus had produced, like earlier praiseworthy achievements in alchemy. I think this would reflect confusion and a wrongly directed inquiry; you would have lost sight of the particular reasoning steps that made ontological sense, in the course of trying to figure out whether calculus was praiseworthy under the standards of praiseworthiness that you’d been previously raised to believe in as universal standards about all ideas.
it seems to me that you want properly to be asking “How do we know this empirical thing ends up looking like it’s close to the abstraction?” and not “Can you show me that this abstraction is a very powerful one?”
I agree that “powerful” is probably not the best term here, so I’ll stop using it going forward (note, though, that I didn’t use it in my previous comment, which I endorse more than my claims in the original debate).
But before I ask “How do we know this empirical thing ends up looking like it’s close to the abstraction?”, I need to ask “Does the abstraction even make sense?” Because you have the abstraction in your head, and I don’t, and so whenever you tell me that X is a (non-advance) prediction of your theory of consequentialism, I end up in a pretty similar epistemic state as if George Soros tells me that X is a prediction of the theory of reflexivity, or if a complexity theorist tells me that X is a prediction of the theory of self-organisation. The problem in those two cases is less that the abstraction is a bad fit for this specific domain, and more that the abstraction is not sufficiently well-defined (outside very special cases) to even be the type of thing that can robustly make predictions.
Perhaps another way of saying it is that they’re not crisp/robust/coherent concepts (although I’m open to other terms, I don’t think these ones are particularly good). And it would be useful for me to have evidence that the abstraction of consequentialism you’re using is a crisper concept than Soros’ theory of reflexivity or the theory of self-organisation. If you could explain the full abstraction to me, that’d be the most reliable way—but given the difficulties of doing so, my backup plan was to ask for impressive advance predictions, which are the type of evidence that I don’t think Soros could come up with.
I also think that, when you talk about me being raised to hold certain standards of praiseworthiness, you’re still ascribing too much modesty epistemology to me. I mainly care about novel predictions or applications insofar as they help me distinguish crisp abstractions from evocative metaphors. To me it’s the same type of rationality technique as asking people to make bets, to help distinguish post-hoc confabulations from actual predictions.
Of course there’s a social component to both, but that’s not what I’m primarily interested in. And of course there’s a strand of naive science-worship which thinks you have to follow the Rules in order to get anywhere, but I’d thank you to assume I’m at least making a more interesting error than that.
Lastly, on probability theory and Newtonian mechanics: I agree that you shouldn’t question how much sense it makes to use calculus in the way that you described, but that’s because the application of calculus to mechanics is so clearly-defined that it’d be very hard for the type of confusion I talked about above to sneak in. I’d put evolutionary theory halfway between them: it’s partly a novel abstraction, and partly a novel empirical truth. And in this case I do think you have to be very careful in applying the core abstraction of evolution to things like cultural evolution, because it’s easy to do so in a confused way.
Lastly, on probability theory and Newtonian mechanics: I agree that you shouldn’t question how much sense it makes to use calculus in the way that you described, but that’s because the application of calculus to mechanics is so clearly-defined that it’d be very hard for the type of confusion I talked about above to sneak in. I’d put evolutionary theory halfway between them: it’s partly a novel abstraction, and partly a novel empirical truth.
I think this might be a big part of the disagreement/confusion. I think of evolution (via natural selection) as something like a ‘Platonic inevitability’ in the same way that probability theory and Newtonian mechanics are. (Daniel Dennett’s book Darwin’s Dangerous Idea does a good job I think of imparting intuitions about the ‘Platonic inevitability’ of it.)
You’re right that there are empirical truths – about how well some system ‘fits’ the ‘shape’ of the abstract theory. But once you’ve ‘done the homework exercises’ of mapping a few systems to the components of the abstract theory, it seems somewhat unnecessary to repeat that same work for every new system. Similarly, once you can ‘look’ at something and observe that, e.g. there are multiple ‘discrete’ instances of some kind of abstract category, you can be (relatively) confident that counting groups or sets of those instances will ‘obey’ arithmetic.
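As a minimal sketch of the kind of mapping exercise being described (a toy replicator simulation of my own, with made-up rates, assuming nothing biological): any population with heritable variation in reproduction rate gets taken over by the faster replicator, whatever the substrate. Checking that a new system really has heritable variation and differential reproduction is the ‘homework exercise’; the conclusion then follows from the abstract structure.

```python
import random

random.seed(0)

# Two heritable types with different expected numbers of offspring (made-up rates).
offspring_rate = {"A": 1.05, "B": 0.95}
population = ["A"] * 500 + ["B"] * 500

def step(pop):
    new_pop = []
    for individual in pop:
        rate = offspring_rate[individual]
        # Integer part of the rate, plus a chance of one extra, so the mean is `rate`.
        n = int(rate) + (1 if random.random() < rate - int(rate) else 0)
        new_pop.extend([individual] * n)  # offspring inherit the parent's type
    return new_pop

for _ in range(100):
    population = step(population)

print("Fraction of type A after 100 generations:",
      round(population.count("A") / len(population), 3))
```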
I must admit tho that I very much appreciate some of the specific examples that other commenters have supplied for applications of expected utility theory!
(Daniel Dennett’s book Darwin’s Dangerous Idea does a good job I think of imparting intuitions about the ‘Platonic inevitability’ of it.)
Possibly when Richard says “evolutionary theory” he means stuff like ‘all life on Earth has descended with modification from a common pool of ancestors’, not just ‘selection is a thing’? It’s also an empirical claim that any of the differences between real-world organisms in the same breeding population are heritable.
‘all life on Earth has descended with modification from a common pool of ancestors’
That’s pretty reasonable, but, yes, I might not have a good sense of what Richard means by “evolutionary theory”.
It’s also an empirical claim that any of the differences between real-world organisms in the same breeding population are heritable.
Yes! That’s a good qualification and important for lots of things.
But I think the claim that any/many differences are heritable was massively overdetermined by the time Darwin published his ideas/theory of evolution via natural selection. I think it’s easy to overlook the extremely strong prior that “organisms in the same breeding population” produce offspring that are almost always, and obviously, members of the same class/category/population. That certainly seems to imply that a huge variety of possible differences are obviously heritable.
I admit tho that it’s very difficult (e.g. for me) to adopt a reasonable ‘anti-perspective’. I also remember reading something not too long ago about how systematic animal breeding was extremely rare until relatively recently, so that’s possibly not as strong evidence as it now seems with the benefit of hindsight.
That’s a really helpful comment (at least for me)!
But at least step one could be saying, “Wait, do these two kinds of ideas actually go into the same bucket at all?”
I’m guessing that a lot of the hidden work here and in the next steps would come from asking stuff like:
Do I need to alter the bucket for each new idea, or does it instead fit in its current form each time?
Does the mental act of finding that an idea fits into the bucket remove some confusion and clarify things, or is it just a mysterious answer?
Does the bucket become simpler and more elegant with each new idea that fits into it?
Is there some truth in this, or am I completely off the mark?
It seems like the sort of thing that would take a subsequence I don’t have time to write
You obviously can do whatever you want, but I find myself confused at this idea being discarded. Like, it sounds exactly like the antidote to so much confusion around these discussions and your position, such that if that was clarified, more people could contribute helpfully to the discussion, and either come to your side or point out non-trivial issues with your perspective. Which sounds really valuable for both you and the field!
So I’m left wondering:
Do you disagree with my impression of the value of such a subsequence?
Do you think it would have this value but are spending your time doing something more valuable?
Do you think it would be valuable but really don’t want to write it?
Do you think it would be valuable, you could in principle write it, but probably no one would get it even if you did?
Something else I’m failing to imagine?
Once again, you do what you want, but I feel like this would be super valuable if there was any way of making it possible. That’s also completely relevant to my own focus on the different epistemic strategies used in alignment research, especially because we don’t have access to empirical evidence or trial and error at all for AGI-type problems.
(I’m also quite curious if you think this comment by dxu points at the same thing you are pointing at)
Sounds like you should try writing it.

You obviously can do whatever you want, but I find myself confused at this idea being discarded. Like, it sounds exactly like the antidote to so much confusion around these discussions and your position, such that if that was clarified, more people could contribute helpfully to the discussion, and either come to your side or point out non-trivial issues with your perspective. Which sounds really valuable for both you and the field!
I’ma guess that Eliezer thinks there’s a long list of sequences he could write meeting these conditions, each on a different topic.
Good point, I hadn’t thought about that one. Still, I have to admit that my first reaction is that this particular sequence seems uniquely positioned to singlehandedly increase the quality of both the debate and alignment research. Of course, maybe I only feel that way because it’s the only one of the long list that I know of. ^^
(Another possibility I just thought of is that maybe this subsequence requires a lot of new preliminary subsequences, such that the work is far larger than you could expect from reading the words “a subsequence”. Still sounds like it would be really valuable though.)
I don’t expect such a sequence to be particularly useful, compared with focusing on more object-level arguments. Eliezer says that the largest mistake he made in writing his original sequences was that he “didn’t realize that the big problem in learning this valuable way of thinking was figuring out how to practice it, not knowing the theory”. Better, I expect, to correct the specific mistakes alignment researchers are currently making, until people have enough data points to generalise better.
I’m honestly confused by this answer. Do you actually think that Yudkowsky having to correct everyone’s object-level mistakes all the time is strictly more productive, and will lead faster to the meat of the deconfusion, than trying to state the underlying form of the argument and theory and then adapting it to the object-level arguments and comments?
I have trouble understanding this, because for me the outcome of the first one is that no one gets it, he has to repeat himself all the time without making the debate progress, and this is one more giant hurdle for anyone trying to get into alignment and understand his position. It’s unclear whether the alternative would solve all these problems (as you quote from the preface of the Sequences, learning the theory is often easier and less useful than practicing), but it still sounds like a powerful accelerator.
There is no dichotomy of “theory or practice”, we probably need both here. And based on my own experience reading the discussion posts and the discussions I’ve seen around these posts, the object-level refutations have not been particularly useful forms of practice, even if they’re better than nothing.
Your comment is phrased as if the object-level refutations have been tried, while conveying the meta-level intuitions hasn’t been tried. If anything, it’s the opposite: the sequences (and to some extent HPMOR) are practically all content about how to think, whereas Yudkowsky hasn’t written anywhere near as extensively on object-level AI safety.
This has been valuable for community-building, but less so for making intellectual progress—because in almost all domains, the most important way to make progress is to grapple with many object-level problems, until you’ve developed very good intuitions for how those problems work. In the case of alignment, it’s hard to learn things from grappling with most of these problems, because we don’t have signals of when we’re going in the right direction. Insofar as Eliezer has correct intuitions about when and why attempted solutions are wrong, those intuitions are important training data.
By contrast, trying to first agree on very high-level epistemological principles, and then do the object-level work, has a very poor track record. See how philosophy of science has done very little to improve how science works; and how reading the sequences doesn’t improve people’s object-level rationality very much.
I model you as having a strong tendency to abstract towards higher-level discussion of epistemology in order to understand things. (I also have a strong tendency to do this, but I think yours is significantly stronger than mine.) I expect that there’s just a strong clash of intuitions here, which would be hard to resolve. But one prompt which might be useful: why aren’t epistemologists making breakthroughs in all sorts of other domains?
Thanks for giving more details about your perspective.
Your comment is phrased as if the object-level refutations have been tried, while conveying the meta-level intuitions hasn’t been tried. If anything, it’s the opposite: the sequences (and to some extent HPMOR) are practically all content about how to think, whereas Yudkowsky hasn’t written anywhere near as extensively on object-level AI safety.
It’s not clear to me that the sequences and HPMOR are good pointers for this particular approach to theory building. I mean, I’m sure there are posts in the sequences that touch on it (Einstein’s Arrogance is an example I already mentioned), but I expect that they only talk about it in passing and obliquely, and that such posts are spread all over the sequences. Plus, the fact that Yudkowsky said there was a new subsequence to write leads me to believe that he doesn’t think the information is clearly stated already.
So I don’t think you can really point to the current confusion as evidence that explaining how that kind of theory works wouldn’t help, given that such an explanation isn’t readily available in a form I or anyone reading this can access, AFAIK.
This has been valuable for community-building, but less so for making intellectual progress—because in almost all domains, the most important way to make progress is to grapple with many object-level problems, until you’ve developed very good intuitions for how those problems work. In the case of alignment, it’s hard to learn things from grappling with most of these problems, because we don’t have signals of when we’re going in the right direction. Insofar as Eliezer has correct intuitions about when and why attempted solutions are wrong, those intuitions are important training data.
Completely agree that these intuitions are important training data. But your whole point in other comments is that we want to understand why we should expect these intuitions to differ from apparently bad/useless analogies between AGI and other stuff. And some explanation of where these intuitions come from could help with evaluating them, especially since Yudkowsky has said that he could write a sequence about the process.
By contrast, trying to first agree on very high-level epistemological principles, and then do the object-level work, has a very poor track record. See how philosophy of science has done very little to improve how science works; and how reading the sequences doesn’t improve people’s object-level rationality very much.
This sounds to me like a strawman of my position (which might be my fault for not explaining it well).
First, I don’t think explaining a methodology is a “very high-level epistemological principle”, because it lets us concretely pick apart and criticize the methodology as a truth-finding method.
Second, the object-level work has already been done by Yudkowsky! I’m not saying that some outside-of-the-field epistemologist should ponder really hard about what would make sense for alignment without ever working on it concretely and then give us their teaching. Instead I’m pushing for a researcher who has built a coherent collection of intuitions, and has thought about the epistemology of this process, to share the latter to help us understand the former.
A bit similarly to my last point, I think the correct comparison here is not “philosophers of science outside the field helping the field”, which happens but is rare as you say, but “scientists thinking about epistemology for very practical reasons”. And given that the latter is, to my understanding, what started the scientific revolution, and was a common activity of scientists until the big paradigms were established (in physics and biology at least) in the early 20th century, I would say there is a good track record here. (Note that this is more your specialty, so I would appreciate evidence that I’m wrong in my historical interpretation here.)
I model you as having a strong tendency to abstract towards higher-level discussion of epistemology in order to understand things. (I also have a strong tendency to do this, but I think yours is significantly stronger than mine.)
Hmm, I certainly like a lot of epistemic stuff, but I would say my tendencies to use epistemology are almost always grounded in concrete questions, like understanding why a given experiment tells us something relevant about what we’re studying.
I also have to admit that I’m kind of confused, because I feel like you’re consistently using the sort of epistemic discussion that I’m advocating for when discussing predictions and what gives us confidence in a theory, and yet you don’t think it would be useful to have a similar-level model of the epistemology used by Yudkowsky to make the sort of judgment you’re investigating?
I expect that there’s just a strong clash of intuitions here, which would be hard to resolve. But one prompt which might be useful: why aren’t epistemologists making breakthroughs in all sorts of other domains?
As I wrote about, I don’t think this is a good prompt, because we’re talking about scientists using epistemology to make sense of their own work there.
Here is an analogy I just thought of: I feel that in this discussion, you and Yudkowsky are talking about objects which have different types. So when you’re asking questions about his model, there’s a type mismatch. And when he’s answering, having noticed the type mismatch, he’s trying to figure out what to ascribe it to (his answer has been quite consistently modest epistemology, which I think is clearly incorrect). Tracking the confusion does tell you some information about the type mismatch, and is probably part of the process of resolving it. But having his best description of his type (given that your type is quite standardized) would make this process far faster, by helping you triangulate the differences.
As an aside, I think of FDT as being roughly in the same category: well-defined in Newcomb’s problem and with exact duplicates, but reliant on vague intuitions to generalise to anything else.
FDT was made rigorous by infra-Bayesianism, at least in the pseudocausal case.
(As an aside, I think of FDT as being roughly in the same category: well-defined in Newcomb’s problem and with exact duplicates, but reliant on vague intuitions to generalise to anything else.)
Is this saying that you don’t think FDT’s behavior is well-defined in a Newcomb’s-problem-like dilemma without exact duplicates?
Well, in Newcomb’s problem it’s primarily a question of how good the predictor is, not how close the duplicate is. I think FDT is well-defined in cases with an (approximately) perfect predictor, and also in cases with (very nearly) exact duplicates, but much less so in other cases.
(I think that it also makes sense to talk about FDT in cases where a perfect predictor randomises its answers x% of the time, so you know that there’s a very robust (100 − x/2)% probability that it’s correct. But once we start talking about predictors that are nearer the human level, or evidence that’s more like statistical correlations, it feels like we’re in tricky territory. Probably “non-exact duplicates in a prisoner’s dilemma” is a more central example of the problem I’m talking about; and even then it feels more robust to me than Eliezer’s applications of expected utility theory to predict big neural networks.)
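As a concrete illustration of the “how good is the predictor” point (a toy sketch with the standard $1,000,000/$1,000 Newcomb payoffs; the specific calculation is mine, not from the discussion): the evidential/subjunctive-style expected values of the two policies depend sharply on the predictor’s accuracy, and writing them down already assumes your decision can be treated as correlated with the prediction, which is the part that becomes murky once the predictor is merely human-level or the evidence is just statistical correlation.

```python
# Newcomb's problem with an imperfect predictor of accuracy p (toy numbers:
# the standard $1,000,000 opaque box and $1,000 transparent box).
BIG, SMALL = 1_000_000, 1_000

def ev_one_box(p):
    # Predictor foresees one-boxing with probability p and fills the big box.
    return p * BIG

def ev_two_box(p):
    # Predictor foresees two-boxing with probability p and leaves the big box empty.
    return p * SMALL + (1 - p) * (BIG + SMALL)

for p in (0.5, 0.51, 0.6, 0.9, 1.0):
    better = "one-box" if ev_one_box(p) > ev_two_box(p) else "two-box"
    print(f"p={p}: one-box EV={ev_one_box(p):,.0f}, "
          f"two-box EV={ev_two_box(p):,.0f} -> {better}")
```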