Oh, I can just give you a class of nontrivial predictions of expected utility theory. I have not seen any empirical results on whether these actually hold, so consider them advance predictions.
So, a bacterium needs a handful of different metabolic resources—most obviously energy (i.e. ATP), but also amino acids, membrane lipids, etc. And often bacteria can produce some metabolic resources via multiple different paths, including cyclical paths—e.g. it’s useful to be able to turn A into B but also B into A, because sometimes the environment will have lots of B and other times it will have lots of A. Now, there’s the obvious prediction that the bacterium won’t waste energy turning B into A and then back into B again—i.e. it will suppress one of those two pathways (assuming the cycle is energy-burning), depending on which metabolite is more abundant. Utility generalizes this idea to arbitrarily many reactions and products, and predicts that at any given time we can assign some (non-unique) “values” to each metabolite (including energy carriers), such that any reaction whose reactants have more total “value” than its products is suppressed (or at least not catalyzed; the cell doesn’t really have good ways to suppress spontaneous reactions other than putting things in separate compartments).
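The value-assignment prediction can be checked mechanically. Below is a minimal Python sketch under simplifying assumptions that are not in the original comment: every reaction turns one metabolite into one other metabolite, its only extra input is some number of ATP equivalents, and ATP's value is fixed at 1. Requiring v(X) + cost ≤ v(Y) for every active reaction X → Y gives a system of difference constraints, which has a solution exactly when the associated graph has no negative-weight cycle, so Bellman-Ford either finds a valid assignment or shows that none exists. The function name `assign_values` and the toy network are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical toy model: (reactant, product, ATP cost), one metabolite in and one out.
Reaction = Tuple[str, str, float]


def assign_values(metabolites: List[str], active: List[Reaction]) -> Optional[Dict[str, float]]:
    """Find values v with v[X] + cost <= v[Y] for every active reaction X -> Y.

    ATP's value is normalized to 1, so `cost` is measured in ATP equivalents.
    Each constraint v[X] - v[Y] <= -cost becomes an edge Y -> X of weight -cost;
    a solution exists iff that graph has no negative-weight cycle.
    Returns the values, or None when no assignment exists.
    """
    edges = [(product, reactant, -cost) for (reactant, product, cost) in active]
    # Starting every distance at 0 is equivalent to a virtual source node with
    # zero-weight edges to every metabolite (the standard Bellman-Ford trick).
    dist = {m: 0.0 for m in metabolites}
    for _ in range(len(metabolites) + 1):
        changed = False
        for src, dst, weight in edges:
            if dist[src] + weight < dist[dst]:
                dist[dst] = dist[src] + weight
                changed = True
        if not changed:
            return dist  # converged: every constraint is satisfied
    return None  # still relaxing after |V| + 1 rounds: a negative cycle exists


# An energy-burning futile cycle: A -> B and B -> A, each costing one ATP.
print(assign_values(["A", "B"], [("A", "B", 1.0), ("B", "A", 1.0)]))  # None
# Only A -> B active: feasible.
print(assign_values(["A", "B"], [("A", "B", 1.0)]))  # {'A': -1.0, 'B': 0.0}
```

The futile cycle has no valid assignment, which is the pattern the prediction rules out; a zero-cost cycle stays feasible because equal values satisfy both constraints.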
Of course in practice this will be an approximation, and there may be occasional exceptions where the cell is doing something the model doesn’t capture. If we were to do this sort of analysis in a signalling network rather than a metabolic network, for instance, there would likely be many exceptions: cells sometimes burn energy to maintain a concentration at a specific level, or to respond quickly to changes, and this particular model doesn’t capture the “value” of information-content in signals; we’d have to extend our value-function in order for the utility framework to capture that. But for metabolic networks, I expect that to mostly not be an issue.
That’s really just utility theory; expected utility theory would involve an organism storing some resources over time (e.g. fat). Then we’d expect to be able to assign “values” such that the relative “value” assigned to stored resources which are not currently used is a weighted sum of the “values” assigned to those resources in different possible future environments (of the sort the organism might find itself in after something like its current environment, in the ancestral world), and the weights in the sums should be consistent. (This is a less-fleshed-out prediction than the other one, but hopefully it’s enough of a sketch to give the idea.)
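As a rough illustration of the consistency condition, here is a Python sketch for the simplest case of two possible future environments. It assumes (hypothetically) that we already know each resource's “value” in each environment and its “value” when stored, and it reads “consistent” as “the same weights for every resource”. Each resource then implies a weight p for the first environment, and the prediction is that all resources imply the same p in [0, 1]. The function name and all numbers are made up for illustration.

```python
from typing import Dict, Optional, Tuple


def shared_weight(
    stored: Dict[str, float],
    env_values: Dict[str, Tuple[float, float]],
    tol: float = 1e-9,
) -> Tuple[bool, Optional[float]]:
    """Check whether one weight p explains every stored value.

    Hypothetical model: stored[r] == p * v1 + (1 - p) * v2, where (v1, v2) are
    resource r's values in future environments 1 and 2. Returns (True, p) if a
    single p in [0, 1] fits every resource (p is None if no resource pins it
    down), otherwise (False, None).
    """
    p_found: Optional[float] = None
    for resource, s in stored.items():
        v1, v2 = env_values[resource]
        if abs(v1 - v2) <= tol:
            # Same value in both environments: any p works, but s must match.
            if abs(s - v1) > tol:
                return False, None
            continue
        p = (s - v2) / (v1 - v2)
        if p < -tol or p > 1 + tol:
            return False, None  # not a valid probability weight
        if p_found is not None and abs(p - p_found) > tol:
            return False, None  # resources disagree about the weights
        if p_found is None:
            p_found = p
    return True, p_found


values = {"fat": (3.0, 1.0), "protein": (2.0, 1.0)}  # made-up numbers
print(shared_weight({"fat": 1.5, "protein": 1.25}, values))  # (True, 0.25)
print(shared_weight({"fat": 1.5, "protein": 1.8}, values))   # (False, None)
```

In the second case protein alone would imply p = 0.8 while fat implies 0.25, so no single set of weights fits.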
Of course, if we understand expected utility theory deeply, then these predictions are quite trivial; they’re just saying that organisms won’t make Pareto-suboptimal use of their resources! It’s one of those predictions where, if it’s false, then we’ve probably discovered something interesting—most likely some place where an organism is spending resources to do something useful which we haven’t understood yet. [EDIT-TO-ADD: This is itself intended as a falsifiable prediction—if we go look at an anomaly and don’t find any unaccounted-for phenomenon, then that’s a very big strike against expected utility theory.] And that’s the really cool prediction here: it gives us a tool to uncover unknown-unknowns in our understanding of a cell’s behavior.
(Note that I only read the whole Epistemology section of this post and skimmed the rest, so I might be saying things that are repeated/resolved elsewhere. Please point me to the relevant parts/quotes if that’s the case. ;) )
“Einstein’s Arrogance” sounds to me like an early pointer in the Sequences to that kind of thing, with a specific claim that General Relativity is that kind of theory.
That being said, I still understand Richard’s position and difficulty with this whole part (or at least what I read of Richard’s difficulty). He’s coming from the perspective of philosophy of science, which has focused mostly on ideas related to advance predictions and on taking into account the mental machinery of humans to catch the biases and mistakes that we systematically make. The Sequences also spend a massive amount of words on exactly this, and yet in this discussion (and at select points in the Sequences, like the aforementioned post), Yudkowsky sounds a bit as if he considers his fundamental theory/observation obvious enough to be accepted without any of this (I don’t think he is thinking that way, but that’s hard to extract from the text).
It’s even more frustrating because Yudkowsky focuses on “showing epistemic modesty” as his answer/rebuttal to Richard’s inquiry, when Richard just sounds like he’s asking the completely relevant question “why should we take your word on it?” IMO the confusion arises because the last sentence sounds very status-y (“How dare you claim such crazy stuff?”), but I’m pretty convinced Richard actually means it in a very methodological/philosophy-of-science/epistemic-strategies way: “What are the ways of thinking that you’re using here that you expect to be particularly good at aiming at the truth?”
Furthermore, I agree with (my model of) Richard that the main issue with the way Yudkowsky (and you John) are presenting your deep idea is that you don’t give a way of showing it wrong. For example, you (John) write:
It’s one of those predictions where, if it’s false, then we’ve probably discovered something interesting—most likely some place where an organism is spending resources to do something useful which we haven’t understood yet.
And even if I feel what you’re gesturing at, this sounds/looks like you’re saying “even if my prediction is false, that doesn’t mean that my theory would be invalidated”. Whereas I feel you want to convey something like “this is not a prediction/part of the theory that has the ability to falsify the theory” or “it’s part of the obvious wiggle room of the theory”. What I want is a way of finding the parts of the theory/model/prediction that could actually invalidate it, because that’s what we should be discussing really. (A difficulty might be that such theories are so fundamental and powerful that being able to see them makes it really hard to find any way they could go wrong and endanger the theory.)
An analogy that comes to my mind is with the barriers for proving P vs NP. These make explicit ways in which you can’t solve the P vs NP question, such that it becomes far easier to weed proof attempts out. My impression is that you (Yudkowsky and John) have models/generators that help you see at a glance that a given alignment proposal will fail. Which is awesome! I want to be able to find and extract and use those. But what Richard is pointing out IMO is that having the generators explicit would give us a way to stress test them, which is a super important step to start believing in them further. Just like we want people to actually try to go beyond GR, and for that they need to understand it deeply.
(Obviously, maybe the problem is that, as you two point out, making the models/generators explicit and understandable is just really hard and you don’t know how to do that. That’s fair.)
It’s one of those predictions where, if it’s false, then we’ve probably discovered something interesting—most likely some place where an organism is spending resources to do something useful which we haven’t understood yet.
… is also intended as a falsifiable prediction. Like, if we go look at the anomaly and there’s no new thing going on there, then that’s a very big strike against expected utility theory.
This particular type of fallback-prediction is a common one in general: we have some theory which makes predictions, but “there’s a phenomenon which breaks one of the modelling assumptions in a way noncentral to the main theory” is a major way the predictions can fail. But then we expect to be able to go look and find the violation of that noncentral modelling assumption, which would itself yield some interesting information. If we don’t find such a violation, it’s a big strike against the theory.
This particular type of fallback-prediction is a common one in general: we have some theory which makes predictions, but “there’s a phenomenon which breaks one of the modelling assumptions in a way noncentral to the main theory” is a major way the predictions can fail.
That’s a great way of framing it! And a great way of thinking about why these are not failures that are “worrisome” at first/in most cases.
And even if I feel what you’re gesturing at, this sounds/looks like you’re saying “even if my prediction is false, that doesn’t mean that my theory would be invalidated”.
So, thermodynamics also feels like a deep fundamental theory to me, and one of the predictions it makes is “you can’t make an engine more efficient than a Carnot engine.” Suppose someone exhibits an engine that appears to be more efficient than a Carnot engine; my response is not going to be “oh, thermodynamics is wrong”, and instead it’s going to be “oh, this engine is making use of some unseen source.”
[Of course, you can show me enough such engines that I end up convinced, or show me the different theoretical edifice that explains both the old observations and these new engines.]
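For reference, here is the standard textbook derivation of the Carnot bound, added as a worked equation. It assumes an engine running in a cycle between exactly two reservoirs at temperatures $T_h$ and $T_c$, absorbing heat $Q_h$ and rejecting $Q_c$, with the working substance returning to its initial state so that only the reservoirs' entropy changes:

$$
\begin{aligned}
\Delta S_{\text{total}} &= -\frac{Q_h}{T_h} + \frac{Q_c}{T_c} \ge 0
\quad\Rightarrow\quad \frac{Q_c}{Q_h} \ge \frac{T_c}{T_h},\\[4pt]
\eta &= \frac{W}{Q_h} = \frac{Q_h - Q_c}{Q_h} = 1 - \frac{Q_c}{Q_h} \le 1 - \frac{T_c}{T_h}.
\end{aligned}
$$

An engine that appears to beat this bound has to be breaking one of the inputs (for example by drawing on a third reservoir or dumping entropy somewhere uncounted) rather than the inequality itself, which is the “unseen source” response.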
What I want is a way of finding the parts of the theory/model/prediction that could actually invalidate it, because that’s what we should be discussing really. (A difficulty might be that such theories are so fundamental and powerful that being able to see them makes it really hard to find any way they could go wrong and endanger the theory.)
So, later Eliezer gives “addition” as an example of a deep fundamental theory. And… I’m not sure I can imagine a universe where addition is wrong? Like, I can say “you would add 2 and 2 and get 5” but that sentence doesn’t actually correspond to any universes.
Like, similarly, I can imagine universes where evolution doesn’t describe the historical origin of species in that universe. But I can’t imagine universes where the elements of evolution are present and evolution doesn’t happen.
[That said, I can imagine universes with Euclidean geometry and different universes with non-Euclidean geometry, so I’m not trying to claim this is true of all deep fundamental theories, but maybe the right way to think about this is “geometry except for the parallel postulate” is the deep fundamental theory.]
So, thermodynamics also feels like a deep fundamental theory to me, and one of the predictions it makes is “you can’t make an engine more efficient than a Carnot engine.” Suppose someone exhibits an engine that appears to be more efficient than a Carnot engine; my response is not going to be “oh, thermodynamics is wrong”, and instead it’s going to be “oh, this engine is making use of some unseen source.”
My gut reaction here is that “you can’t make an engine more efficient than a Carnot engine” is not the right kind of prediction to try to break thermodynamics, because even if you could break it in principle, staying at that level without going into the detailed mechanisms of thermodynamics will only make you try the same thing as everyone else does. Do you think that’s an adequate response to your point, or am I missing what you’re trying to say?
So, later Eliezer gives “addition” as an example of a deep fundamental theory. And… I’m not sure I can imagine a universe where addition is wrong? Like, I can say “you would add 2 and 2 and get 5” but that sentence doesn’t actually correspond to any universes.
Like, similarly, I can imagine universes where evolution doesn’t describe the historical origin of species in that universe. But I can’t imagine universes where the elements of evolution are present and evolution doesn’t happen.
[That said, I can imagine universes with Euclidean geometry and different universes with non-Euclidean geometry, so I’m not trying to claim this is true of all deep fundamental theories, but maybe the right way to think about this is “geometry except for the parallel postulate” is the deep fundamental theory.]
The mental move I’m doing for each of these examples is not imagining universes where addition/evolution/other deep theory is wrong, but imagining phenomena/problems where addition/evolution/other deep theory is not well suited. If you’re describing something that doesn’t commute, addition might be a deep theory, but it’s not useful for what you want. Similarly, you could argue that given how we’re building AIs and trying to build AGI, evolution is not the deep theory that you want to use.
It sounds to me like you (and your internal-Yudkowsky) are using “deep fundamental theory” to mean “powerful abstraction that is useful in a lot of domains”. Which addition and evolution fundamentally are. But claiming that the abstraction is useful in some new domain requires some justification IMO. And even if you think the burden of proof is on the critics, the difficulty of formulating the generators makes that really hard.
Once again, do you think that answers your point adequately?
From my (dxu’s) perspective, it’s allowable for there to be “deep fundamental theories” such that, once you understand those theories well enough, you lose the ability to imagine coherent counterfactual worlds where the theories in question are false.
To use thermodynamics as an example: the first law of thermodynamics (conservation of energy) is actually a consequence of Noether’s theorem, which ties conserved quantities in physics to symmetries in physical laws. Before someone becomes aware of this, it’s perhaps possible for them to imagine a universe exactly like our own, except that energy is not conserved; once they understand the connection implied by Noether’s theorem, this becomes an incoherent notion: you cannot remove the conservation-of-energy property without changing deep aspects of the laws of physics.
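A minimal worked version of the energy case, for a system described by a Lagrangian $\mathcal{L}(q, \dot q, t)$ (this is the time-translation special case, not Noether’s theorem in full generality): define the energy function $E$ and use the Euler–Lagrange equations.

$$
E = \sum_i \dot q_i \frac{\partial \mathcal{L}}{\partial \dot q_i} - \mathcal{L},
\qquad
\frac{d}{dt}\frac{\partial \mathcal{L}}{\partial \dot q_i} = \frac{\partial \mathcal{L}}{\partial q_i}
$$

$$
\frac{dE}{dt}
= \sum_i \left( \ddot q_i \frac{\partial \mathcal{L}}{\partial \dot q_i} + \dot q_i \frac{\partial \mathcal{L}}{\partial q_i} \right)
- \sum_i \left( \frac{\partial \mathcal{L}}{\partial q_i} \dot q_i + \frac{\partial \mathcal{L}}{\partial \dot q_i} \ddot q_i \right)
- \frac{\partial \mathcal{L}}{\partial t}
= -\frac{\partial \mathcal{L}}{\partial t}
$$

So if the laws have no explicit time dependence ($\partial \mathcal{L}/\partial t = 0$), $E$ is conserved; dropping energy conservation means giving up that symmetry.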
The second law of thermodynamics is similarly deep: it’s actually a consequence of there being a (low-entropy) boundary condition at the beginning of the universe, but no corresponding (low-entropy) boundary condition at any future state. This asymmetry in boundary conditions is what causes entropy to appear directionally increasing—and again, once someone becomes aware of this, it is no longer possible for them to imagine living in a universe which started out in a very low-entropy state, but where the second law of thermodynamics does not hold.
In other words, thermodynamics as a “deep fundamental theory” is not merely [what you characterized as] a “powerful abstraction that is useful in a lot of domains”. Thermodynamics is a logically necessary consequence of existing, more primitive notions—and the fact that (historically) we arrived at our understanding of thermodynamics via a substantially longer route (involving heat engines and the like), without noticing this deep connection until much later on, does not change the fact that grasping said deep connection allows one to see “at a glance” why the laws of thermodynamics inevitably follow.
Of course, this doesn’t imply infinite certainty, but it does imply a level of certainty substantially higher than what would be assigned merely to a “powerful abstraction that is useful in a lot of domains”. So the relevant question would seem to be: given my above described epistemic state, how might one convince me that the case for thermodynamics is not as airtight as I currently think it is? I think there are essentially two angles of attack: (1) convince me that the arguments for thermodynamics being a logically necessary consequence of the laws of physics are somehow flawed, or (2) convince me that the laws of physics don’t have the properties I think they do.
Both of these are hard to do, however—and for good reason! And absent arguments along those lines, I don’t think I am (or should be) particularly moved by [what you characterized as] philosophy-of-science-style objections about “advance predictions”, “systematic biases”, and the like. I think there are certain theories for which the object-level case is strong enough that it more or less screens off meta-level objections; and I think this is right, and good.
Which is to say:
The mental move I’m doing for each of these examples is not imagining universes where addition/evolution/other deep theory is wrong, but imagining phenomena/problems where addition/evolution/other deep theory is not well suited. If you’re describing something that doesn’t commute, addition might be a deep theory, but it’s not useful for what you want. Similarly, you could argue that given how we’re building AIs and trying to build AGI, evolution is not the deep theory that you want to use. (emphasis mine)
I think you could argue this, yes—but the crucial point is that you have to actually argue it. You have to (1) highlight some aspect of the evolutionary paradigm, (2) point out [what appears to you to be] an important disanalogy between that aspect and [what you expect cognition to look like in] AGI, and then (3) argue that that disanalogy directly undercuts the reliability of the conclusions you would like to contest. In other words, you have to do things the “hard way”—no shortcuts.
...and the sense I got from Richard’s questions in the post (as well as the arguments you made in this subthread) is one that very much smells like a shortcut is being attempted. This is why I wrote, in my other comment, that
I don’t think I have a good sense of the implied objections contained within Richard’s model. That is to say: I don’t have a good handle on the way(s) in which Richard expects expected utility theory to fail, even conditioning on Eliezer being wrong about the theory being useful. I think this is important because—absent a strong model of expected utility theory’s likely failure modes—I don’t think questions of the form “but why hasn’t your theory made a lot of successful advance predictions yet?” move me very much on the object level.
I think I share Eliezer’s sense of not really knowing what Richard means by “deep fundamental theory” or “wide range of applications we hadn’t previous thought of”, and I think what would clarify this for me would have been for Richard to provide examples of “deep fundamental theories [with] a wide range of applications we hadn’t previously thought of”, accompanied by an explanation of why, if those applications hadn’t been present, that would have indicated something wrong with the theory.
My objection is mostly fleshed out in my other comment. I’d just flag here that “In other words, you have to do things the “hard way”—no shortcuts” assigns the burden of proof in a way which I think is not usually helpful. You shouldn’t believe my argument that I have a deep theory linking AGI and evolution unless I can explain some really compelling aspects of that theory. Because otherwise you’ll also believe in the deep theory linking AGI and capitalism, and the one linking AGI and symbolic logic, and the one linking intelligence and ethics, and the one linking recursive self-improvement with cultural evolution, etc etc etc.
Now, I’m happy to agree that all of the links I just mentioned are useful lenses which help you understand AGI. But for utility theory to do the type of work Eliezer tries to make it do, it can’t just be a useful lens—it has to be something much more fundamental. And that’s what I don’t think Eliezer’s established.
It also isn’t clear to me that Eliezer has established the strong inferences he draws from noticing this general pattern (“expected utility theory/consequentialism”). But when you asked Eliezer (in the original dialogue) to give examples of successful predictions, I was thinking “No, that’s not how these things work.” In the mistaken applications of Grand Theories you mention (AGI and capitalism, AGI and symbolic logic, intelligence and ethics, recursive self-improvement and cultural evolution, etc.), the easiest way to point out why they are dumb is with counterexamples. We can quickly “see” the counterexamples. E.g., if you’re trying to see AGI as the next step in capitalism, you’ll be able to find counterexamples where things become altogether different (misaligned AI killing everything; singleton that brings an end to the need to compete). By contrast, if the theory fits, you’ll find that whenever you try to construct such a counterexample, it is just a non-central (but still valid) manifestation of the theory. Eliezer would probably say that people who are good at this sort of thinking will quickly see how the skeptics’ counterexamples fall relevantly short.
---
The reason I remain a bit skeptical about Eliezer’s general picture: I’m not sure if his thinking about AGI makes implicit questionable predictions about humans.
I don’t understand his thinking well enough to be confident that it doesn’t.
It seems to me that Eliezer_2011 placed weirdly strong emphasis on presenting humans in ways that matched the pattern “(scary) consequentialism always generalizes as you scale capabilities.” I consider some of these claims false or at least would want to make the counterexamples more salient.
For instance:
Eliezer seemed to think that “extremely few things are worse than death” is something all philosophically sophisticated humans would agree with.
Early writings on CEV seemed to emphasize things like the “psychological unity of humankind” and talk as though humans would mostly have the same motivational drives, also with respect to how it relates to “enjoying being agenty” as opposed to “grudgingly doing agenty things but wishing you could be done with your obligations faster”.
In HPMOR all the characters are either not philosophically sophisticated or they were amped up into scary consequentialists plotting all the time.
All of the above could be totally innocent matters of wanting to emphasize the thing that other commenters were missing, so they aren’t necessarily indicative of overlooking certain possibilities. Still, the pattern there makes me wonder if maybe Eliezer hasn’t spent a lot of time imagining what sorts of motivations humans can have that make them benign not in terms of outcome-related ethics (what they want the world to look like), but in terms of relational ethics (who they want to respect or assist, what sort of role model they want to follow). It makes me wonder if it’s really true that when you try to train an AI to be helpful and corrigible, the “consequentialism-wants-to-become-agenty-with-its-own-goals part” will be stronger than the “helping this person feels meaningful” part. (Leading to an agent that’s consequentialist about following proper cognition rather than about other world-outcomes.)
FWIW I think I mostly share Eliezer’s intuitions about the arguments where he makes them; I just feel like I lack the part of his picture that lets him discount the observation that some humans are interpersonally corrigible and not all that focused on other explicit goals, and that maybe this means corrigibility has a crisp/natural shape after all.
the easiest way to point out why they are dumb is with counterexamples. We can quickly “see” the counterexamples. E.g., if you’re trying to see AGI as the next step in capitalism, you’ll be able to find counterexamples where things become altogether different (misaligned AI killing everything; singleton that brings an end to the need to compete).
I’m not sure how this would actually work. The proponent of the AGI-capitalism analogy might say “ah yes, AGI killing everyone is another data point on the trend of capitalism becoming increasingly destructive”. Or they might say (as Marx did) that capitalism contains the seeds of its own destruction. Or they might just deny that AGI will play out the way you claim, because their analogy to capitalism is more persuasive than your analogy to humans (or whatever other reasoning you’re using). How do you then classify this as a counterexample rather than a “non-central (but still valid) manifestation of the theory”?
My broader point is that these types of theories are usually sufficiently flexible that they can “predict” most outcomes, which is why it’s so important to pin them down by forcing them to make advance predictions.
On the rest of your comment, +1. I think that one of the weakest parts of Eliezer’s argument was when he appealed to the difference between von Neumann and the village idiot in trying to explain why the next step above humans will be much more consequentialist than most humans (although unfortunately I failed to pursue this point much in the dialogue).
How do you then classify this as a counterexample rather than a “non-central (but still valid) manifestation of the theory”?
My only reply is “You know it when you see it.” And yeah, a crackpot would reason the same way, but non-modest epistemology says that if it’s obvious to you that you’re not a crackpot then you have to operate on the assumption that you’re not a crackpot. (In the alternative scenario, you won’t have much impact anyway.)
Specifically, the situation I mean is the following:
You have an epistemic track record like Eliezer’s, or like that of someone making lots of highly upvoted posts in our communities.
You find yourself having strong intuitions about how to apply powerful principles like “consequentialism” to new domains, and your intuitions are strong because it feels to you like you have a gears-level understanding that others lack. You trust your intuitions in cases like these.
My recommended policy in cases where this applies is “trust your intuitions and operate on the assumption that you’re not a crackpot.”
Maybe there’s a potential crux here about how much of scientific knowledge is dependent on successful predictions. In my view, the sequences have convincingly argued that locating the hypothesis in the first place is often done in the absence of already successful predictions, which goes to show that there’s a core of “good reasoning” that lets you jump to (tentative) conclusions, or at least good guesses, much faster than if you were to try lots of things at random.
My recommended policy in cases where this applies is “trust your intuitions and operate on the assumption that you’re not a crackpot.”
Oh, certainly Eliezer should trust his intuitions and believe that he’s not a crackpot. But I’m not arguing about what the person with the theory should believe, I’m arguing about what outside observers should believe, if they don’t have enough time to fully download and evaluate the relevant intuitions. Asking the person with the theory to give evidence that their intuitions track reality isn’t modest epistemology.
Damn. I actually think you might have provided the first clear pointer I’ve seen about this form of knowledge production, why and how it works, and what could break it. There’s a lot to chew on in this reply, but thanks a lot for the amazing food for thought!
(I especially like that you explained the physical points and put links that actually explain the specific implication)
And I agree (tentatively) that a lot of the epistemology of science stuff doesn’t have the same object-level impact. I was not claiming that normal philosophy of science was required, just that if that was not how we should evaluate and try to break the deep theory, I wanted to understand how I was supposed to do that.
The difference between evolution and gradient descent is sexual selection and predator/prey/parasite relations.
With agents running around everywhere inside it, the process is completely different.
Likewise for comparing any kind of flat optimization or search to evolution. I think sexual selection and predator-prey made natural selection dramatically more efficient.
So I think it’s pretty fair to object that evolution isn’t adequate evidence for expecting this flat, dead, temporary number cruncher to blow up into exponential intelligence.
I think there are other reasons to expect that though.
I haven’t read these 500 pages of dialogues so somebody probably made this point already.
I think “deep fundamental theory” is deeper than just “powerful abstraction that is useful in a lot of domains”.
Part of what makes a Deep Fundamental Theory deeper is that it is inevitably relevant for anything existing in a certain way. For example, Ramón y Cajal (discoverer of the neuronal structure of brains) wrote:
Before the correction of the law of polarization, we have thought in vain about the usefulness of the referred facts. Thus, the early emergence of the axon, or the displacement of the soma, appeared to us as unfavorable arrangements acting against the conduction velocity, or the convenient separation of cellulipetal and cellulifugal impulses in each neuron. But as soon as we ruled out the requirement of the passage of the nerve impulse through the soma, everything became clear; because we realized that the referred displacements were morphologic adaptations ruled by the laws of economy of time, space and matter. These laws of economy must be considered as the teleological causes that preceded the variations in the position of the soma and the emergence of the axon. They are so general and evident that, if carefully considered, they impose themselves with great force on the intellect, and once becoming accepted, they are firm bases for the theory of axipetal polarization.
At first, I was surprised to see that the structure of physical space gave the fundamental principles in neuroscience too! But then I realized I shouldn’t have been: neurons exist in physical spacetime. It’s not a coincidence that neurons look like lightning: they’re satisfying similar constraints in the same spatial universe. And once observed, it’s easy to guess that what Ramón y Cajal might call “economy of metabolic energy” is also a fundamental principle of neuroscience, which of course is attested by modern neuroscientists. That’s when I understood that spatial structure is a Deep Fundamental Theory.
And it doesn’t stop there. The same thing explains the structure of our roadways, blood vessels, telecom networks, and even why the first-order differential equations for electric currents, masses on springs, and water in pipes are the same.
(The exact deep structure of physical space which explains all of these is differential topology, which I think is what Vaniver was gesturing towards with “geometry except for the parallel postulate”.)
That’s when I understood that spatial structure is a Deep Fundamental Theory.
And it doesn’t stop there. The same thing explains the structure of our roadways, blood vessels, telecom networks, and even why the first-order differential equations for electric currents, masses on springs, and water in pipes are the same.
(The exact deep structure of physical space which explains all of these is differential topology, which I think is what Vaniver was gesturing towards with “geometry except for the parallel postulate”.)
Can you go into more detail here? I have done a decent amount of maths but always had trouble in physics due to my lack of physical intuition, so it might be completely obvious, but I’m not clear on what “that same thing” is or how it explains all your examples. Is it about shortest paths? What aspect of differential topology (a really large field) captures it?
(Maybe you literally can’t explain it to me without my seeing the deep theory, which would be frustrating, but I’d want to know if that was the case.)
There’s more than just differential topology going on, but it’s the thing that unifies it all. You can think of differential topology as being about spaces you can divide into cells, and the boundaries of those cells. Conservation laws are naturally expressed here as constraints that the net flow across the boundary must be zero. This makes conserved quantities into resources, whose use is convergently minimized. Minimal structures with certain constraints are thus led to forming the same network-like shapes, obeying the same sorts of laws. (See chapter 3 of Grady’s Discrete Calculus for details of how this works in the electric circuit case.)
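To make “net flow across the boundary is zero” concrete in the electric circuit case, here is a small Python sketch. It uses a hypothetical four-node resistor network with unit conductances and its own sign convention, not necessarily the one in Grady’s book. It solves for node potentials with Gauss-Seidel so that Kirchhoff’s current law holds, computes the divergence (net outflow) at each node, and compares the dissipated power with a hand-picked alternative flow that also conserves current. That the physical current dissipates less is an instance of Thomson’s principle, one concrete case of a conserved quantity’s use being minimized.

```python
from typing import Dict, List, Tuple

Edge = Tuple[str, str]  # directed edge (tail, head); positive flow runs tail -> head


def divergence(nodes: List[str], edges: List[Edge], flow: Dict[Edge, float]) -> Dict[str, float]:
    """Net outflow at each node: the discrete 'flow across the cell boundary'."""
    div = {n: 0.0 for n in nodes}
    for tail, head in edges:
        div[tail] += flow[(tail, head)]
        div[head] -= flow[(tail, head)]
    return div


def solve_potentials(nodes: List[str], edges: List[Edge], injected: Dict[str, float],
                     ground: str, iters: int = 2000) -> Dict[str, float]:
    """Gauss-Seidel for Kirchhoff's current law with unit conductances.

    At each non-ground node n: sum over neighbours m of (phi[n] - phi[m]) == injected[n].
    Assumes the graph is connected and every node has at least one neighbour.
    """
    neighbours: Dict[str, List[str]] = {n: [] for n in nodes}
    for a, b in edges:
        neighbours[a].append(b)
        neighbours[b].append(a)
    phi = {n: 0.0 for n in nodes}
    for _ in range(iters):
        for n in nodes:
            if n != ground:
                nb = neighbours[n]
                phi[n] = (injected.get(n, 0.0) + sum(phi[m] for m in nb)) / len(nb)
    return phi


def dissipation(flow: Dict[Edge, float]) -> float:
    """Power dissipated with unit resistances: the sum of flow**2."""
    return sum(f * f for f in flow.values())


# Hypothetical toy network: two parallel two-edge paths from a to d.
nodes = ["a", "b", "c", "d"]
edges = [("a", "b"), ("b", "d"), ("a", "c"), ("c", "d")]
phi = solve_potentials(nodes, edges, injected={"a": 1.0}, ground="d")
current = {(t, h): phi[t] - phi[h] for t, h in edges}  # Ohm's law, unit conductance
# Another conserving flow: route everything through b.
alternative = {("a", "b"): 1.0, ("b", "d"): 1.0, ("a", "c"): 0.0, ("c", "d"): 0.0}

print(divergence(nodes, edges, current))  # approximately a: 1, b: 0, c: 0, d: -1
print(dissipation(current), dissipation(alternative))  # approximately 1.0 and 2.0
```

The current splits evenly between the two paths, the divergence vanishes at the interior nodes `b` and `c`, and the even split dissipates half as much as the single-path alternative.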
The mental move I’m doing for each of these examples is not imagining universes where addition/evolution/other deep theory is wrong, but imagining phenomena/problems where addition/evolution/other deep theory is not adapted. If you’re describing something that doesn’t commute, addition might be a deep theory, but it’s not useful for what you want.
Yeah, this seems reasonable to me. I think “how could you tell that theory is relevant to this domain?” seems like a reasonable question in a way that “what predictions does that theory make?” seems like it’s somehow coming at things from the wrong angle.
Thanks! I think that this is a very useful example of an advance prediction of utility theory; and that gathering more examples like this is one of the most promising way to make progress on bridging the gap between Eliezer’s and most other people’s understandings of consequentialism.
Potentially important thing to flag here: at least in my mind, expected utility theory (i.e. the property Eliezer was calling “laser-like” or “coherence”) and consequentialism are two distinct things. Consequentialism will tend to produce systems with (approximate) coherent expected utilities, and that is one major way I expect coherent utilities to show up in practice. But coherent utilities can in-principle occur even without consequentialism (e.g. conservative vector fields in physics), and consequentialism can in-principle not be very coherent (e.g. if it just has tons of resources and doesn’t have to be very efficient to achieve a goal-state).
(I’m not sure whether Eliezer would agree with this. The thing-I-think-Eliezer-means-by-consequentialism does not yet have a good mathematical formulation which I know of, which makes it harder to check that two people even mean the same thing when pointing to the concept.)
My model of Eliezer says that there is some deep underlying concept of consequentialism, of which the “not very coherent consequentialism” is a distorted reflection; and that this deep underlying concept is very closely related to expected utility theory. (I believe he said at one point that he started using the word “consequentialism” instead of “expected utility maximisation” mainly because people kept misunderstanding what he meant by the latter.)
I don’t know enough about conservative vector fields to comment, but on priors I’m pretty skeptical of this being a good example of coherent utilities; I also don’t have a good guess about what Eliezer would say here.
I don’t know enough about conservative vector fields to comment, but on priors I’m pretty skeptical of this being a good example of coherent utilities; I also don’t have a good guess about what Eliezer would say here.
I think johnswentworth (and others) are claiming that they have the same ‘math’/‘shape’, which seems much more likely (if you trust their claims about such things generally).
Oh, I can just give you a class of nontrivial predictions of expected utility theory. I have not seen any empirical results on whether these actually hold, so consider them advance predictions.
So, a bacteria needs a handful of different metabolic resources—most obviously energy (i.e. ATP), but also amino acids, membrane lipids, etc. And often bacteria can produce some metabolic resources via multiple different paths, including cyclical paths—e.g. it’s useful to be able to turn A into B but also B into A, because sometimes the environment will have lots of B and other times it will have lots of A. Now, there’s the obvious prediction that the bacteria won’t waste energy turning B into A and then back into B again—i.e. it will suppress one of those two pathways (assuming the cycle is energy-burning), depending on which metabolite is more abundant. Utility generalizes this idea to arbitrarily many reactions and products, and predicts that at any given time we can assign some (non-unique) “values” to each metabolite (including energy carriers), such that any reaction whose reactants have more total “value” than its products is suppressed (or at least not catalyzed; the cell doesn’t really have good ways to suppress spontaneous reactions other than putting things in separate compartments).
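One way to make this prediction checkable is the Python sketch below. It rests on simplifying assumptions that are mine, not the comment's: every reaction converts one unit of a single reactant into one unit of a single product while consuming a fixed amount of ATP, and ATP is the unit of "value". Under those assumptions, asking for values that make every active reaction non-value-destroying is a system of difference constraints. The Bellman-Ford algorithm either solves it or shows it infeasible, and infeasibility happens exactly when the active reactions contain a net ATP-burning cycle, like running A into B and B back into A. The function name `assign_values` and the reaction encoding are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical encoding: (reactant, product, atp_cost) turns one unit of
# `reactant` into one unit of `product` while consuming `atp_cost` ATP.
# A negative atp_cost means the reaction yields ATP. ATP is the unit of value.
Reaction = Tuple[str, str, float]


def assign_values(active: List[Reaction]) -> Optional[Dict[str, float]]:
    """Find values v such that every active reaction is non-value-destroying:
    v[reactant] + atp_cost <= v[product].

    Returns None when no such assignment exists, which under this model means
    some set of active reactions forms a net ATP-burning cycle."""
    nodes = {m for r, p, _ in active for m in (r, p)}
    # Difference constraints v[r] - v[p] <= -atp_cost, solved by Bellman-Ford.
    # Starting every value at 0 plays the role of a virtual source node.
    v = {m: 0.0 for m in nodes}
    for _ in range(len(nodes) + 1):
        changed = False
        for r, p, cost in active:
            if v[p] - cost < v[r] - 1e-12:
                v[r] = v[p] - cost  # tighten the reactant's value
                changed = True
        if not changed:
            return v  # all constraints hold; values are non-unique
    return None  # still relaxing after |nodes| + 1 passes: a futile cycle is active


# Only A -> B is running: consistent values exist.
print(assign_values([("A", "B", 1.0)]))
# Both A -> B and B -> A run, each burning ATP: no consistent values.
print(assign_values([("A", "B", 1.0), ("B", "A", 1.0)]))
```

Real metabolic reactions have several reactants and stoichiometric coefficients, so the general version of this check is a linear-programming feasibility problem rather than a shortest-path one; the single-reactant restriction is only there to keep the sketch short.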
Of course in practice this will be an approximation, and there may be occasional exceptions where the cell is doing something the model doesn’t capture. If we were to do this sort of analysis in a signalling network rather than a metabolic network, for instance, there would likely be many exceptions: cells sometimes burn energy to maintain a concentration at a specific level, or to respond quickly to changes, and this particular model doesn’t capture the “value” of information-content in signals; we’d have to extend our value-function in order for the utility framework to capture that. But for metabolic networks, I expect that to mostly not be an issue.
That’s really just utility theory; expected utility theory would involve an organism storing some resources over time (like e.g. fat). Then we’d expect to be able to assign “values” such that the relative “value” assigned to stored resources which are not currently used is a weighted sum of the “values” assigned to those resources in different possible future environments (of the sort the organism might find itself in after something like its current environment, in the ancestral world), and the weights in the sums should be consistent. (This is a less-fleshed-out prediction than the other one, but hopefully it’s enough of a sketch to give the idea.)
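The expected-utility version can be sketched in the same spirit. This toy Python version assumes just two possible future environments, with each resource's value in each environment already expressed in one common unit; the names and numbers are hypothetical. It asks whether a single weight p explains the value of every stored resource at once. Any story could fit one weight per resource; the prediction is that the weights are shared.

```python
from typing import Dict, Optional


def implied_weight(stored: Dict[str, float],
                   env_a: Dict[str, float],
                   env_b: Dict[str, float],
                   tol: float = 1e-9) -> Optional[float]:
    """Look for one weight p in [0, 1] such that, for every stored resource r,
        stored[r] == p * env_a[r] + (1 - p) * env_b[r].
    Returns p, or None if the resources imply inconsistent weights."""
    p: Optional[float] = None
    for r, s in stored.items():
        a, b = env_a[r], env_b[r]
        if abs(a - b) <= tol:
            # Both futures value r equally, so r says nothing about p,
            # but its stored value must then match that common value.
            if abs(s - a) > tol:
                return None
            continue
        p_r = (s - b) / (a - b)  # weight implied by this resource alone
        if p is None:
            p = p_r
        elif abs(p - p_r) > tol:
            return None  # different resources imply different weights
    if p is None:
        return 0.5  # unconstrained: any p in [0, 1] works; return the midpoint
    return p if -tol <= p <= 1 + tol else None


# Hypothetical values of fat and sugar in a lean future and a plentiful one.
lean = {"fat": 2.0, "sugar": 1.0}
plenty = {"fat": 1.0, "sugar": 3.0}
print(implied_weight({"fat": 1.7, "sugar": 1.6}, lean, plenty))  # consistent
print(implied_weight({"fat": 1.7, "sugar": 2.0}, lean, plenty))  # inconsistent
```

With more than two possible environments, the same question becomes a linear-programming feasibility problem over a probability vector instead of a single number.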
Of course, if we understand expected utility theory deeply, then these predictions are quite trivial; they’re just saying that organisms won’t make Pareto-suboptimal use of their resources! It’s one of those predictions where, if it’s false, then we’ve probably discovered something interesting—most likely some place where an organism is spending resources to do something useful which we haven’t understood yet. [EDIT-TO-ADD: This is itself intended as a falsifiable prediction—if we go look at an anomaly and don’t find any unaccounted-for phenomenon, then that’s a very big strike against expected utility theory.] And that’s the really cool prediction here: it gives us a tool to uncover unknown-unknowns in our understanding of a cell’s behavior.
Thanks John for this whole thread!
(Note that I only read the whole Epistemology section of this post and skimmed the rest, so I might be saying stuff that is repeated/resolved elsewhere. Please point me to the relevant parts/quotes if that’s the case. ;) )
Einstein’s arrogance sounds to me like an early pointer in the Sequences for that kind of thing, with a specific claim about General Relativity being that kind of theory.
That being said, I still understand Richard’s position and difficulty with this whole part (or at least what I read of Richard’s difficulty). He’s coming from the perspective of philosophy of science, which has focused mostly on ideas related to advance predictions and on taking into account the mental machinery of humans to catch biases and mistakes that we systematically make. The Sequences also spend a massive amount of words on exactly this, and yet in this discussion (and at select points in the Sequences like the aforementioned post), Yudkowsky sounds a bit as if he considers that his fundamental theory/observation doesn’t need any of this to be accepted as obvious (I don’t think he is thinking that way, but that’s hard to extract out of the text).
It’s even more frustrating because Yudkowsky focuses on “showing epistemic modesty” as his answer/rebuttal to Richard’s inquiry, when Richard just sounds like he’s asking the completely relevant question “why should we take your word on it?” And the confusion IMO is because the last sentence sounds very status-y (How dare you claim such crazy stuff?), but I’m pretty convinced Richard actually means it in a very methodological/philosophy of science/epistemic strategies way: “What are the ways of thinking that you’re using here that you expect to be particularly good at aiming at the truth?”
Furthermore, I agree with (my model of) Richard that the main issue with the way Yudkowsky (and you John) are presenting your deep idea is that you don’t give a way of showing it wrong. For example, you (John) write:
And even if I feel what you’re gesturing at, this sounds/looks like you’re saying “even if my prediction is false, that doesn’t mean that my theory would be invalidated”, whereas I feel you want to convey something like “this is not a prediction/part of the theory that has the ability to falsify the theory” or “it’s part of the obvious wiggle room of the theory”. What I want is a way of finding the parts of the theory/model/prediction that could actually invalidate it, because that’s what we should really be discussing. (A difficulty might be that such theories are so fundamental and powerful that being able to see them makes it really hard to find any way they could go wrong and endanger the theory.)
An analogy that comes to my mind is with the barriers to proving P vs NP. These make explicit ways in which you can’t solve the P vs NP question, such that it becomes far easier to weed out proof attempts. My impression is that you (Yudkowsky and John) have models/generators that help you see at a glance that a given alignment proposal will fail. Which is awesome! I want to be able to find and extract and use those. But what Richard is pointing out IMO is that having the generators explicit would give us a way to stress test them, which is a super important step to start believing in them further. Just like we want people to actually try to go beyond GR, and for that they need to understand it deeply.
(Obviously, maybe the problem is that, as you two point out, making the models/generators explicit and understandable is just really hard and you don’t know how to do that. That’s fair.)
To be clear, this part:
… is also intended as a falsifiable prediction. Like, if we go look at the anomaly and there’s no new thing going on there, then that’s a very big strike against expected utility theory.
This particular type of fallback-prediction is a common one in general: we have some theory which makes predictions, but “there’s a phenomenon which breaks one of the modelling assumptions in a way noncentral to the main theory” is a major way the predictions can fail. But then we expect to be able to go look and find the violation of that noncentral modelling assumption, which would itself yield some interesting information. If we don’t find such a violation, it’s a big strike against the theory.
That’s a great way of framing it! And a great way of thinking about why these are not failures that are “worrisome” at first/in most cases.
So, thermodynamics also feels like a deep fundamental theory to me, and one of the predictions it makes is “you can’t make an engine more efficient than a Carnot engine.” Suppose someone exhibits an engine that appears to be more efficient than a Carnot engine; my response is not going to be “oh, thermodynamics is wrong”, and instead it’s going to be “oh, this engine is making use of some unseen source.”
[Of course, you can show me enough such engines that I end up convinced, or show me the different theoretical edifice that explains both the old observations and these new engines.]
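As a concrete version of this kind of check: for a heat engine running between a hot and a cold reservoir, the Carnot bound is η_max = 1 − T_c/T_h, with temperatures in kelvin. The minimal Python sketch below (function names and numbers are hypothetical) flags a claimed engine whose measured efficiency, work out divided by heat in, exceeds that bound. On the reasoning above, such a flag is a cue to look for an uncounted source rather than to abandon thermodynamics.

```python
def carnot_efficiency(t_hot_k: float, t_cold_k: float) -> float:
    """Upper bound on the efficiency of any heat engine running between a hot
    reservoir at t_hot_k and a cold one at t_cold_k (both in kelvin)."""
    if not 0 < t_cold_k < t_hot_k:
        raise ValueError("need 0 < t_cold_k < t_hot_k, in kelvin")
    return 1.0 - t_cold_k / t_hot_k


def beats_carnot(work_out_j: float, heat_in_j: float,
                 t_hot_k: float, t_cold_k: float, tol: float = 1e-9) -> bool:
    """True if a claimed heat engine's measured efficiency exceeds the Carnot
    bound, i.e. we should go looking for an energy source we have not counted."""
    return work_out_j / heat_in_j > carnot_efficiency(t_hot_k, t_cold_k) + tol


# Hypothetical measurements for an engine between 500 K and 300 K (bound: 40%).
print(beats_carnot(35.0, 100.0, 500.0, 300.0))  # within the bound
print(beats_carnot(45.0, 100.0, 500.0, 300.0))  # "too good": look for a hidden source
```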
So, later Eliezer gives “addition” as an example of a deep fundamental theory. And… I’m not sure I can imagine a universe where addition is wrong? Like, I can say “you would add 2 and 2 and get 5”, but that sentence doesn’t actually correspond to any universe.
Like, similarly, I can imagine universes where evolution doesn’t describe the historical origin of species in that universe. But I can’t imagine universes where the elements of evolution are present and evolution doesn’t happen.
[That said, I can imagine universes with Euclidean geometry and different universes with non-Euclidean geometry, so I’m not trying to claim this is true of all deep fundamental theories, but maybe the right way to think about this is “geometry except for the parallel postulate” is the deep fundamental theory.]
That’s not what it predicts. It predicts you can’t make a heat engine more efficient than a Carnot engine.
Owned
Thanks for the thoughtful answer!
My gut reaction here is that “you can’t make an engine more efficient than a Carnot engine” is not the right kind of prediction to try to break thermodynamics, because even if you could break it in principle, staying at that level without going into the detailed mechanisms of thermodynamics will only make you try the same thing as everyone else does. Do you think that’s an adequate response to your point, or am I missing what you’re trying to say?
The mental move I’m doing for each of these examples is not imagining universes where addition/evolution/other deep theory is wrong, but imagining phenomena/problems where addition/evolution/other deep theory is not adapted. If you’re describing something that doesn’t commute, addition might be a deep theory, but it’s not useful for what you want. Similarly, you could argue that given how we’re building AIs and trying to build AGI, evolution is not the deep theory that you want to use.
It sounds to me like you (and your internal-Yudkowsky) are using “deep fundamental theory” to mean “powerful abstraction that is useful in a lot of domains”. Which addition and evolution fundamentally are. But claiming that the abstraction is useful in some new domain requires some justification IMO. And even if you think the burden of proof is on the critics, the difficulty of formulating the generators makes that really hard.
Once again, do you think that answers your point adequately?
From my (dxu’s) perspective, it’s allowable for there to be “deep fundamental theories” such that, once you understand those theories well enough, you lose the ability to imagine coherent counterfactual worlds where the theories in question are false.
To use thermodynamics as an example: the first law of thermodynamics (conservation of energy) is actually a consequence of Noether’s theorem, which ties conserved quantities in physics to symmetries in physical laws. Before someone becomes aware of this, it’s perhaps possible for them to imagine a universe exactly like our own, except that energy is not conserved; once they understand the connection implied by Noether’s theorem, this becomes an incoherent notion: you cannot remove the conservation-of-energy property without changing deep aspects of the laws of physics.
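Noether's theorem itself is a statement about Lagrangians, but its consequence for energy shows up in a toy numerical experiment. The Python sketch below integrates a mass on a spring twice; the system, the velocity Verlet integrator, and every parameter are illustrative choices, not taken from the comment. In the first run the spring constant does not depend on time, so the dynamics are symmetric under time translation and energy stays fixed up to a small integrator error. In the second run the spring constant is modulated in time, which breaks that symmetry and lets the energy change.

```python
import math


def energy_drift(k_of_t, x=1.0, v=0.0, m=1.0, dt=1e-3, steps=20_000):
    """Integrate m x'' = -k(t) x with velocity Verlet and return (E_start, E_end),
    where E(t) = m v^2 / 2 + k(t) x^2 / 2."""
    t = 0.0
    e_start = 0.5 * m * v * v + 0.5 * k_of_t(t) * x * x
    a = -k_of_t(t) * x / m
    for _ in range(steps):
        x += v * dt + 0.5 * a * dt * dt
        t += dt
        a_new = -k_of_t(t) * x / m
        v += 0.5 * (a + a_new) * dt
        a = a_new
    e_end = 0.5 * m * v * v + 0.5 * k_of_t(t) * x * x
    return e_start, e_end


# Time-translation symmetric: k does not depend on t, so energy is conserved
# (up to a small integrator error).
print(energy_drift(lambda t: 1.0))
# Symmetry broken: k is modulated in time, so nothing forces energy to stay fixed
# (this particular modulation even pumps energy in by parametric resonance).
print(energy_drift(lambda t: 1.0 + 0.5 * math.sin(2.0 * t)))
```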
The second law of thermodynamics is similarly deep: it’s actually a consequence of there being a (low-entropy) boundary condition at the beginning of the universe, but no corresponding (low-entropy) boundary condition at any future state. This asymmetry in boundary conditions is what causes entropy to appear directionally increasing—and again, once someone becomes aware of this, it is no longer possible for them to imagine living in a universe which started out in a very low-entropy state, but where the second law of thermodynamics does not hold.
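A standard toy model makes the boundary-condition point concrete: the Ehrenfest urn model, sketched below in Python (the ball count, step count, and seed are arbitrary choices). Its update rule has no built-in arrow of time, since the chain satisfies detailed balance with respect to its binomial equilibrium. Yet starting from the low-entropy state with every ball in one urn, the entropy ln C(n, k) climbs toward its maximum and then only fluctuates near it. The rise comes from the initial condition, not from the dynamics.

```python
import math
import random


def ehrenfest_entropy(n_balls=100, steps=4_000, seed=0):
    """Ehrenfest urn model: at each step one ball, chosen uniformly at random,
    jumps to the other urn. Returns the Boltzmann entropy ln C(n, k) after each
    step, where k is the number of balls in urn A."""
    rng = random.Random(seed)
    k = n_balls  # low-entropy boundary condition: every ball starts in urn A
    history = [math.log(math.comb(n_balls, k))]
    for _ in range(steps):
        if rng.random() < k / n_balls:
            k -= 1  # the chosen ball was in urn A, so it moves to urn B
        else:
            k += 1  # the chosen ball was in urn B, so it moves to urn A
        history.append(math.log(math.comb(n_balls, k)))
    return history


entropy = ehrenfest_entropy()
max_entropy = math.log(math.comb(100, 50))
print(entropy[0], entropy[-1], max_entropy)
```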
In other words, thermodynamics as a “deep fundamental theory” is not merely [what you characterized as] a “powerful abstraction that is useful in a lot of domains”. Thermodynamics is a logically necessary consequence of existing, more primitive notions—and the fact that (historically) we arrived at our understanding of thermodynamics via a substantially longer route (involving heat engines and the like), without noticing this deep connection until much later on, does not change the fact that grasping said deep connection allows one to see “at a glance” why the laws of thermodynamics inevitably follow.
Of course, this doesn’t imply infinite certainty, but it does imply a level of certainty substantially higher than what would be assigned merely to a “powerful abstraction that is useful in a lot of domains”. So the relevant question would seem to be: given my above described epistemic state, how might one convince me that the case for thermodynamics is not as airtight as I currently think it is? I think there are essentially two angles of attack: (1) convince me that the arguments for thermodynamics being a logically necessary consequence of the laws of physics are somehow flawed, or (2) convince me that the laws of physics don’t have the properties I think they do.
Both of these are hard to do, however—and for good reason! And absent arguments along those lines, I don’t think I am (or should be) particularly moved by [what you characterized as] philosophy-of-science-style objections about “advance predictions”, “systematic biases”, and the like. I think there are certain theories for which the object-level case is strong enough that it more or less screens off meta-level objections; and I think this is right, and good.
Which is to say:
I think you could argue this, yes—but the crucial point is that you have to actually argue it. You have to (1) highlight some aspect of the evolutionary paradigm, (2) point out [what appears to you to be] an important disanalogy between that aspect and [what you expect cognition to look like in] AGI, and then (3) argue that that disanalogy directly undercuts the reliability of the conclusions you would like to contest. In other words, you have to do things the “hard way”—no shortcuts.
...and the sense I got from Richard’s questions in the post (as well as the arguments you made in this subthread) is one that very much smells like a shortcut is being attempted. This is why I wrote, in my other comment, that
My objection is mostly fleshed out in my other comment. I’d just flag here that “In other words, you have to do things the “hard way”—no shortcuts” assigns the burden of proof in a way which I think is not usually helpful. You shouldn’t believe my argument that I have a deep theory linking AGI and evolution unless I can explain some really compelling aspects of that theory. Because otherwise you’ll also believe in the deep theory linking AGI and capitalism, and the one linking AGI and symbolic logic, and the one linking intelligence and ethics, and the one linking recursive self-improvement with cultural evolution, etc etc etc.
Now, I’m happy to agree that all of the links I just mentioned are useful lenses which help you understand AGI. But for utility theory to do the type of work Eliezer tries to make it do, it can’t just be a useful lens—it has to be something much more fundamental. And that’s what I don’t think Eliezer’s established.
It also isn’t clear to me that Eliezer has established the strong inferences he draws from noticing this general pattern (“expected utility theory/consequentialism”). But when you asked Eliezer (in the original dialogue) to give examples of successful predictions, I was thinking “No, that’s not how these things work.” In the mistaken applications of Grand Theories you mention (AGI and capitalism, AGI and symbolic logic, intelligence and ethics, recursive self-improvement and cultural evolution, etc.), the easiest way to point out why they are dumb is with counterexamples. We can quickly “see” the counterexamples. E.g., if you’re trying to see AGI as the next step in capitalism, you’ll be able to find counterexamples where things become altogether different (misaligned AI killing everything; singleton that brings an end to the need to compete). By contrast, if the theory fits, you’ll find that whenever you try to construct such a counterexample, it is just a non-central (but still valid) manifestation of the theory. Eliezer would probably say that people who are good at this sort of thinking will quickly see how the skeptics’ counterexamples fall relevantly short.
---
The reason I remain a bit skeptical about Eliezer’s general picture: I’m not sure if his thinking about AGI makes implicit questionable predictions about humans.
I don’t understand his thinking well enough to be confident that it doesn’t.
It seems to me that Eliezer_2011 placed weirdly strong emphasis on presenting humans in ways that matched the pattern “(scary) consequentialism always generalizes as you scale capabilities.” I consider some of these claims false, or at least would want to make the counterexamples more salient.
For instance:
Eliezer seemed to think that “extremely few things are worse than death” is something all philosophically sophisticated humans would agree with.
Early writings on CEV seemed to emphasize things like the “psychological unity of humankind” and talk as though humans would mostly have the same motivational drives, also with respect to how it relates to “enjoying being agenty” as opposed to “grudgingly doing agenty things but wishing you could be done with your obligations faster”.
In HPMOR, all the characters are either not philosophically sophisticated or amped up into scary consequentialists plotting all the time.
All of the above could be totally innocent matters of wanting to emphasize the thing that other commenters were missing, so they aren’t necessarily indicative of overlooking certain possibilities. Still, the pattern makes me wonder if maybe Eliezer hasn’t spent a lot of time imagining what sorts of motivations humans can have that make them benign not in terms of outcome-related ethics (what they want the world to look like), but in terms of relational ethics (who they want to respect or assist, what sort of role model they want to follow). It makes me wonder if it’s really true that when you try to train an AI to be helpful and corrigible, the “consequentialism-wants-to-become-agenty-with-its-own-goals part” will be stronger than the “helping this person feels meaningful” part. (Leading to an agent that’s consequentialist about following proper cognition rather than about other world-outcomes.)
FWIW I think I mostly share Eliezer’s intuitions about the arguments where he makes them; I just feel like I lack the part of his picture that lets him discount the observation that some humans are interpersonally corrigible and not all that focused on other explicit goals, and that maybe this means corrigibility has a crisp/natural shape after all.
I’m not sure how this would actually work. The proponent of the AGI-capitalism analogy might say “ah yes, AGI killing everyone is another data point on the trend of capitalism becoming increasingly destructive”. Or they might say (as Marx did) that capitalism contains the seeds of its own destruction. Or they might just deny that AGI will play out the way you claim, because their analogy to capitalism is more persuasive than your analogy to humans (or whatever other reasoning you’re using). How do you then classify this as a counterexample rather than a “non-central (but still valid) manifestation of the theory”?
My broader point is that these types of theories are usually sufficiently flexible that they can “predict” most outcomes, which is why it’s so important to pin them down by forcing them to make advance predictions.
On the rest of your comment, +1. I think that one of the weakest parts of Eliezer’s argument was when he appealed to the difference between von Neumann and the village idiot in trying to explain why the next step above humans will be much more consequentialist than most humans (although unfortunately I failed to pursue this point much in the dialogue).
My only reply is “You know it when you see it.” And yeah, a crackpot would reason the same way, but non-modest epistemology says that if it’s obvious to you that you’re not a crackpot then you have to operate on the assumption that you’re not a crackpot. (In the alternative scenario, you won’t have much impact anyway.)
Specifically, the situation I mean is the following:
You have an epistemic track record like Eliezer or someone making lots of highly upvoted posts in our communities.
You find yourself having strong intuitions about how to apply powerful principles like “consequentialism” to new domains, and your intuitions are strong because it feels to you like you have a gears-level understanding that others lack. You trust your intuitions in cases like these.
My recommended policy in cases where this applies is “trust your intuitions and operate on the assumption that you’re not a crackpot.”
Maybe there’s a potential crux here about how much of scientific knowledge is dependent on successful predictions. In my view, the sequences have convincingly argued that locating the hypothesis in the first place is often done in the absence of already successful predictions, which goes to show that there’s a core of “good reasoning” that lets you jump to (tentative) conclusions, or at least good guesses, much faster than if you were to try lots of things at random.
Oh, certainly Eliezer should trust his intuitions and believe that he’s not a crackpot. But I’m not arguing about what the person with the theory should believe, I’m arguing about what outside observers should believe, if they don’t have enough time to fully download and evaluate the relevant intuitions. Asking the person with the theory to give evidence that their intuitions track reality isn’t modest epistemology.
Damn. I actually think you might have provided the first clear pointer I’ve seen about this form of knowledge production, why and how it works, and what could break it. There’s a lot to chew on in this reply, but thanks a lot for the amazing food for thought!
(I especially like that you explained the physical points and put links that actually explain the specific implication)
And I agree (tentatively) that a lot of the epistemology of science stuff doesn’t have the same object-level impact. I was not claiming that normal philosophy of science was required, just that if that was not how we should evaluate and try to break the deep theory, I wanted to understand how I was supposed to do that.
The differences between evolution and gradient descent are sexual selection and predator/prey/parasite relations.
Having agents running around inside everywhere completely changes the process.
The same goes for comparing any kind of flat optimization or search to evolution. I think sexual selection and predator-prey dynamics made natural selection dramatically more efficient.
So I think it’s pretty fair to object that evolution isn’t adequate evidence for expecting this flat, dead, temporary number cruncher to blow up into exponential intelligence.
I think there are other reasons to expect that though.
I haven’t read these 500 pages of dialogues so somebody probably made this point already.
I think “deep fundamental theory” is deeper than just “powerful abstraction that is useful in a lot of domains”.
Part of what makes a Deep Fundamental Theory deeper is that it is inevitably relevant for anything existing in a certain way. For example, Ramón y Cajal (discoverer of the neuronal structure of brains) wrote:
At first, I was surprised to see that the structure of physical space gave the fundamental principles in neuroscience too! But then I realized I shouldn’t have been: neurons exist in physical spacetime. It’s not a coincidence that neurons look like lightning: they’re satisfying similar constraints in the same spatial universe. And once observed, it’s easy to guess that what Ramón y Cajal might call “economy of metabolic energy” is also a fundamental principle of neuroscience, which of course is attested by modern neuroscientists. That’s when I understood that spatial structure is a Deep Fundamental Theory.
And it doesn’t stop there. The same thing explains the structure of our roadways, blood vessels, telecomm networks, and even why the first order differential equations for electric currents, masses on springs, and water in pipes are the same.
(The exact deep structure of physical space which explains all of these is differential topology, which I think is what Vaniver was gesturing towards with “geometry except for the parallel postulate”.)
Can you go into more detail here? I have done a decent amount of maths but always had trouble in physics due to my lack of physical intuition, so it might be completely obvious but I’m not clear about what is “that same thing” or how it explains all your examples? Is it about shortest path? What aspect of differential topology (a really large field) captures it?
(Maybe you literally can’t explain it to me without me seeing the deep theory, which would be frustrating, but I’d want to know if that was the case. )
There’s more than just differential topology going on, but it’s the thing that unifies it all. You can think of differential topology as being about spaces you can divide into cells, and the boundaries of those cells. Conservation laws are naturally expressed here as constraints that the net flow across each cell’s boundary must be zero. This makes conserved quantities into resources whose use is convergently minimized. Minimal structures with certain constraints are thus led to form the same network-like shapes, obeying the same sorts of laws. (See chapter 3 of Grady’s Discrete Calculus for details of how this works in the electric circuit case.)
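Here is a small illustration of the circuit case, written from scratch in Python rather than taken from Grady's book; the four-node network, unit conductances, and function names are assumptions for illustration. The potentials come from the graph Laplacian with the sink grounded. The resulting currents have zero net flow out of every internal node (the conservation constraint), and adding any circulation around a loop, which keeps conservation intact, only increases the dissipated power. That last fact is Thomson's principle, one precise form of "the use of a conserved resource is minimized".

```python
from typing import List, Tuple

# (node_u, node_v, conductance); a positive flow on an edge runs from u to v.
Edge = Tuple[int, int, float]


def solve_potentials(n: int, edges: List[Edge], source: int, sink: int,
                     current: float = 1.0) -> List[float]:
    """Node potentials when `current` is injected at `source` and removed at
    `sink` (grounded at 0), from the reduced graph Laplacian L v = b."""
    free = [i for i in range(n) if i != sink]
    pos = {node: j for j, node in enumerate(free)}
    m = len(free)
    lap = [[0.0] * m for _ in range(m)]
    rhs = [0.0] * m
    for u, v, g in edges:
        for a, b in ((u, v), (v, u)):
            if a != sink:
                lap[pos[a]][pos[a]] += g
                if b != sink:
                    lap[pos[a]][pos[b]] -= g
    rhs[pos[source]] = current
    # Gaussian elimination with partial pivoting.
    for col in range(m):
        piv = max(range(col, m), key=lambda r: abs(lap[r][col]))
        lap[col], lap[piv] = lap[piv], lap[col]
        rhs[col], rhs[piv] = rhs[piv], rhs[col]
        for r in range(col + 1, m):
            f = lap[r][col] / lap[col][col]
            for c in range(col, m):
                lap[r][c] -= f * lap[col][c]
            rhs[r] -= f * rhs[col]
    x = [0.0] * m
    for r in range(m - 1, -1, -1):
        x[r] = (rhs[r] - sum(lap[r][c] * x[c] for c in range(r + 1, m))) / lap[r][r]
    potentials = [0.0] * n
    for node, j in pos.items():
        potentials[node] = x[j]
    return potentials


def net_outflow(n: int, edges: List[Edge], flows: List[float]) -> List[float]:
    """Net flow out of each node; conservation means 0 at every internal node."""
    out = [0.0] * n
    for (u, v, _), f in zip(edges, flows):
        out[u] += f
        out[v] -= f
    return out


def dissipation(edges: List[Edge], flows: List[float]) -> float:
    """Power dissipated by the flows: sum of f^2 / g over edges."""
    return sum(f * f / g for (_, _, g), f in zip(edges, flows))


# Hypothetical bridge network: node 0 is the source, node 3 the sink.
edges = [(0, 1, 1.0), (0, 2, 1.0), (1, 2, 1.0), (1, 3, 1.0), (2, 3, 1.0)]
pot = solve_potentials(4, edges, source=0, sink=3)
flows = [g * (pot[u] - pot[v]) for u, v, g in edges]  # Ohm's law
print(net_outflow(4, edges, flows))  # conserved at internal nodes 1 and 2
# Push an extra 0.1 around the loop 0 -> 1 -> 2 -> 0: still conserved, but costlier.
loop = [0.1, -0.1, 0.1, 0.0, 0.0]
detour = [f + d for f, d in zip(flows, loop)]
print(dissipation(edges, flows), dissipation(edges, detour))
```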
Yeah, this seems reasonable to me. I think “how could you tell that theory is relevant to this domain?” seems like a reasonable question in a way that “what predictions does that theory make?” seems like it’s somehow coming at things from the wrong angle.
Thanks! I think that this is a very useful example of an advance prediction of utility theory; and that gathering more examples like this is one of the most promising ways to make progress on bridging the gap between Eliezer’s and most other people’s understandings of consequentialism.
Potentially important thing to flag here: at least in my mind, expected utility theory (i.e. the property Eliezer was calling “laser-like” or “coherence”) and consequentialism are two distinct things. Consequentialism will tend to produce systems with (approximate) coherent expected utilities, and that is one major way I expect coherent utilities to show up in practice. But coherent utilities can in principle occur even without consequentialism (e.g. conservative vector fields in physics), and consequentialism can in principle be not very coherent (e.g. if it just has tons of resources and doesn’t have to be very efficient to achieve a goal-state).
(I’m not sure whether Eliezer would agree with this. The thing-I-think-Eliezer-means-by-consequentialism does not yet have a good mathematical formulation which I know of, which makes it harder to check that two people even mean the same thing when pointing to the concept.)
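The shared "shape" can be made concrete in a discrete setting. A field of pairwise differences on a graph, with an edge (a, b, d) meaning "going from a to b gains d", is conservative exactly when it is the gradient of some potential, which is the same as every closed loop summing to zero. Read d as log exchange rates between goods, and the same test says no sequence of trades brings you back richer, i.e. the rates are coherent with a single utility. The Python sketch below (names hypothetical) builds the potential by breadth-first search and reports failure when some loop does not close.

```python
from collections import deque
from typing import Dict, List, Optional, Tuple


def potential(edges: List[Tuple[str, str, float]],
              tol: float = 1e-9) -> Optional[Dict[str, float]]:
    """Given edges (a, b, d) meaning "going from a to b gains d", return u with
    u[b] - u[a] == d on every edge, or None if some closed loop sums to nonzero."""
    adj: Dict[str, List[Tuple[str, float]]] = {}
    for a, b, d in edges:
        adj.setdefault(a, []).append((b, d))
        adj.setdefault(b, []).append((a, -d))  # walking an edge backwards loses d
    u: Dict[str, float] = {}
    for root in adj:  # one breadth-first search per connected component
        if root in u:
            continue
        u[root] = 0.0  # potentials are only defined up to an additive constant
        queue = deque([root])
        while queue:
            x = queue.popleft()
            for y, d in adj[x]:
                if y not in u:
                    u[y] = u[x] + d
                    queue.append(y)
                elif abs(u[y] - (u[x] + d)) > tol:
                    return None  # a loop that does not close: not a gradient
    return u


print(potential([("A", "B", 1.0), ("B", "C", 2.0), ("A", "C", 3.0)]))  # conservative
print(potential([("A", "B", 1.0), ("B", "C", 2.0), ("A", "C", 4.0)]))  # loop sums to -1
```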
My model of Eliezer says that there is some deep underlying concept of consequentialism, of which the “not very coherent consequentialism” is a distorted reflection; and that this deep underlying concept is very closely related to expected utility theory. (I believe he said at one point that he started using the word “consequentialism” instead of “expected utility maximisation” mainly because people kept misunderstanding what he meant by the latter.)
I don’t know enough about conservative vector fields to comment, but on priors I’m pretty skeptical of this being a good example of coherent utilities; I also don’t have a good guess about what Eliezer would say here.
I think johnswentworth (and others) are claiming that they have the same ‘math’/‘shape’, which seems much more likely (if you trust their claims about such things generally).