Updates on this after reflection and discussion (thanks to Rohin):
Human evolution tells us very little about the ‘cognitive landscape of all minds’ (if that’s even a coherent idea); it’s simply a loosely analogous individual historical example.
Saying Paul’s view is that the cognitive landscape of minds might be simply incoherent isn’t quite right: at the very least you can talk about the distribution over programs implied by the random initialization of a neural network (see the sketch below).
I could have just said ‘Paul doesn’t see this strong generality attractor in the cognitive landscape’ but it seems to me that it’s not just a disagreement about the abstraction, but that he trusts claims made on the basis of these sorts of abstractions less than Eliezer.
Also, on Paul’s view, it’s not that evolution is irrelevant as a counterexample. Rather, the specific fact of ‘evolution gave us general intelligence suddenly by evolutionary timescales’ is an unimportant surface fact, and the real truth about evolution is consistent with the continuous view.
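For concreteness, here is a minimal sketch (my own illustration, nothing from the discussion itself) of what ‘the distribution over programs implied by the random initialization of a neural network’ cashes out to: sampling initial weights samples a random input–output function, so the initializer defines a distribution over simple programs.

```python
# Minimal sketch: random initialization induces a distribution over functions.
# Everything here is illustrative; no names are taken from the original discussion.
import numpy as np

rng = np.random.default_rng(0)

def sample_random_mlp(d_in=2, width=64):
    """One draw from the 'program distribution': a freshly initialized 2-layer MLP."""
    W1 = rng.normal(0.0, 1.0 / np.sqrt(d_in), (d_in, width))
    W2 = rng.normal(0.0, 1.0 / np.sqrt(width), (width, 1))
    return lambda x: np.tanh(x @ W1) @ W2

x = np.array([[0.5, -1.0]])  # a fixed probe input
outputs = [sample_random_mlp()(x).item() for _ in range(1000)]
# The spread of outputs at a single input is one crude window onto that distribution.
print(f"mean={np.mean(outputs):.3f}, std={np.std(outputs):.3f}")
```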
No core of generality and extrapolation of quantitative metrics for things we care about and lack of common huge secrets in relevant tech progress reference class
These two initial claims are connected in a way I didn’t make explicit: no core of generality plus a lack of common huge secrets in the reference class together imply that there are lots of paths to improving on practical metrics (not just those that give us generality), that we are putting lots of effort into improving such metrics, and that we tend to take the best paths first, so the metric improves continuously and trend extrapolation will be especially accurate.
Core of generality and very common presence of huge secrets in relevant tech progress reference class
The first clause already implies the second clause (since “how to get the core of generality” is itself a huge secret), but Eliezer seems to use non-intelligence related examples of sudden tech progress as evidence that huge secrets are common in tech progress in general, independent of the specific reason to think generality is one such secret.
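As a toy caricature of why these two headings pull in different directions (my own illustration, with made-up numbers, not anything from the discussion): if progress on a metric is the sum of many comparable improvements that we greedily collect, the curve is smooth and extrapolation works well; if most of the gain sits behind one rare huge secret, the curve jumps and extrapolation fails.

```python
# Toy model of the two worlds described above; purely illustrative numbers.
import numpy as np

rng = np.random.default_rng(1)
T = 200  # units of research effort

# Many-small-secrets world: each unit of effort yields one of many comparable gains.
small_world = rng.exponential(scale=1.0, size=T)

# Huge-secret world: the same background gains, plus one insight worth far more.
big_world = rng.exponential(scale=1.0, size=T)
big_world[rng.integers(T)] += 150.0

for name, gains in [("many small secrets", small_world), ("one huge secret", big_world)]:
    curve = np.cumsum(gains)
    print(f"{name}: final level {curve[-1]:.0f}, biggest single jump {gains.max():.0f}")
# Trend extrapolation is reliable in the first world and badly misleading in the second.
```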
Nate’s Summary

… Eliezer was saying something like “the fact that humans go around doing something vaguely like weighting outcomes by possibility and also by attractiveness, which they then roughly multiply, is quite sufficient evidence for my purposes, as one who does not pay tribute to the gods of modesty”, while Richard protested something more like “but aren’t you trying to use your concept to carry a whole lot more weight than that amount of evidence supports?” …
And, ofc, at this point, my Eliezer-model is again saying “This is why we should be discussing things concretely! It is quite telling that all the plans we can concretely visualize for saving our skins, are scary-adjacent; and all the non-scary plans, can’t save our skins!”
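For readers who don’t recognize the gloss in the first excerpt: ‘weighting outcomes by possibility and also by attractiveness, which they then roughly multiply’ is the expected-utility sum, written here in standard notation (my rendering, not the dialogue’s):

\[
\mathrm{EU}(a) \;=\; \sum_{o} P(o \mid a)\, U(o), \qquad a^{*} = \arg\max_{a} \mathrm{EU}(a)
\]

The disagreement is not over this formula but over how much evidential weight the observation that humans roughly approximate it can bear.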
Nate’s summary brings up two points I more or less ignored in my summary because I wasn’t sure what I thought—one is, just what role do the considerations about expected incompetent response/regulatory barriers/mistakes in choosing alignment strategies play? Are they necessary for a high likelihood of doom, or just peripheral assumptions? Clearly, you have to posit some level of “civilization fails to do the x-risk-minimizing thing” if you want to argue doom, but how extreme are the scenarios Eliezer is imagining where success is likely?
The other is the role that the modesty worldview plays in Eliezer’s objections.
I feel confused/suspect we might have all lost track of what Modesty epistemology is supposed to consist of—I thought it was something like “overuse of the outside view, especially in a social cognition context”.
Which of the following is:
a) probably the product of a Modesty world-view?
b) something there’s no good reason to think comes from a Modesty world-view, but still bad epistemology?
c) good epistemology?
Not believing theories which don’t make new testable predictions just because they retrodict lots of things in a way that the theory’s proponents claim is more natural, but that you don’t understand, because that seems generally suspicious
Not believing theories which don’t make new testable predictions just because they retrodict lots of things in the world naturally (in a way you sort of get intuitively), because you don’t trust your own assessments of naturalness that much in the absence of discriminating evidence
Not believing theories which don’t make new testable predictions just because they retrodict lots of things in the world naturally (in a way you sort of get intuitively), because most powerful theories which cause conceptual revolutions also make new testable predictions, so it’s a bad sign if the newly proposed theory doesn’t.
As a general matter, accepting that there are lots of cases of theories which are knowably true independent of any new testable predictions they make because of features of the theory. Things like the implication of general relativity from the equivalence principle, or the second law of thermodynamics from Liouville’s theorem, or many-worlds from QM are real, but you’ll only believe you’ve found a case like this if you’ve been walked through to the conclusion, so you’re sure that the underlying concepts are clear and applicable, or there’s already a scientific consensus behind it.
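As an aside on the first example in that last option (my own compressed gloss of the textbook argument, not something from the discussion): the equivalence principle alone already forces gravitational time dilation, with no new data. In a box accelerating upward at g, light climbing a height h takes time t ≈ h/c to arrive, by which point the receiver has gained speed Δv = gh/c, so it sees a Doppler shift

\[
\frac{\Delta \nu}{\nu} \;\approx\; -\frac{\Delta v}{c} \;=\; -\frac{g h}{c^{2}},
\]

and the equivalence principle says a uniform gravitational field must produce the same shift, i.e., clocks lower in the potential run slow; this was only confirmed experimentally much later (the Pound–Rebka experiment).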
Not believing theories which don’t make new testable predictions just because they retrodict lots of things in a way that the theory’s proponents claim is more natural, but that you don’t understand, because that seems generally suspicious
My Eliezer-model doesn’t categorically object to this. See, e.g., Fake Causality:
[Phlogiston] feels like an explanation. It’s represented using the same cognitive data format. But the human mind does not automatically detect when a cause has an unconstraining arrow to its effect. Worse, thanks to hindsight bias, it may feel like the cause constrains the effect, when it was merely fitted to the effect.
[...] Thanks to hindsight bias, it’s also not enough to check how well your theory “predicts” facts you already know. You’ve got to predict for tomorrow, not yesterday.
And A Technical Explanation of Technical Explanation:

Nineteenth century evolutionism made no quantitative predictions. It was not readily subject to falsification. It was largely an explanation of what had already been seen. It lacked an underlying mechanism, as no one then knew about DNA. It even contradicted the nineteenth century laws of physics. Yet natural selection was such an amazingly good post facto explanation that people flocked to it, and they turned out to be right. Science, as a human endeavor, requires advance prediction. Probability theory, as math, does not distinguish between post facto and advance prediction, because probability theory assumes that probability distributions are fixed properties of a hypothesis.
The rule about advance prediction is a rule of the social process of science—a moral custom and not a theorem. The moral custom exists to prevent human beings from making human mistakes that are hard to even describe in the language of probability theory, like tinkering after the fact with what you claim your hypothesis predicts. People concluded that nineteenth century evolutionism was an excellent explanation, even if it was post facto. That reasoning was correct as probability theory, which is why it worked despite all scientific sins. Probability theory is math. The social process of science is a set of legal conventions to keep people from cheating on the math.
Yet it is also true that, compared to a modern-day evolutionary theorist, evolutionary theorists of the late nineteenth and early twentieth century often went sadly astray. Darwin, who was bright enough to invent the theory, got an amazing amount right. But Darwin’s successors, who were only bright enough to accept the theory, misunderstood evolution frequently and seriously. The usual process of science was then required to correct their mistakes.
My Eliezer-model does object to things like ‘since I (from my position as someone who doesn’t understand the model) find the retrodictions and obvious-seeming predictions suspicious, you should share my worry and have relatively low confidence in the model’s applicability’. Or ‘since the case for this model’s applicability isn’t iron-clad, you should sprinkle in a lot more expressions of verbal doubt’. My Eliezer-model views these as isolated demands for rigor, or as isolated demands for social meekness.
Part of his general anti-modesty and pro-Thielian-secrets view is that it’s very possible for other people to know things that justifiably make them much more confident than you are. So if you can’t pass the other person’s ITT / you don’t understand how they’re arriving at their conclusion (and you have no principled reason to think they can’t have a good model here), then you should be a lot more wary of inferring from their confidence that they’re biased.
Not believing theories which don’t make new testable predictions just because they retrodict lots of things in the world naturally (in a way you sort of get intuitively), because you don’t trust your own assessments of naturalness that much in the absence of discriminating evidence
My Eliezer-model thinks it’s possible to be so bad at scientific reasoning that you need to be hit over the head with lots of advance predictive successes in order to justifiably trust a model. But my Eliezer-model thinks people like Richard are way better than that, and are (for modesty-ish reasons) overly distrusting their ability to do inside-view reasoning, and (as a consequence) aren’t building up their inside-view-reasoning skills nearly as much as they could. (At least in domains like AGI, where you stand to look a lot sillier to others if you go around expressing confident inside-view models that others don’t share.)
Not believing theories which don’t make new testable predictions just because they retrodict lots of things in the world naturally (in a way you sort of get intuitively), because most powerful theories which cause conceptual revolutions also make new testable predictions, so it’s a bad sign if the newly proposed theory doesn’t.
My Eliezer-model thinks this is correct as stated, but thinks this is a claim that applies to things like Newtonian gravity and not to things like probability theory. (He’s also suspicious that modest-epistemology pressures have something to do with this being non-obvious — e.g., because modesty discourages you from trusting your own internal understanding of things like probability theory, and instead encourages you to look at external public signs of probability theory’s impressiveness, of a sort that could be egalitarianly accepted even by people who don’t understand probability theory.)
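A concrete illustration (mine, not from the discussion) of the sense in which probability theory earns trust through coherence arguments rather than novel predictions: incoherent credences are exploitable by pure arithmetic. Suppose you treat 0.70 as a fair price for a ticket that pays 1 if A, and 0.40 as fair for a ticket that pays 1 if not-A, so your implied credences sum to 1.10. A bookie sells you both tickets for 1.10; exactly one of them pays out 1, so

\[
0.70 + 0.40 - 1.00 = 0.10
\]

is a guaranteed loss however the world turns out. Ruling out every such sure loss forces \(P(A) + P(\neg A) = 1\) and, pushed further, the rest of the probability axioms; no experiment appears anywhere in the argument.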