TurnTrout comments on ‘Empiricism!’ as Anti-Epistemology

TurnTrout 18 Mar 2024 19:00 UTC
42 points
10
This scans as less “here’s a helpful parable for thinking more clearly” and more “here’s who to sneer at”—namely, at AI optimists. Or “hopesters”, as Eliezer recently called them, which I think is a play on “huckster” (and which accords with this essay analogizing optimists to Ponzi scheme scammers).
I am saddened (but unsurprised) to see few others decrying the obvious strawmen:
what if [the optimists] cried ‘Unfalsifiable!’ when we couldn’t predict whether a phase shift would occur within the next two years exactly?
...
“But now imagine if—like this Spokesperson here—the AI-allowers cried ‘Empiricism!‘, to try to convince you to do the blindly naive extrapolation from the raw data of ‘Has it destroyed the world yet?’ or ‘Has it threatened humans? no not that time with Bing Sydney we’re not counting that threat as credible’.”
Thinly-veiled insults:
Nobody could possibly be foolish enough to reason from the apparently good behavior of AI models too dumb to fool us or scheme, to AI models smart enough to kill everyone; it wouldn’t fly even as a parable, and would just be confusing as a metaphor.
and insinuations of bad faith:
What if, when you tried to reason about why the model might be doing what it was doing, or how smarter models might be unlike stupider models, they tried to shout you down for relying on unreliable theorizing instead of direct observation to predict the future?” The Epistemologist stopped to gasp for breath.
“Well, then that would be stupid,” said the Listener.
“You misspelled ‘an attempt to trigger a naive intuition, and then abuse epistemology in order to prevent you from doing the further thinking that would undermine that naive intuition, which would be transparently untrustworthy if you were allowed to think about it instead of getting shut down with a cry of “Empiricism!”’,” said the Epistemologist.
Apparently Eliezer decided to not take the time to read e.g. Quintin Pope’s actual critiques, but he does have time to write a long chain of strawmen and smears-by-analogy.
As someone who used to eagerly read essays like these, I am quite disappointed.
- habryka 18 Mar 2024 21:11 UTC
  27 points
  5
  Parent
  I don’t think this essay is commenting on AI optimists in-general. It is commenting on some specific arguments that I have seen around, but I don’t really see how it relates to the recent stuff that Quintin, Nora or you have been writing (and I would be reasonably surprised if Eliezer intended it to apply to that).
  You can also leave it up to the reader to decide whether and when the analogy discussed here applies or not. I could spend a few hours digging up people engaging in reasoning really very closely to what is discussed in this article, though by default I am not going to.
  - Martin Randall 31 Mar 2024 2:18 UTC
    12 points
    8
    Parent
    Ideally Yudkowsky would have linked to the arguments he is commenting on. This would demonstrate that he is responding to real, prominent, serious arguments, and that he is not distorting those arguments. It would also have saved me some time.
    
    But now imagine if—like this Spokesperson here—the AI-allowers cried ‘Empiricism!‘, to try to convince you to do the blindly naive extrapolation from the raw data of ‘Has it destroyed the world yet?’
    
    The first hit I got searching for “AI risk empiricism” was Ignore the Doomers: Why AI marks a resurgence of empiricism. The second hit was AI Doom and David Hume: A Defence of Empiricism in AI Safety, which linked Anthropic’s Core Views on AI Safety. These are hardly analogous to the Spokesman’s claims of 100% risk-free returns.
    
    Next I sampled several Don’t Worry about the Vase AI newsletters and “some people are not so worried”. I didn’t really see any cases of blindly naive extrapolation from the raw data of ‘Has AI destroyed the world yet?’. I found Alex Tabarrok saying “I want to see that the AI baby is dangerous before we strangle it in the crib.”. I found Jacob Buckman saying “I’m Not Worried About An AI Apocalypse”. These things are related but clearly admit the possibility of danger and are arguing for waiting to see evidence of danger before acting.
    
    An argument I have seen is blindly naive extrapolation from the raw data of ‘Has tech destroyed the world yet?’ Eg, The Techno-Optimist Manifesto implies this argument. My current best read of the quoted text above is that it’s an attack on an exaggerated and simplified version of this type of view. In other words, a straw man.
- tailcalled 19 Mar 2024 8:58 UTC
  16 points
  6
  Parent
  
  Apparently Eliezer decided to not take the time to read e.g. Quintin Pope’s actual critiques, but he does have time to write a long chain of strawmen and smears-by-analogy.
  
  A lot of Quintin Pope’s critiques are just obviously wrong and lots of commenters were offering to help correct them. In such a case, it seems legitimate to me for a busy person to request that Quintin sorts out the problems together with the commenters before spending time on it. Even from the perspective of correcting and informing Eliezer, people can more effectively be corrected and informed if their attention is guided to the right place, with junk/distractions removed.
  
  (Note: I mainly say this because I think the main point of the message you and Quintin are raising does not stand up to scrutiny, and so I mainly think the value the message can provide is in certain technical corrections that you don’t emphasize as much, even if strictly speaking they are part of your message. If I thought the main point of your message stood up to scrutiny, I’d also think it would be Eliezer’s job to realize it despite the inconvenience.)
  - Quintin Pope 19 Mar 2024 21:58 UTC
    11 points
    7
    Parent
    I stand by pretty much everything I wrote in Objections, with the partial exception of the stuff about strawberry alignment, which I should probably rewrite at some point.
    Also, Yudkowsky explained exactly how he’d prefer someone to engage with his position “To grapple with the intellectual content of my ideas, consider picking one item from “A List of Lethalities” and engaging with that.”, which I pointed out I’d previously done in a post that literally quotes exactly one point from LoL and explains why it’s wrong. I’ve gotten no response from him on that post, so it seems clear that Yudkowsky isn’t running an optimal ‘good discourse promoting’ engagement policy.
    
    I don’t hold that against him, though. I personally hate arguing with people on this site.
    - Eliezer Yudkowsky 20 Mar 2024 0:13 UTC
      15 points
      −3
      Parent
      Unless I’m greatly misremembering, you did pick out what you said was your strongest item from Lethalities, separately from this, and I responded to it. You’d just straightforwardly misunderstood my argument in that case, so it wasn’t a long response, but I responded. Asking for a second try is one thing, but I don’t think it’s cool to act like you never picked out any one item or I never responded to it.
      EDIT: I’m misremembering, it was Quintin’s strongest point about the Bankless podcast. https://www.lesswrong.com/posts/wAczufCpMdaamF9fy/my-objections-to-we-re-all-gonna-die-with-eliezer-yudkowsky?commentId=cr54ivfjndn6dxraD
      - tailcalled 20 Mar 2024 18:32 UTC
        6 points
        2
        Parent
        I’m kind of ambivalent about this. On the one hand, when there is a misunderstanding, but he claims his argument still goes through after correcting the misunderstanding, it seems like you should also address that corrected form. On the other hand, Quintin Pope’s correction does seem very silly. At least by my analysis:
        Similarly, the reason that “GPT-4 does not get smarter each time an instance of it is run in inference mode” is because it’s not programmed to do that^[7]. OpenAI could^[8] continuously train its models on the inputs you give it, such that the model adapts to your particular interaction style and content, even during the course of a single conversation, similar to the approach suggested in this paper. Doing so would be significantly more expensive and complicated on the backend, and it would also open GPT-4 up to data poisoning attacks.
        This approach considers only the things OpenAI could do with their current ChatGPT setup, and yes it’s correct that there’s not much online learning opportunity in this. But that’s precisely why you’d expect GPT+DPO to not be the future of AI; Quintin Pope has clearly identified a capabilities bottleneck that prevents it from staying fully competitive. (Note that humans can learn even if there is a fraction of people who are sharing intentionally malicious information, because unlike GPT and DPO, humans don’t believe everything we’re told.)
        A more autonomous AI could collect actionable information at much greater scale, as it wouldn’t be dependent on trusting its users for evaluating what information to update on, and it would have much more information about what’s going on than the chat-based I/O.
        This sure does look to me like a huge bottleneck that’s blocking current AI methods, analogous to the evolutionary bottleneck: The full power of the AI cannot be used to accumulate OOM more information to further improve the power of the AI.
  - Noosphere89 19 Mar 2024 22:09 UTC
    9 points
    0
    Parent
    My main disagreement is that I actually do think that at least some of the critiques are right here.
    
    In particular, the claims that Quintin Pope is making that I think are right is that evolution is extremely different from how we train our AIs, and thus none of the inferences that work under an evolution model work under the AIs under consideration, which importantly includes a lot of analogies to apes/Neanderthals making smarter humans (which they didn’t do, BTW.), which presumably failed to be aligned, ergo we can’t align AI smarter than us.
    
    The basic issue though is that evolution doesn’t have a purpose or goal, and thus the common claim that evolution failed to align humans to X thing is nonsensical, as it assumes a teleological goal that just does not exist in evolution, which is quite different from humans making AIs with particular goals in mind. Thus talk of an alignment problem between say chimps/Neanderthals and humans is entirely nonsensical. This is also why this generalized example of misgeneralization fails to work, since evolution is not a trainer or designer in the way that say. an OpenAI employee making AI would be, and thus there is no generalization error, since there wasn’t a goal or behavior to purposefully generalize in the first place:
    
    “In the ancestral environment, evolution trained humans to do X, but in the modern environment, they do Y instead.”
    
    There are other problems with the analogy that Quintin Pope covered, like the fact that it doesn’t actually capture misgeneralization correctly, since the ancient/modern human distinction is not the same as one AI doing a treacherous turn, or how the example of ice cream overwhelming our reward center isn’t misgeneralization, but the fact that evolution has no purpose or goal is the main problem I see with a lot of evolution analogies.
    
    Another issue is that evolution is extremely inefficient at the timescales required, which is why dominant training methods for AI borrow little from evolution at best, and even from an AI capabilities perspective it’s not really worth it to rerun evolution to get AI progress.
    
    Some other criticisms I agree with from Quintin Pope is that current AI can already self-improve, albeit more weakly and having more limits than humans, though I agree way less strongly here than Quintin Pope, and that the security mindset is very misleading and predicts things in ML that don’t actually happen at all, which is why I don’t think adversarial assumptions are good unless you can solve the problem in the worst case easily or just as easily as the non-adversarial cases.
    - Quintin Pope 19 Mar 2024 22:54 UTC
      24 points
      6
      Parent
      The basic issue though is that evolution doesn’t have a purpose or goal
      FWIW, I don’t think this is the main issue with the evolution analogy. The main issue is that evolution faced a series of basically insurmountable, yet evolution-specific, challenges in successfully generalizing human ‘value alignment’ to the modern environment, such as the fact that optimization over the genome can only influence within lifetime value formation theough insanely unstable Rube Goldberg-esque mechanisms that rely on steps like “successfully zero-shot directing an organism’s online learning processes through novel environments via reward shaping”, or the fact that accumulated lifetime value learning is mostly reset with each successive generation without massive fixed corpuses of human text / RLHF supervisors to act as an anchor against value drift, or evolution having a massive optimization power overhang in the inner loop of its optimization process.
      
      These issues fully explain away the ‘misalignment’ humans have with IGF and other intergenerational value instability. If we imagine a deep learning optimization process with an equivalent structure to evolution, then we could easily predict similar stability issues would arise due to that unstable structure, without having to posit an additional “general tendency for inner misalignment” in arbitrary optimization processes, which is the conclusion that Yudkowsky and others typically invoke evolution to support.
      
      In other words, the issues with evolution as an analogy have little to do with the goals we might ascribe to DL/evolutionary optimization processes, and everything to do with simple mechanistic differences in structure between those processes.
      - Daniel Kokotajlo 20 Mar 2024 19:38 UTC
        2 points
        −1
        Parent
        I’m curious to hear more about this. Reviewing the analogy:
        
        Evolution, ‘trying’ to get general intelligences that are great at reproducing <--> The AI Industry / AI Corporations, ‘trying’ to get AGIs that are HHH
        Genes, instructing cells on how to behave and connect to each other and in particular how synapses should update their ‘weights’ in response to the environment <--> Code, instructing GPUs on how to behave and in particular how ‘weights’ in the neural net should update in response to the environment
        Brains, growing and learning over the course of lifetime <--> Weights, changing and learning over the course of training
        
        Now turning to your three points about evolution:
        Optimizing the genome indirectly influences value formation within lifetime, via this unstable Rube Goldberg mechanism that has to zero-shot direct an organism’s online learning processes through novel environments via reward shaping --> translating that into the analogy, it would be “optimizing the code indirectly influences value formation over the course of training, via this unstable Rube Goldberg mechanism that has to zero-shot direct the model’s learning process through novel environments vai reward shaping… yep seems to check out. idk. What do you think?
        Accumulated lifetime value learning is mostly reset with each successive generation without massive fixed corpuses of human text / RLHF supervisors --> Accumulated learning in the weights is mostly reset when new models are trained since they are randomly initialized; fortunately there is a lot of overlap in training environment (internet text doesn’t change that much from model to model) and also you can use previous models as RLAIF supervisors… (though isn’t that also analogous to how humans generally have a lot of shared text and culture that spans generations, and also each generation of humans literally supervises and teaches the next?)
        Massive optimization power overhang in the inner loop of its optimization process --> isn’t this increasingly true of AI too? Maybe I don’t know what you mean here. Can you elaborate?
  - tailcalled 19 Mar 2024 16:49 UTC
    2 points
    0
    Parent
    Can people who vote disagree also mark the parts they disagree with using reacts or something?
  - Tapatakt 19 Mar 2024 17:18 UTC
    1 point
    0
    Parent
    Do you think that if someone filtered and steelmanned Quintin’s criticism, it would be valuable? (No promises)
    - tailcalled 19 Mar 2024 20:02 UTC
      4 points
      2
      Parent
      Yes.
      
      Filtering away mistakes, unimportant points, unnecessary complications, etc., from preexisting ideas is (as long as the core idea one extracts is good) a very general way to contribute value, because it makes the ideas involved easier to understand.
      
      Adding stronger arguments, more informative and accessible examples, etc. contributes value because then it shows what is more robust and gives more material to dig down into understanding it, and also because it clarifies why some people may find the idea attractive.
      
      Explanations for the changes, especially for the dropped things, can build value because it clarifies the consensus about what parts were wrong, and if Quintin disagrees with the removals, it provides signals to him about what he didn’t clarify well enough.
      
      When these are done on a sufficiently important point, with sufficiently much skill, and maybe also with sufficiently much luck, this can in principle provide a ton of value, both because information in general is high-leverage due to being easily shareable, and because this particular form of information can help resolve conflicts and rebuild trust.
- Zack_M_Davis 18 Mar 2024 20:57 UTC
  14 points
  3
  Parent
  
  saddened (but unsurprised) to see few others decrying the obvious strawmen
  
  In general, the “market” for criticism just doesn’t seem very efficient at all! You might have hoped that people would mostly agree about what constitutes a flaw, critics would compete to find flaws in order to win status, and authors would learn not to write posts with flaws in them (in order to not lose status to the critics competing to point out flaws).
  
  I wonder which part of the criticism market is failing: is it more that people don’t agree about what constitutes a flaw, or that authors don’t have enough of an incentive to care, or something else? We seem to end up with a lot of critics who specialize in detecting a specific kind of flaw (“needs examples” guy, “reward is not the optimization target” guy, “categories aren’t arbitrary” guy, &c.), with very limited reaction from authors or imitation by other potential critics.
  - kave 18 Mar 2024 22:18 UTC
    5 points
    1
    Parent
    My quick guess is that people don’t agree about what constitutes a (relevant) flaw. (And there are lots of irrelevant flaws so you can’t just check for the existence of any flaws at all).
    I think if people could agree, the authorial incentives would follow. I’m fairly sympathetic to the idea that readers aren’t incentivised to correctly agree on what consitutes a flaw.
- Eliezer Yudkowsky 20 Mar 2024 0:08 UTC
  12 points
  4
  Parent
  If Quintin hasn’t yelled “Empiricism!” then it’s not about him. This is more about (some) e/accs.
- Eli Tyre 12 Aug 2024 18:30 UTC
  2 points
  0
  Parent
  This is a definitely a tangent, and I don’t want to detract from your more substantive points (about which I don’t have as strong an opinion one way or the other).
  Or “hopesters”, as Eliezer recently called them, which I think is a play on “huckster” (and which accords with this essay analogizing optimists to Ponzi scheme scammers).
  I read this as a play on the word “Doomer”, which is a term that is slightly derogatory, but mostly descriptive. My read of “hopester”, without any additional context, is the same.
- Tapatakt 19 Mar 2024 17:16 UTC
  1 point
  0
  Parent
  I think from Eliezer’s point of view it goes kinda like this:
  1. People can’t see why the arguments of other side are invalid.
  2. Eliezer tried to engage with them, but most listeners/readers can’t tell who is right in this discussions.
  3. Eliezer thinks that if he provides people with strawmenned versions of other side’s arguments and refutation of this strawmenned arguments, then the chance that this people will see why he’s right in the real discussion will go up.
  4. Eliezer writes this discussion with strawmen as a fictional parable because otherwise it would be either dishonest and rude or a quite boring text with a lot of disclaimers. Or because it’s just easier for him to write it this way.
  After reading this text at least one person (you) thinks that the goal “avoid dishonesty and rudeness” were not achieved, so text is a failure.
  After reading this text at least one person (me) thinks that 1. I got some useful ideas and models. 2. Of course, at least the smartest opponents of Eliezer have better arguments and I don’t think Eliezer would disagree with that, so text is a success.
  Ideally, Eliezer should update his strategy of writing texts based on both pieces of evidence.
  I can be wrong, of course.