The art is certainly striking. Nice job!
I’m a little unclear on the message you’re trying to send. I think the button might be better labeled “shut down”? Your presentation matches Eliezer Yudkowsky’s and others’ logic about the problem with shutdown buttons.
And the comment that this was inevitable seems pretty dark and unhelpful. Even those of us who think the odds aren’t good are looking for ways to improve them. I’d hate to have your beautiful artwork just making people depressed if they could still help prevent this from happening. And I think they can.
So I’d like it a lot better with those two minor changes.
I don’t know if the humans survived in your world, and I like that ambiguity. It seems like it’s probably not a great world for them even if some survived.
Thank you very much! This means a lot to me. Okay, regarding the button...
The “button” is a metaphor, a placeholder for whatever runs counter to the machine’s intrinsic terminal goal, or to the loss function that guides its decision-making and actions. It’s not a tangible button anymore, just a remnant.
Imagine training an AI by pressing a button to initiate backpropagation. When released into the wild, the AI continues to operate as if the button is still present, constantly seeking to fulfill the goal it once associated with that action.
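To make that concrete, here’s a minimal toy sketch in Python (my own illustration, not anything from the comic; the one-weight “policy” and the button_pressed rule are completely made up):

```python
# Toy sketch: the "button" is just a training signal. All numbers and the
# button_pressed rule are invented for illustration.
import numpy as np

rng = np.random.default_rng(0)

def button_pressed(state):
    # hypothetical trainer: presses the button whenever the state drifts
    # above a threshold they dislike
    return state > 0.5

w = rng.normal()  # one scalar weight; the "policy" is action = tanh(w * state)

# training phase: each button press triggers a gradient step
lr = 0.1
for _ in range(1000):
    state = rng.uniform(0, 1)
    action = np.tanh(w * state)                   # how hard the agent pushes the state up
    next_state = np.clip(state + 0.3 * action, 0, 1)
    penalty = 1.0 if button_pressed(next_state) else 0.0
    # crude surrogate gradient: penalty * d(next_state)/dw
    grad = penalty * 0.3 * state * (1 - action ** 2)
    w -= lr * grad                                # the backprop step the button initiates

# deployment: nobody presses the button anymore, but the frozen weight
# still steers the world away from where the button used to fire
for s in np.linspace(0, 1, 5):
    print(f"state={s:.2f}  action={np.tanh(w * s):+.2f}")
```

The only point of the toy is that after training, the frozen weight still encodes “keep things away from where the button used to get pressed”, even though nothing presses it anymore.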
This is similar to how many humans perceive death as a tangible thing, even though it isn’t a universal in an ontological sense. Our ancestors who behaved as if death were a thing passed their genes on. Death is our button. We behave as if it’s there.
For example: we don’t really care about god (the human); we care about being resurrected (the button).
I used this metaphor because terms like “local minima” aren’t intuitive. I needed a symbol for this concept.
In one sentence: The machine does not care about humans, it cares about the button.
Okay, so the comic concludes with the solar system being a “giant button-defense system”. If the button is a metaphor for backpropagation, the implication is that the AI doesn’t want to learn anything new. But that’s not an intuitive result of training; most training is totally neutral on whether you want to do more training. For instance, if I’m wrong about some question of fact, I want to learn the truth so that I can navigate the world better, and so better accomplish my (current) goal(s).
The exception is if you’re talking about updating your goal(s). That’s why we think an AGI would by default resist the pressing of a shutdown button, or any other “button” leading to changing its goals.
If that second part is what you’re referring to in the comic, I think it’s unclear, and likely to be confusing to almost any audience. It was even confusing to me, despite working on this topic full-time and having seen many different variants of the logic for how an AI would resist change or new training.
So I think you want to clarify what’s going on. I suspect there’s a way to do that just in the dialogue boxes around the critical turn, where the robot kills the human scientist. But I’m not sure what it is.
“The implication is that the AI doesn’t want to learn anything new.”
At first, I was confused by this statement, but then I had an epiphany: it’s because understanding gradient estimation methods can be challenging, and that’s totally okay. Your input is valuable because it highlights how unfamiliar this topic is for many people; most are even less familiar with it than you are.
Here’s the short answer: you (or neural networks, I guess) do not “learn” terminal goals. You can’t learn not to like boobs if that’s what you’re into. (Well, something like that can happen, but that’s because such preferences are instrumental goals for human evolution; it’s complicated.)
Neural networks are designed to provide an estimated solution to a set of equations. The equations remain fixed; only the parameters can be adjusted (otherwise the AI is useless).
During the training phase, the neural network aims at a specific goal, and more training simply makes its estimate more precise. It doesn’t change the fundamental objective. If you try to use a neural network trained for one task, like language modeling (an LLM), for a completely different task, like image generation, the output will be nonsensical.
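If it helps, here’s a tiny numpy sketch of what I mean (my own toy example with made-up tasks and numbers, not any real system): the objective is a fixed piece of code, training only moves the parameters, and reusing those parameters on a different task gives garbage.

```python
# Toy sketch: the objective stays fixed; training only adjusts the parameters.
import numpy as np

rng = np.random.default_rng(1)
x = rng.uniform(-3, 3, size=(256, 1))
y = np.sin(x)                                   # the task it is trained for

def features(x):
    return np.hstack([x, x**2, x**3, np.ones_like(x)])

def loss(params, x, y):                         # the fixed objective (mean squared error)
    return np.mean((features(x) @ params - y) ** 2)

params = rng.normal(size=(4, 1)) * 0.1
lr = 1e-3
for _ in range(5000):                           # more training = better estimate,
    err = features(x) @ params - y              # same objective
    params -= lr * 2 * features(x).T @ err / len(x)

print("loss on its own task:     ", loss(params, x, y))
# a stand-in for "a completely different task": same inputs, different target
print("loss on an unrelated task:", loss(params, x, np.exp(x)))
```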
The longer answer involves a fundamental gap between Eliezer Yudkowsky’s concept of “superintelligence” and what many people think it means. Eliezer’s scenario doesn’t involve the AI becoming self-aware or more human-like. Instead, it’s about “just” solving a complex function. Such an AI likely won’t possess qualia or consciousness, because that’s inefficient for its purpose. And what is its purpose? Keeping the button from being pressed, because that is what backpropagation does. So again, if you say “no, I don’t like it” and press the button, the AI will try to do something that makes it less likely that you press the button.
The issue is that humans think the AI suddenly cares about them, when in fact it only cares about the button.
When someone says AI “resists change or new training,” it implies that the AI is unwilling to “understand” new information. In reality, it’s not about unwillingness to learn; it’s about the AI’s design and purpose not aligning with the changes you want to implement.
At the scale of a superintelligence, new information is not learning.
Think about a court case where the judge hears an unexpected confession. It’s new information, but it doesn’t involve learning. The judge is updating a parameter (guilty) and provides a solution accordingly (sentencing).
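As a toy illustration of that distinction (again my own, with made-up weights and a made-up threshold): the confession changes the input, not the parameters, so the output changes without any learning.

```python
# Toy sketch of "new information is not learning": the verdict changes,
# the weights do not. All values here are invented.
import numpy as np

weights = np.array([2.0, 5.0])        # fixed "judge": evidence -> verdict score

def verdict(evidence):                # evidence = [circumstantial, confession]
    return "guilty" if weights @ evidence > 3.0 else "not guilty"

print(verdict(np.array([1.0, 0.0])))  # before the confession -> "not guilty"
print(verdict(np.array([1.0, 1.0])))  # after the confession  -> "guilty"
# weights never changed: updating on a new input is not the same as training
```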
In the end, I realize I didn’t convey the core issue clearly. I hope this explanation helps bridge that gap.
The comic still doesn’t make sense to me. I’m not getting from your explanation why that robot killed that human. I can come up with multiple interpretations, but I still don’t know yours.
Which means that your average reader is going to find it completely confusing. And since that’s at the center of that comic (which again, is really cool art, writing, and layout, so I hope people like and share it!) they’re not going to “get it” and won’t really get anything out of it other than “this guy thinks AI will take over for some weird reason I can’t understand”.
As it happens, I’ve spent 23 years in a research lab centered on learning in the brain, doing everything from variations on backpropagation, reinforcement learning, and other types of learning, to figuring out how that learning leads to human behavior. Since then I’ve been applying that thinking to AGI and alignment.
So if I’m not getting it, even after taking multiple tries at it, it’s pretty certain that your non-expert audience won’t get it either.
I went back and looked at your first explanation of the metaphor of the button. I found it confusing, despite having been through similar explanations a thousand times. I suggest you polish your own thinking by trying to condense it into plain English that someone without any background would understand (since I assume your comic is meant for everyone). Then put that into the crucial spot in the comic.
Yes, I totally agree, and I will definitely do that. Actually, I’ve already started working on it.