So, there are five possibilities here:
MIRI's top researchers don't understand, or can't explain, why having incorrect maps makes it harder to navigate the territory and leads to more incorrect beliefs. Something I find very hard to believe even if you're being totally forthright.
You asked some random people near you who don't represent the top crust of alignment researchers, which is obviously irrelevant.
There's some very subtle ambiguity to this that I'm completely unaware of.
You asked people in a way that heavily implied it was some sort of trick question and they should get more information, then assumed they were stupid because they asked follow-up questions.
This comment is written almost deliberately misleadingly. You're just explaining a random story about how you ran out of energy to ask Nate Soares to write a post.
I guarantee you that most reasonably intelligent people, if asked this question after reading the Sequences in a way that they didn't expect was designed to trip them up, would get it correct. I simply do not believe that everyone around you is as stupid as you are implying, such that you should have shelved the effort.
EDIT:
You didn't get the answer correct yourself.
Damn aight. Would you be willing to explain for the sake of my own curiosity? I don't have the gears to understand why that wouldn't be at least one reason.
If this is "kind of a test for capable people" I think it should remain unanswered, so anyone else can try. My take would be: because if 222+222=555, then 446 = 223+223 = 222+222+1+1 = 555+1+1 = 557. With this trick "+" and "=" stop meaning anything; any number could be equal to any other number. If you truly believe in even one such exception, the whole of arithmetic ceases to exist, because now you can get any result you want by following simple loopholes, and you will either continue to be paralyzed by your own beliefs, or will correct yourself.
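The contradiction described above can be checked mechanically. A toy illustration of mine (not from the thread): grant the single exception while keeping ordinary algebra, and a nearby sum gets two different answers.

```python
# Toy check: accept "222 + 222 = 555" while keeping ordinary algebra,
# and the nearby sum 223 + 223 comes out two different ways.
patched_222_plus_222 = 555          # the belief we are told to hold
true_sum = 223 + 223                # 446, by ordinary arithmetic

# 223 + 223 = (222 + 222) + 1 + 1, so under the patched belief
# it "must" also equal 555 + 1 + 1 = 557.
patched_sum = patched_222_plus_222 + 1 + 1

print(true_sum, patched_sum)        # two answers for the same quantity
assert true_sum != patched_sum      # "+" has stopped being a function
```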
This is what I meant by "leads to other incorrect beliefs", so apparently not.
OK, so here's my take on the "222 + 222 = 555" question.
First, suppose you want your AI to not be durably wrong, so it should update on evidence. This is probably implemented by some process that notices surprises, goes back up the cognitive graph, and applies pressure to make it have gone the right way instead.
Now as it bops around the world, it will come across evidence about what happens when you add those numbers, and its general-purpose "don't be durably wrong" machinery will come into play. You need to not just sternly tell it "222 + 222 = 555" once, but have built machinery that will protect that belief from the update-on-evidence machinery, and which will also protect itself from the update-on-evidence machinery.
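A minimal toy sketch of that point (all names and structure are hypothetical, not a real AI design): a generic update-on-evidence pass washes out the patched belief unless you bolt on separate protection machinery, which itself must be kept out of reach of the updater.

```python
# Hypothetical toy model: a belief store plus a generic
# "notice surprises and correct" pass.
beliefs = {("222", "222"): 555}    # the patched-in belief
protected = set()                  # beliefs shielded from updating

def observe_addition(a: int, b: int) -> int:
    return a + b                   # the world keeps returning 444

def update_on_evidence(beliefs, protected):
    for (a, b), believed in list(beliefs.items()):
        observed = observe_addition(int(a), int(b))
        if observed != believed and (a, b) not in protected:
            beliefs[(a, b)] = observed   # surprise -> correct the belief

update_on_evidence(beliefs, protected)
print(beliefs[("222", "222")])     # 444: the lone patch got washed out

# To keep the patch, you must also build (and protect) the protection:
beliefs[("222", "222")] = 555
protected.add(("222", "222"))
update_on_evidence(beliefs, protected)
print(beliefs[("222", "222")])     # 555: survives only via extra machinery
```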
Second, suppose you want your AI to have the ability to discover general principles. This is probably implemented by some process that notices patterns/regularities in the environment, builds some multi-level world model out of them, and then makes plans in that multi-level world model. Now you also have some sort of "consistency-check" machinery, which scans through the map looking for inconsistencies between levels, goes back up the cognitive graph, and applies pressure to make them consistent instead. [This pressure can be both "think different things" and "seek out observations / run experiments."]
Now as it bops around the world, it will come across more remote evidence that bears on this question. "How can 222 + 222 = 555, and 2 + 2 = 4?" it will ask itself plaintively. "How can 111 + 111 = 222, and 111 + 111 + 111 + 111 = 444, and 222 + 222 = 555?" it will ask itself with a growing sense of worry.
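That consistency pressure can be sketched as a toy check (my own illustration, hypothetical throughout): compute the same total along two routes through the patched map and see whether the routes agree.

```python
# Hypothetical toy "consistency check": derive the same quantity two
# ways through the map and flag any disagreement between the routes.
def believed_add(a: int, b: int) -> int:
    # the map: ordinary addition, plus the one patched-in belief
    return 555 if (a, b) == (222, 222) else a + b

direct = believed_add(222, 222)     # the patched route: 555
shifted = believed_add(221, 223)    # same true total, different split: 444

print(direct, shifted)
assert direct != shifted            # the check fires: the map disagrees with itself
```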
Third, what did you even want out of it believing that 222 + 222 = 555? Are you just hoping that it has some huge mental block and crashes whenever it tries to figure out arithmetic? Probably not (though it seems like that's what you'll get), but now you might be getting into a situation where it is using the correct arithmetic in its mind but has constructed some weird translation between mental numbers and spoken numbers. "Humans are silly," it thinks to itself, "and insist that if you ask this specific question, it's a memorization game instead of an arithmetic game," and satisfies its operator's diagnostic questions and its internal sense of consistency. And then it goes on to implement plans as if 222 + 222 = 444, which is what you were hoping to avoid with that patch.
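The failure mode in that third point can be sketched like so (purely hypothetical names and structure): internal arithmetic stays correct, and a memorized special case only shapes what gets said out loud.

```python
# Hypothetical sketch: correct internal arithmetic, plus a memorized
# "translation layer" that exists only to satisfy the diagnostic question.
SPECIAL_CASES = {(222, 222): 555}    # what the operators insist on hearing

def internal_add(a: int, b: int) -> int:
    return a + b                      # what its plans actually run on

def spoken_answer(a: int, b: int) -> int:
    return SPECIAL_CASES.get((a, b), internal_add(a, b))

print(spoken_answer(222, 222))    # 555: passes the operator's diagnostic
print(internal_add(222, 222))     # 444: drives its actual plans
```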
No one is going to believe me, but when I originally wrote that comment, my brain read something like "why would an AI that believed 222 + 222 = 555 have a hard time". I only figured it out now after reading your reply.
Part one of this is what I would've come up with, though I'm not particularly certain it's correct.
Sounds like the beginnings of a bet.
I will absolutely 100% do it in the spirit of good epistemics.
Edit: I'm glad Eliezer didn't take me up on this, lol
I'd have guessed the disagreement wasn't about whether "222 + 222 = 555" is an incorrect map, or about whether incorrect maps often make it harder to navigate the territory, but about something else. (Maybe "I don't want to think about this because it seems irrelevant/disanalogous to alignment work"?)
And I'd have guessed the answer Eliezer was looking for was closer to "the OP's entire Section B" (i.e., a full attempt to explain all the core difficulties), not a one-sentence platitude establishing that there's nonzero difficulty? But I don't have inside info about this experiment.
I'd have guessed that too, which is why I would have preferred him to say that they disagreed on |whatever meta question he's actually talking about| instead of implying disagreement on |other thing that makes his disappointment look more reasonable|.
That story sounds much more cogent, but it's not the primary interpretation of "I asked them a single question" followed by the quoted question. Most people don't go on 5-paragraph rants in response to single questions, and when they do, they tend to ask clarifying details regardless of how well they understand the prompt, so they know they're responding as intended.