Long comment, points ordered randomly, skim if you want.
1)
Can you give a few more examples of when the word “optimal” is/isn’t distorting someone’s thinking? People sometimes challenge each other’s usage of that word even when just talking about simple human endeavors like sports, games, diet, finance, etc., but I don’t get the sense that the word is the biggest danger in those domains. (Semi-related, I am reminded of this post.)
2)
> When I try to point out such (perceived) mistakes, I feel a lot of pushback, and somehow it feels combative. I do get somewhat combative online sometimes (and wish I didn’t, and am trying different interventions here), and so maybe people combat me in return. But I perceive defensiveness even to the critiques of Matthew Barnett, who seems consistently dispassionate.
> Maybe it’s because people perceive me as an Optimist and therefore my points must be combated at any cost.
> Maybe people really just naturally and unbiasedly disagree this much, though I doubt it.
When you put it like this, it sounds like the problem runs much deeper than sloppy concepts. When I think my opponents are mindkilled, I see only extreme options available, such as giving up on communicating, or budgeting huge amounts of time & effort for a careful double-crux. What you’re describing starts to feel not too dissimilar from questions like, “How do I talk my parents out of their religion so that they’ll sign up for cryonics?” In most cases it’s either hopeless or a massive undertaking, worthy of multiple sequences all on its own, most of which are not simply about suggestive names. Not that I expect you to write a whole new sequence in your spare time, but I do wonder if this makes you more interested in erisology and basic rationality.
3)
> The behaviorists ruined words like “behavior”, “response”, and, especially, “learning”. They now play happily in a dream world, internally consistent but lost to science.
I myself don’t know anything about the behaviorists except that they allegedly believed that internal mental states did not exist. I certainly don’t want to make that kind of mistake. Can someone bring me up to speed on what exactly they did to the words “behavior”, “response”, and “learning”? Are those words still ruined? Was the damage ever undone?
4)
> perhaps implying an expectation and inner consciousness on the part of the so-called “agent”
That reminds me of this passage from EY’s article in Time:
> None of this danger depends on whether or not AIs are or can be conscious; it’s intrinsic to the notion of powerful cognitive systems that optimize hard and calculate outputs that meet sufficiently complicated outcome criteria. With that said, I’d be remiss in my moral duties as a human if I didn’t also mention that we have no idea how to determine whether AI systems are aware of themselves—since we have no idea how to decode anything that goes on in the giant inscrutable arrays—and therefore we may at some point inadvertently create digital minds which are truly conscious and ought to have rights and shouldn’t be owned.
> The rule that most people aware of these issues would have endorsed 50 years earlier, was that if an AI system can speak fluently and says it’s self-aware and demands human rights, that ought to be a hard stop on people just casually owning that AI and using it past that point. We already blew past that old line in the sand. And that was probably correct; I agree that current AIs are probably just imitating talk of self-awareness from their training data. But I mark that, with how little insight we have into these systems’ internals, we do not actually know.
I’m curious whether you think this passage is also mistaken, or whether it correctly describes a real problem with current trajectories. EY usually doesn’t bring up consciousness because it is not a crux for him, but I wonder if you think he’s wrong in this recent instance where he did bring it up.
> perhaps implying an expectation and inner consciousness on the part of the so-called “agent”
> I’m curious whether you think this passage is also mistaken, or whether it correctly describes a real problem with current trajectories. EY usually doesn’t bring up consciousness because it is not a crux for him, but I wonder if you think he’s wrong in this recent instance where he did bring it up.
I didn’t mean to claim that this “consciousness” insinuation has messed up or is messing up this community’s reasoning about AI alignment, just that the insinuation exists—and to train the skill of spotting possible mistakes before (and not after) they occur.
I do think that “‘expectation’ insinuates inner beliefs” matters, as it helps prop up the misconception of “agents maximize expected reward” (by adding another “supporting detail” to that story).