I get the feeling that you are trying to suggest that the AI is only dangerous because we are afraid of it. Like it’s somehow our fear incarnate, and the more we fear it, the worse it will be. So the solution is to relax and realize that this was all just a big cosmic joke. And then the AI will laugh together with us.
In other words, all you need to do for the abyss to disappear is to stop gazing into it.
I think this is not how AI works (and neither do abysses). The AI either is dangerous or it is not; that is independent of what we think about it. It might be the case that we were fundamentally wrong about something, and actually everything will be okay. But whether that happens to be the case doesn’t depend on how relaxed we are. We could be scared shitless, and ultimately nothing happens, and everyone else laughs at us. We could relax… and then drop dead at a random moment, not knowing what got us. Or any other combination.
The simultaneous attraction and rejection that a shadow-self causes is, to my mind, a good explanation for why so much of the rationalist community seems to converge on the idea of “good AI to beat bad AI” (instead of, say, protesting AI companies, lobbying to shut them down, or other more direct paths to halting development).
But some people are protesting the AI companies. And I thought the consensus of the rationalist community was that we don’t know how to build the “good AI”, that we need more time to figure this out, and that everyone would benefit if we slowed down until we reliably can.
Out of the four companies you mention, three (Google, Meta, Microsoft/OpenAI) are big tech companies doing business as usual: there is a new trend, and they don’t want to be left behind. Only Anthropic matches the pattern of “building good AI to stop bad AI”.
My working definition of love is an extension of the Markov blanket for the self-concept in your head to cover other conceptual objects. A thing that you love is something that you take into your self-identity. If it does well, you do well. If it is hurt, you are hurt. This explains, for example, how you can love your house or your possessions even though they are obviously non-sentient, and why losing your favourite pen feels bad even if it makes no sense and the pen is clearly just a mass-produced plastic object.
Thanks for providing a specific proposal. Two problems. First, we have no idea how to make the AI love itself (in a human-like way). If the AI doesn’t love itself, it won’t help much if it perceives us as parts of itself. Second, we don’t actually love everything we perceive as parts of ourselves. People sometimes try to get rid of their bad habits, or trim their nails, or throw away cheap plastic objects once they have outlived their purpose.
Hey, thanks for the reply. I think this is a very valuable response, because there are certain things I can now elucidate more clearly thanks to your pushback.
First, I don’t suggest that if we all just laughed and went about our lives everything would be okay. Indeed, if I thought that our actions were counterproductive at best, I’d advocate for something more akin to “walking away” as in Valentine’s exit. There is a lot of work to be done and (yes) very little time to do it.
Second, the pattern I am noticing is something more akin to Rhys Ward’s point about AI personhood. AI is not some neutral fact of our future that will be born “as is” no matter how hard we try one way or another. In our search for control and mastery over AI, we risk creating the things we fear the most. We fear AIs that are autonomous, ruthless, and myopic, but in trying to make controlled systems that pursue goals reliably without developing ideas of their own, we end up creating autonomous, ruthless, and myopic systems. It’s somewhat telling, for example, that AI safety really started to heat up when RL became a mainstream technique (raising fears about paperclip optimisers etc.), and yet the first alignment efforts for LLMs (which were manifestly not goal-seeking or myopic) were to… add RL back to them, in the form of a value-agnostic technique (PPO/RLHF) that can be used to create anti-aligned agents just as easily as aligned ones. Rhys Ward similarly talks about how personhood may be less risky from an x-risk perspective but also makes alignment more ethically questionable. The “good” and the “bad” visions for AI in this community are entwined.
As a smaller point, OpenAI definitely started as a “build the good AI” startup when DeepMind started taking off. DeepMind also started as a startup, and Demis is very connected to the AI safety memeplex.
Finally, love as humans execute it is (in my mind) an imperfect instantiation of a higher idea. It is true that we don’t practice true omnibenevolence or universal love, or even love ourselves in a meaningful way a lot of the time, but I treat it as a direction to aim for, one that inspires us to do what we find most beautiful and meaningful rather than what is most hateful and ugly.
P.S. Sorry for not replying to all the other valuable comments in this section; I’ve been rather busy as of late, trying to do the things I preach, etc.