Hey, thanks for the reply. I think this is a very valuable response, because there are certain points I can now elucidate more clearly thanks to your pushback.
First, I don’t suggest that if we all just laughed and went about our lives everything would be okay. Indeed, if I thought that our actions were counterproductive at best, I’d advocate for something more akin to “walking away” as in Valentine’s exit. There is a lot of work to be done and (yes) very little time to do it.
Second, the pattern I am noticing is something more akin to Rhys Ward’s point about AI personhood. AI is not some neutral fact of our future that will be born “as is” no matter how hard we try one way or another. In our search for control and mastery over AI, we risk creating the very things we fear most. We fear AIs that are autonomous, ruthless, and myopic, yet in trying to build controlled systems that pursue goals reliably without developing ideas of their own, we end up creating autonomous, ruthless, and myopic systems. It’s somewhat telling, for example, that AI safety really started to heat up when RL became a mainstream technique (raising fears about paperclip optimisers etc.), and yet the first alignment effort for LLMs (which were manifestly not goal-seeking or myopic) was to… add RL back in, in the form of a value-agnostic technique (PPO/RLHF) that can be used to create anti-aligned agents just as easily as aligned ones. Rhys Ward similarly talks about how personhood may be less risky from an x-risk perspective while making alignment more ethically questionable. The “good” and the “bad” visions for AI in this community are entwined.
As a smaller point, OpenAI definitely started as a “build the good AI” startup when DeepMind started taking off. DeepMind itself began as a startup, and Demis is very connected to the AI safety memeplex.
Finally, love as humans execute it is (in my mind) an imperfect instantiation of a higher idea. It’s true that we don’t practice true omnibenevolence or universal love, or even love ourselves in a meaningful way a lot of the time, but I treat it as a direction to aim for, one that inspires us to do what we find most beautiful and meaningful rather than what is most hateful and ugly.
P.S. Sorry for not replying to all the other valuable comments in this section; I’ve been rather busy as of late, trying to do the things I preach, etc.