Most of my posts and comments are about AI and alignment. Posts I’m most proud of, which also provide a good introduction to my worldview:
Without a trajectory change, the development of AGI is likely to go badly
Steering systems, and a follow-up on corrigibility.
I also created Forum Karma, and wrote a longer self-introduction here.
PMs and private feedback are always welcome.
NOTE: I am not Max Harms, author of Crystal Society. I’d prefer for now that my LW postings not be attached to my full name when people Google me for other reasons, but you can PM me here or on Discord (m4xed) if you want to know who I am.
Describing misaligned AIs as evil feels slightly off. Even “bad goals” makes me think there’s a missing mood somewhere. Separately, describing other people’s writing about misalignment this way is kind of a strawman.
Current AIs mostly can’t take any non-fake responsibility for their actions, even if they’re smart enough to understand them. An AI advising someone to e.g. hire a hitman to kill their husband is a bad outcome if there’s a real depressed person and a real husband who are actually harmed. An AI system would be responsible (descriptively / causally, not normatively) for that harm to the degree that it acts spontaneously and against its human deployers’ wishes, in a way that is sensitive to its actual circumstances (e.g. behaving differently when monitored / in a lab vs. not).
Unlike current AIs, a powerful, autonomous, situationally aware AI could cause harm for strategic reasons, or as a side effect of executing large-scale, transformative plans that are indifferent (rather than specifically opposed) to human flourishing. A misaligned AI that wipes out humanity in order to avoid shutdown is a tragedy, but unless the AI is specifically spiteful or punitive in how it goes about that, it seems kind of unfair to call the AI itself evil.