If LLM simulacra resemble humans but are misaligned, that does not bode well from an s-risk perspective.
The Waluigi effect also seems bad for s-risk: a simulacrum prompted to "optimize for pleasure, …" may collapse into its antipode, "optimize for suffering, …".