Great post; I suspect this to be one of the most tractable areas for reducing s-risks!
I also like the appendix. Perhaps this is too obvious to state explicitly, but I think one reason spite in AIs seems like a real concern is that it did evolve somewhat prominently in (some) humans. And since I'm sympathetic to the shard theory approach to understanding AI motivations, the human evolutionary example seems relevant to me. (That said, the analogy only holds if there's a multi-agent-competition phase in training, which isn't the case with LLM training so far, for instance.)