Seth Herd comments on MIRI 2024 Communications Strategy

Seth Herd 31 May 2024 18:38 UTC
2 points
You’re right. I didn’t mean to say that kindness is arcane. I was referring to acausal trade or other strange reasons to keep some humans around for possible future use.

Kindness is normal in our world, but I wouldn’t assume it will exist in every or even most situations with intelligent beings. Humans are instinctively kind (except for sociopathic and sadistic people), because that is good game theory for our situation: interactions with peers, in which collaboration/teamwork is useful.

A being capable of real recursive self-improvement, let alone duplication and creation of subordinate minds, is not in that situation. Such a being may temporarily be dealing with peers, but it might reasonably expect to have no need of collaborators in the near future. Thus, kindness isn’t rational for that type of being.

The exception would be if they could make a firm commitment to kindness while they do have peers and need collaborators. They might have kindness merely as an instrumental goal, in which case it would be abandoned as soon as it was no longer useful.

Or they might display kindness more instinctively, as a tendency in their thought or behavior. They might even have it engineered as an innate goal, as Steve hopes to engineer. In those last two cases, I think it’s possible that reflective stability would keep that kindness in place as the AGI continued to grow, but I wouldn’t bet on it unless kindness was their central goal. If it was merely a tendency and not an explicit and therefore self-endorsed goal, I’d expect it to be dropped like the bad habit it effectively is. If it was an innate goal but not the strongest one, I don’t know, but I wouldn’t bet on it being reflectively stable in the long term under deliberate self-modification.

(As far as I know, nobody has tried hard to work through the logic of reflective stability of multiple goals. I tried, and gave it up as too vague and less urgent than other alignment questions. My tentative answer was that multiple goals might be reflectively stable; it depends on the exact structure of the decision-making process in that AGI/mind.)