I think this is a misunderstanding of what the claim is. Yudkowsky, for example, and certainly I, don't think the ASI's goals will be Just One Thing; it'll probably have a ton of shards of desire, much like humans do. But (a) this doesn't change the strategic picture much unless you can establish that one of those shards is a miniature replica of our own human desires, which some have argued (see e.g. TurnTrout's diamond maximizer post), and (b) in some sense it's still valid to describe it as one goal, as shorthand for one utility function, one point in goal-space, etc. The whole bag of shards can be thought of as one complicated goal rather than as a bag of many different simple goals.
I do think we have a substantive disagreement about coherence though; I disagree with the hot mess hypothesis.