Incidentally, a similar consideration leads me to want to avoid re-using old metaphors when explaining things. If you use multiple metaphors, you can triangulate on the meaning—errors in the listener’s understanding will interfere destructively, leaving something closer to what you actually meant.
For this reason, I’ve been frustrated that we keep using “maximize paperclips” as the stand-in for a misaligned utility function. And I think reusing the exact same example again and again has contributed to the misunderstanding Eliezer describes here:
Original usage and intended meaning: The problem with turning the future over to just any superintelligence is that its utility function may have its attainable maximum at states we’d see as very low-value, even from the most cosmopolitan standpoint.
Misunderstood and widespread meaning: The first AGI ever to arise could show up in a paperclip factory (rather than in a research lab specifically trying to build AGI). And then, because AIs just mechanically carry out orders, it does what the humans had in mind, but too much of it.
If we’d found a bunch of different ways to say the first thing, and hadn’t just said, “maximize paperclips” every time, then I think the misunderstanding would have been less likely.