One mini-habit I have is to try to check my work in a different way from the way I produced it.
For example, if I’m copying down a large number (or string of characters, etc.), then when I double-check it, I read off the transcribed number backwards. I figure this way my brain is less likely to go “Yes yes, I’ve seen this already” and skip over any discrepancy.
And in general I look for ways to do the same kind of thing in other situations, so that checking is not just a repeat of the original process.
Incidentally, a similar consideration leads me to want to avoid reusing old metaphors when explaining things. If you use multiple metaphors, you can triangulate on the meaning: errors in the listener's understanding will interfere destructively, leaving something closer to what you actually meant.
For this reason, I’ve been frustrated that we keep using “maximize paperclips” as the stand-in for a misaligned utility function. And I think reusing the exact same example again and again has contributed to the misunderstanding Eliezer describes here:
Original usage and intended meaning: The problem with turning the future over to just any superintelligence is that its utility function may have its attainable maximum at states we’d see as very low-value, even from the most cosmopolitan standpoint.
Misunderstood and widespread meaning: The first AGI ever to arise could show up in a paperclip factory (instead of a research lab specifically trying to do that). And then because AIs just mechanically carry out orders, it does what the humans had in mind, but too much of it.
If we’d found a bunch of different ways to say the first thing, and hadn’t just said, “maximize paperclips” every time, then I think the misunderstanding would have been less likely.