Well, I don’t think we really know the answer to that question right now. My hope is that myopia will turn out to be a fairly easy property to verify; certainly my guess is that it’ll be easier to verify than non-deception. Until we get better transparency tools, a better understanding of what algorithms our models are actually implementing, and better definitions of myopia that make sense in that context, however, we won’t really know how easy verifying it will be. Maybe it can be done mechanically, maybe it’ll require a human; we just don’t know yet.