The post anticipates some of the most likely failure modes, such as failing to correctly time the transition from selfish to altruistic learning, or out-of-distribution failures in proxy matching. For proxy matching in particular, I anticipate we may end up employing multiple stages. I also agree that simplified and over-generalized notions of altruism may be easier to maintain long term, and I see some indications that this already occurs in at least some humans.
The low-complexity, most general form of altruism is probably something like "empower the world's external agency." But it likely also needs game-theoretic adjustments (empower other altruists more, disempower dangerous selfish agents, etc.), considerations of suffering, and so on.
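To make the shape of that objective concrete, here is a minimal toy sketch (my own illustration, not anything from the post): a crude per-agent empowerment proxy, combined with per-agent alignment weights for the game-theoretic adjustment and a suffering penalty. The `step` simulator, the deterministic/discrete setting, and all names here are assumptions for illustration only.

```python
import itertools
import math
from typing import Callable, Dict, Hashable, List

def empowerment_proxy(
    state: Hashable,
    agent_id: int,
    actions: List[Hashable],
    step: Callable[[Hashable, int, Hashable], Hashable],  # assumed deterministic simulator
    horizon: int = 2,
) -> float:
    """Crude n-step empowerment proxy: log of the number of distinct states
    the agent can reach via open-loop action sequences. (True empowerment is
    the channel capacity I(A^n; S_{t+n}); with a deterministic simulator and
    uniform action prior this reduces to log |reachable states|.)"""
    reachable = set()
    for seq in itertools.product(actions, repeat=horizon):
        s = state
        for a in seq:
            s = step(s, agent_id, a)
        reachable.add(s)
    return math.log(len(reachable))

def altruism_objective(
    state: Hashable,
    agents: List[int],
    actions: List[Hashable],
    step: Callable[[Hashable, int, Hashable], Hashable],
    alignment_weight: Dict[int, float],   # game-theoretic adjustment: >0 for altruists, <=0 for dangerous selfish agents
    suffering: Callable[[Hashable, int], float],
    suffering_coef: float = 1.0,
) -> float:
    """Weighted sum of external agents' empowerment, minus a suffering term."""
    total = 0.0
    for agent_id in agents:
        emp = empowerment_proxy(state, agent_id, actions, step)
        total += alignment_weight.get(agent_id, 0.0) * emp
        total -= suffering_coef * suffering(state, agent_id)
    return total
```

The point of the sketch is only that "empower the world's external agency" plus the adjustments above can be written as one scalar objective; any realistic version would need learned, variational empowerment estimates rather than enumeration.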
I don't see why or how learning altruism/alignment (external empowerment) differs from other learning objectives (such as internal empowerment) in a way that makes formal verification important for one but not the other. So for me the strongest evidence for the importance of formal verification would be evidence of its utility/importance across ML in general, which I don't really see yet.