TurnTrout is obviously correct that “robust grading is… extremely hard and unnatural” and that loss functions “chisel circuits into networks” rather than directly determining the goals of the AI that results. Where he loses me is the suggestion that this makes alignment easier rather than harder. As I see it, all of this just means we have even less control over the resulting policy: the default outcome is some bizarre construction in policy-space whose values are very hard to predict from the training recipe. I don’t understand what point in the above post contradicts this.