It might not be possible to make such a machine learner into an AGI, which is what I had in mind—narrow AIs only have “goals” and “values” and so forth in an analogical sense. Cf. derived intentionality. If it is that easy to create such an AGI, then I think I’m wrong, e.g. maybe I’m thinking about the symbol grounding problem incorrectly. I still think that in the limit of intelligence/rationality, though, specifying goals like “maximize paperclips” becomes impossible, and this wouldn’t be falsified if a zealous paperclip company were able to engineer a superintelligent paperclip maximizer that actually maximized paperclips in some plausibly commonsense fashion. In fact I can’t actually think of a way to falsify my theory in practice—I guess you’d have to somehow physically show that the axioms of algorithmic information theory and maybe updateless-like decision theories are egregiously incoherent… or something.
(Also your meta-algorithm isn’t quite what I had in mind—what I had in mind is a lot more theoretically elegant and doesn’t involve weird vague things like “extrapolation”—but I don’t think that’s the primary source of our disagreement.)