However, I’ve come to realise that a prior over human preferences is of little use. The real key is figuring out how to update it, and that contains almost the entirety of the problem.
This is a great point. It summarizes something important really well.
If this is the main idea, then why the title? (As opposed to “How should AIs update a prior over human preferences?”)
and also what counts as H versus some brain-surgeried replacement of them.
Also the heroin thing. (Which seems reversible at first glance, but that might not be entirely true (absent brain surgery, which has its own irreversible element...).)
Once we know that, we get a joint prior p over R×Π_H, the human reward functions and the human’s policy[^huid].
This sentence is a little ambiguous, as is the usage of “[^huid]”. (Expected meaning: the AI learns about what people want, and what people are doing, both in general and specifically. The phrasing raises the question of whether the AI models what people want (or are doing) as groups rather than individually.)
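For what it’s worth, here is the reading I’d assumed, written out (notation is mine, not the post’s): the AI observes the human’s behaviour $a_{1:t}$, Bayes-updates the joint prior, and then marginalises out the policy to get its belief about the reward:

$$
p(R, \pi_H \mid a_{1:t}) \;\propto\; p(a_{1:t} \mid \pi_H)\, p(R, \pi_H),
\qquad
p(R \mid a_{1:t}) \;=\; \sum_{\pi_H} p(R, \pi_H \mid a_{1:t}).
$$

On that reading, the “as groups rather than individually” question becomes: does p range over a single human’s (R, π_H), or over some population-level aggregate?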
So instead of a “prior over preferences” and an “update bridging law”, we need a joint object that does both.
The thesis makes sense. (And seems close to a good title.)
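To make the “joint object” idea concrete, here is a toy sketch (every reward name, policy, and number below is invented for illustration; this is not the post’s actual proposal): one object stores the joint distribution over (reward, policy) pairs and also carries the update rule applied to observed actions.

```python
# Toy sketch: a single "joint object" holding a distribution over
# (reward, policy) pairs plus the update rule. All names and numbers
# are invented for illustration.
from collections import defaultdict

# Hypothetical candidate human policies, each given as P(action | policy).
policies = {
    "rational": {"work": 0.8, "heroin": 0.2},
    "addicted": {"work": 0.1, "heroin": 0.9},
}

# Joint prior over (reward, policy) pairs; the coupling encodes that
# valuing work goes with the rational policy and vice versa.
joint_prior = {
    ("values_work", "rational"): 0.45,
    ("values_work", "addicted"): 0.05,
    ("values_heroin", "rational"): 0.05,
    ("values_heroin", "addicted"): 0.45,
}

def update(joint, observed_action):
    """Bayes-update the joint (reward, policy) distribution on one observed action."""
    unnormalised = {
        (reward, policy): prob * policies[policy][observed_action]
        for (reward, policy), prob in joint.items()
    }
    z = sum(unnormalised.values())
    return {pair: p / z for pair, p in unnormalised.items()}

def reward_marginal(joint):
    """The AI's belief about the human's reward, with the policy marginalised out."""
    marginal = defaultdict(float)
    for (reward, _policy), prob in joint.items():
        marginal[reward] += prob
    return dict(marginal)

posterior = update(joint_prior, "heroin")
print(reward_marginal(posterior))  # ≈ {'values_work': 0.245, 'values_heroin': 0.755}
```

The point from the top quote shows up directly here: the likelihood model and the reward–policy coupling in the prior do essentially all the work in deciding what an observed action (say, taking heroin) implies about the reward.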
Thanks! Changed the title (and corrected the badly formatted footnote).