However, that only works if we have the right prior. We could try to learn the prior from humans, which gets us 99% of the way there… but as I’ve mentioned earlier, human imitation does not get us all the way. Humans don’t perfectly endorse their own reactions.
Note that Learning the Prior uses an amplified human (i.e., a human with access to a model trained via IDA/Debate/RRM). So we can do a bit better than a base human—e.g., we could do something like having an HCH tree where many humans generate possible feedback and other humans look at the feedback and decide how much they endorse it. I think the target is not to get normativity ‘correct’, but to design a mechanism such that we can’t expect to find any mechanism that does better.
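A minimal sketch of the kind of generate-and-endorse mechanism gestured at here, purely for concreteness: the `Human` stub, the prompts, and the 0–1 endorsement scale are assumptions invented for this example, not anything specified by Learning the Prior itself.

```python
# Illustrative sketch only: some humans propose feedback, other humans rate how
# much they endorse each proposal, and the most-endorsed proposal is returned.
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class Human:
    """Stand-in for one (possibly amplified) human consultant."""
    answer: Callable[[str], str]   # free-form response to a prompt
    rate: Callable[[str], float]   # endorsement score in [0, 1]

def endorsed_feedback(question: str,
                      generators: List[Human],
                      reviewers: List[Human]) -> str:
    """Generate candidate feedback, score each candidate by average reviewer
    endorsement, and return the most-endorsed candidate."""
    proposals = [h.answer(f"Give feedback on: {question}") for h in generators]

    def mean_endorsement(proposal: str) -> float:
        scores = [r.rate(f"How much do you endorse this feedback?\n{proposal}")
                  for r in reviewers]
        return sum(scores) / len(scores)

    return max(proposals, key=mean_endorsement)
```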
Right, I agree. I see myself as trying to construct a theory of normativity which gets that “by construction”, i.e., we can’t expect to find any mechanism which does better, because if we could say anything about what that mechanism does better, then we could tell it to the system, and the system would take it into account.
HCH isn’t such a theory; it does provide a somewhat reasonable notion of amplification, but if we noticed systematic flaws with how HCH reasons, we would not be able to systematically correct them.
I see myself as trying to construct a theory of normativity which gets that “by construction”, i.e., we can’t expect to find any mechanism which does better, because if we could say anything about what that mechanism does better, then we could tell it to the system, and the system would take it into account.
Nice, this is what I was trying to say but was struggling to phrase it. I like this.
I guess I usually think of HCH as having this property, as long as the thinking time for each human is long enough, the tree is deep enough, and we’re correct about the hope that natural language is sufficiently universal. It’s quite likely I’m either confused or being sloppy though.
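For concreteness, here is a toy depth-limited picture of the HCH recursion being discussed, with “thinking time” and tree depth as explicit parameters. The `HumanStep` interface is an assumption made up for this sketch, not how HCH is actually specified.

```python
# Toy sketch of HCH as a depth-limited recursion with a per-node thinking budget.
from typing import Callable, List, Tuple

# A "human step": given the question, the subanswers gathered so far, and the
# remaining depth, the human either returns more subquestions to delegate or
# (with an empty subquestion list) a final answer.
HumanStep = Callable[[str, List[Tuple[str, str]], int], Tuple[List[str], str]]

def hch(question: str, human_step: HumanStep,
        depth: int, thinking_budget: int) -> str:
    """Answer `question` by letting one human think for up to `thinking_budget`
    rounds, delegating subquestions to subtrees of depth - 1."""
    sub_answers: List[Tuple[str, str]] = []
    answer = ""
    for _ in range(thinking_budget):
        subquestions, answer = human_step(question, sub_answers, depth)
        if not subquestions or depth == 0:
            return answer  # the human is done, or no deeper delegation allowed
        for sq in subquestions:
            sub_answers.append((sq, hch(sq, human_step, depth - 1, thinking_budget)))
    return answer  # thinking budget exhausted; return the best current answer
```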
You could put ‘learning the prior’ inside HCH, I think; it would just be inefficient—for every claim, you’d ask your HCH tree how much you should believe it, and HCH would think about the correct way to do Bayesian reasoning, what the prior on that claim should be, and how well it predicted every piece of data you’d seen so far, in conjunction with everything else in your prior. I think one view of learning the prior is just making this process more tractable/practical, and saving you from having to revisit all your data points every time you ask any question—you just do all the learning from data once, then use the result of that to answer any subsequent questions.
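One way to see the efficiency point is a deliberately simplified model in which “the prior” is a weighting over hypotheses supplied by HCH and the likelihood and entailment judgments are also treated as black boxes; `likelihood` and `entails` below are placeholders assumed for the sketch. The naive version revisits every data point for every claim, while the amortized version does the learning from data once and reuses the result.

```python
# Simplified illustration: answering claims by recomputing the posterior each
# time vs. computing it once and reusing it for all subsequent claims.
from typing import Callable, Dict, List

Hypothesis = str
Datum = str
Claim = str

def posterior(prior: Dict[Hypothesis, float],
              likelihood: Callable[[Hypothesis, Datum], float],
              data: List[Datum]) -> Dict[Hypothesis, float]:
    """Bayesian update of the prior on every piece of data seen so far."""
    weights = dict(prior)
    for d in data:
        weights = {h: w * likelihood(h, d) for h, w in weights.items()}
    total = sum(weights.values())
    return {h: w / total for h, w in weights.items()}

def credence_naive(claim: Claim,
                   prior: Dict[Hypothesis, float],
                   likelihood: Callable[[Hypothesis, Datum], float],
                   data: List[Datum],
                   entails: Callable[[Hypothesis, Claim], bool]) -> float:
    """'Learning the prior inside HCH': every claim triggers a fresh pass over
    all the data."""
    post = posterior(prior, likelihood, data)
    return sum(p for h, p in post.items() if entails(h, claim))

def make_credence(prior: Dict[Hypothesis, float],
                  likelihood: Callable[[Hypothesis, Datum], float],
                  data: List[Datum],
                  entails: Callable[[Hypothesis, Claim], bool]) -> Callable[[Claim], float]:
    """Amortized version: do all the learning from data once, then reuse the
    resulting posterior to answer any subsequent claim."""
    post = posterior(prior, likelihood, data)
    return lambda claim: sum(p for h, p in post.items() if entails(h, claim))
```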