Really enjoying your posts on normativity! The way I summarize it internally is “Thinking about fixed-points for the meta aspect of human reasoning”. How fixed-point-y do you think solutions are likely to be?
We could never entirely pin down the concept of human values, but at some point, the system would be reasoning so much like us (or rather, so much like we would want to reason) that this wouldn’t be a concern.
I’m confused about this sentence, because it seems to promote an idea in contradiction with your other writing on normativity (and even earlier sections in this post). Because the quote says that at some level you could stop caring (which means we can keep going meta until there’s not significant improvement, and stop there), while the rest of your writing says that we should deal with the whole hierarchy at once.
Because the quote says that at some level you could stop caring (which means we can keep going meta until there’s not significant improvement, and stop there)
Hmmm, that’s not quite what I meant. It’s not about stopping at some meta-level, but rather, stopping at some amount of learning in the system. The system should learn not just level-specific information, but also cross-level information (like overall philosophical heuristics), which means that even if you stop teaching the machine at some point, it can still produce new reasoning at higher levels which should be similar to feedback you might have given.
The point is that human philosophical taste isn’t perfectly defined, and even if we also teach the machine everything we can about how to interpret human philosophical taste, that’ll still be true. However, at some point our uncertainty and the machine’s uncertainty will be close enough to the same that we don’t care. (Note: what it even means for them to be closely matched depends on the question of what it means for humans to have specific philosophical taste, which, if we could answer, we would have perfectly defined human philosophical taste—the thing we can’t do. Yet, in some good-enough sense, our own uncertainty eventually becomes well-represented by the machine’s uncertainty. That’s the stopping point at which we no longer need to provide additional explicit feedback to the machine.)
Hmmm, that’s not quite what I meant. It’s not about stopping at some meta-level, but rather, stopping at some amount of learning in the system. The system should learn not just level-specific information, but also cross-level information (like overall philosophical heuristics), which means that even if you stop teaching the machine at some point, it can still produce new reasoning at higher levels which should be similar to feedback you might have given.
Interesting. So the point is to learn how to move up the hierarchy? I mean, that makes a lot of sense. It is a sort of fixed point description, because then the AI can keep moving up the hierarchy as far as it wants, which mean the whole hierarchy is encoded by it’s behavior. It’s just a question of how far up it needs to go to get satisfying answers.
Right. I mean, I would clarify that the whole point isn’t to learn to go up the hierarchy; in some sense, most of the point is learning at a few levels. But yeah.
Really enjoying your posts on normativity! The way I summarize it internally is “Thinking about fixed-points for the meta aspect of human reasoning”. How fixed-point-y do you think solutions are likely to be?
I’m confused about this sentence, because it seems to promote an idea in contradiction with your other writing on normativity (and even earlier sections in this post). Because the quote says that at some level you could stop caring (which means we can keep going meta until there’s not significant improvement, and stop there), while the rest of your writing says that we should deal with the whole hierarchy at once.
Hmmm, that’s not quite what I meant. It’s not about stopping at some meta-level, but rather, stopping at some amount of learning in the system. The system should learn not just level-specific information, but also cross-level information (like overall philosophical heuristics), which means that even if you stop teaching the machine at some point, it can still produce new reasoning at higher levels which should be similar to feedback you might have given.
The point is that human philosophical taste isn’t perfectly defined, and even if we also teach the machine everything we can about how to interpret human philosophical taste, that’ll still be true. However, at some point our uncertainty and the machine’s uncertainty will be close enough to the same that we don’t care. (Note: what it even means for them to be closely matched depends on the question of what it means for humans to have specific philosophical taste, which, if we could answer, we would have perfectly defined human philosophical taste—the thing we can’t do. Yet, in some good-enough sense, our own uncertainty eventually becomes well-represented by the machine’s uncertainty. That’s the stopping point at which we no longer need to provide additional explicit feedback to the machine.)
Interesting. So the point is to learn how to move up the hierarchy? I mean, that makes a lot of sense. It is a sort of fixed point description, because then the AI can keep moving up the hierarchy as far as it wants, which mean the whole hierarchy is encoded by it’s behavior. It’s just a question of how far up it needs to go to get satisfying answers.
Is that correct?
Right. I mean, I would clarify that the whole point isn’t to learn to go up the hierarchy; in some sense, most of the point is learning at a few levels. But yeah.