Firstly, thanks for reading the post! I think you're referring mainly to realisability here, which I'm not that clued up on tbh, but I'll give you my two cents because why not.
I'm not sure to what extent we should focus on unrealisability when aligning systems. I have a similar intuition to you that the important question is probably "how can we get good abstractions of the world, given that we cannot perfectly model it?". That said, better arguments than the ones I've laid out for why unrealisability is a core problem in alignment probably do exist; I just haven't read that much into it yet. I'll link again to this video series on IB (which I've yet to finish), as I think there are probably some good arguments there.