then one of them is going to be the “best”, i.e. most correlated with the “true utility” function which we can’t compute.
I don’t think correlation is a useful way to think about this. Utility functions are mappings from consequence spaces to the real line, and it doesn’t make much sense to talk about statistical properties of mappings. Projection in vector spaces is probably a closer analogy, or you could talk about a ‘perversity measure’, where you look at all optimal solutions under the simpler mapping and find the one with the worst score under the complex mapping. (But if you could rigorously calculate that, you have the complex utility function, and might as well use it!)
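To make the ‘perversity measure’ idea concrete, here is a minimal sketch, assuming a finite consequence space and two hypothetical utility functions (the names `simple_u` and `complex_u` are my own, purely for illustration): take every outcome that is optimal under the simple utility, and report the worst of those outcomes as scored by the complex utility.

```python
def perversity(outcomes, simple_u, complex_u):
    """Worst complex-utility score among outcomes that are
    optimal under the simple utility."""
    best_simple = max(simple_u(o) for o in outcomes)
    simple_optimal = [o for o in outcomes if simple_u(o) == best_simple]
    return min(complex_u(o) for o in simple_optimal)

# Toy example (invented for illustration): the simple utility only
# counts paperclips; the complex utility also values intact factories.
outcomes = [
    {"clips": 10, "factories": 1},
    {"clips": 10, "factories": 0},  # simple-optimal, but bad under complex_u
    {"clips": 5,  "factories": 1},
]
simple_u = lambda o: o["clips"]
complex_u = lambda o: o["clips"] + 100 * o["factories"]

print(perversity(outcomes, simple_u, complex_u))  # → 10
```

The parenthetical objection shows up directly in the code: evaluating `perversity` requires calling `complex_u`, so you can only compute the measure once you already possess the complex utility function.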
Without the simple guidance of something we value, it’s going to be a long and UN-productive debate.
I think the MIRI value learning approach is operating at a higher meta-level here. That is, they want to create a robust methodology for learning human values, which starts with figuring out what robustness means. You’ve proposed that we instead try to figure out what values are, but I don’t see any reason to believe that us trying to figure out what values are is going to be robust.