life expectancy (DALY or QALY), since to me it is easier to measure than happiness.
Whoa, how are you measuring the disability/quality adjustment? That sounds like sneaking in ‘happiness’ measurements, and there are a bunch of challenges: we already run into issues where people who have a condition rate it as less bad than people who don’t have it. (For example, sighted people rate being blind as worse than blind people rate being blind.)
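To make the worry concrete, here is a minimal sketch. It assumes QALYs are computed as years spent in each health state multiplied by a quality weight between 0 and 1. The weights are made up to mirror the sighted-vs-blind rating gap and are not taken from any survey; `qalys` and the rater groups are hypothetical names for illustration.

```python
def qalys(periods, weights):
    """Sum the years lived in each health state, scaled by that state's weight in [0, 1]."""
    return sum(years * weights[state] for state, years in periods)

# A hypothetical life: 40 years sighted, then 30 years blind.
life = [("healthy", 40), ("blind", 30)]

# Hypothetical quality weights elicited from two different groups of raters.
weights_from_sighted_raters = {"healthy": 1.0, "blind": 0.5}
weights_from_blind_raters = {"healthy": 1.0, "blind": 0.8}

print(qalys(life, weights_from_sighted_raters))  # 55.0
print(qalys(life, weights_from_blind_raters))    # 64.0
```

The same life scores differently depending only on whose judgment supplied the weights, which is exactly where the "happiness" question sneaks back in.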
if you could be born in any society on earth today, what one number would be most congruent with your preference? Average life expectancy captures very well which societies are good to be born into.
There’s a general principle in management that really ought to be a larger part of the discussion of value learning: Goodhart’s Law. Right now, life expectancy is higher in better places, because good things are correlated. But if you directed your attention to optimizing towards life expectancy, you could find many things that make life less good but longer (or your definition of “QALY” needs to include the entirety of what goodness is, in which case we have made the problem no easier).
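A toy illustration of that point, with every policy and number invented for the example: among ordinary policies, life expectancy and "true value" pick the same winner, but once an extreme option is reachable, maximizing the proxy selects something nobody wants. `best_by` and the policy names are hypothetical.

```python
EXTREME = "mandatory sedation on life support"

# Hypothetical policies: a proxy score (life expectancy, years) and the
# "true value" the proxy is supposed to track.
policies = {
    "status quo": {"life_expectancy": 72, "true_value": 5.0},
    "better healthcare": {"life_expectancy": 78, "true_value": 7.0},
    "cleaner air": {"life_expectancy": 76, "true_value": 6.5},
    "less poverty": {"life_expectancy": 80, "true_value": 8.0},
    # An option that only shows up when an optimizer pushes hard on the proxy.
    EXTREME: {"life_expectancy": 95, "true_value": 1.0},
}

def best_by(metric, options):
    """Return the name of the option that scores highest on the given metric."""
    return max(options, key=lambda name: options[name][metric])

# Among ordinary policies, proxy and true value agree on the winner...
ordinary = {name: v for name, v in policies.items() if name != EXTREME}
print(best_by("life_expectancy", ordinary))  # less poverty
print(best_by("true_value", ordinary))       # less poverty

# ...but with the extreme option available, maximizing the proxy picks it.
print(best_by("life_expectancy", policies))  # mandatory sedation on life support
print(best_by("true_value", policies))       # less poverty
```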
However, I’d rather have an approximate starting point for direct specification than give up on the approach altogether.
But here’s where we come back to Goodhart’s Law: regardless of what simple measure you pick, it will be possible to demonstrate a perverse consequence of optimizing for that measure, because simplicity necessarily cuts out complexity that we don’t want to lose. (If you didn’t cut out the complexity, it’s not simple!)
Well, I get where you are coming from with Goodhart’s Law, but that’s not the question. Formally speaking, if we take the set of all utility functions with complexity below some fixed bound N, then one of them is going to be the “best”, i.e. most correlated with the “true utility” function which we can’t compute.
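One way to make "the best function under a complexity bound" concrete, as a sketch under heavy assumptions: complexity is taken to be the number of features in an equal-weight sum, "best" means highest Pearson correlation with the true utility over sampled outcomes, and `hidden_true_utility` is a hypothetical stand-in for the function we can't actually compute. None of these names come from an existing library.

```python
import itertools
import math
import random

FEATURES = ["life_expectancy", "income", "freedom"]

def hidden_true_utility(o):
    # Hypothetical stand-in for the "true utility" we cannot actually compute.
    return 0.6 * o["life_expectancy"] + 0.3 * o["income"] + 0.1 * o["freedom"]

def pearson(xs, ys):
    """Plain Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)

def candidates(max_terms):
    """Yield every equal-weight sum of at most max_terms features (a crude complexity bound)."""
    for k in range(1, max_terms + 1):
        for combo in itertools.combinations(FEATURES, k):
            # Default argument binds this combo, avoiding late-binding closures.
            yield "+".join(combo), (lambda o, c=combo: sum(o[f] for f in c))

def best_simple_utility(max_terms, outcomes):
    """Return (correlation, name) of the candidate most correlated with the true utility."""
    truth = [hidden_true_utility(o) for o in outcomes]
    scored = [
        (pearson([u(o) for o in outcomes], truth), name)
        for name, u in candidates(max_terms)
    ]
    return max(scored)

# Hypothetical outcomes: independent uniform feature values, fixed seed.
rng = random.Random(0)
outcomes = [{f: rng.random() for f in FEATURES} for _ in range(2000)]

print(best_simple_utility(1, outcomes))  # best single-feature utility
print(best_simple_utility(2, outcomes))  # best utility with up to two features
```

Under these assumptions a best candidate always exists because the candidate set is finite, and raising the bound can only help or tie, but the winner still falls short of a correlation of 1.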
As you point out, if we are selecting utilities that are too simple, such as straight-up life expectancy, then even the “best” function is not “good enough” to just punch into an AGI, because it will likely overfit and produce bad consequences. However, we can still reason about “better” or “worse” measures of societies. People might complain about the unemployment rate, but it’s a crappy metric on which to base a decision about which societies are overall better than others, and it’s also easier to game.
The value of at least “trying” to formalize values is that we get a set of metrics, not too large, that we might care about in arguments like: “But the AGI reduced GDP!” “Well, it also reduced the suicide rate.” Which is more important? Without simple guidance about what we value, it’s going to be a long and unproductive debate.
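Here is a small sketch of why that argument stalls without agreed weights. It assumes outcomes are scored as a weighted sum of metric changes; both policies and all numbers are made up, and `score` and `preferred` are hypothetical helpers.

```python
# Hypothetical effects of two AGI policies (made-up numbers; higher is better):
# GDP change in percent, and relative reduction in the suicide rate in percent.
policy_a = {"gdp_change": -2.0, "suicide_reduction": 10.0}
policy_b = {"gdp_change": 3.0, "suicide_reduction": 0.0}

def score(policy, weights):
    """Weighted sum of metric changes under the given weights."""
    return sum(weights[metric] * value for metric, value in policy.items())

def preferred(weights):
    """Return which policy the weighted sum ranks higher."""
    return "A" if score(policy_a, weights) > score(policy_b, weights) else "B"

print(preferred({"gdp_change": 1.0, "suicide_reduction": 0.1}))  # B
print(preferred({"gdp_change": 1.0, "suicide_reduction": 1.0}))  # A
```

With these numbers the ranking flips at a suicide-rate weight of 0.5 relative to GDP, so the whole argument reduces to agreeing on that one number.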
then one of them is going to be the “best”, i.e. most correlated with the “true utility” function which we can’t compute.
I don’t think correlation is a useful way to think about this. Utility functions are mappings from consequence spaces to a single real line, and it doesn’t make much sense to talk about statistical properties of mappings. Projections in vector spaces are probably closer, or you could talk about a ‘perversity measure’ where you look at all optimal solutions to the simpler mapping and find the one with the worst score under the complex mapping. (But if you could rigorously calculate that, you have the complex utility function, and might as well use it!)
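A minimal sketch of that perversity measure over a finite, hypothetical set of outcomes. `perversity`, `simple_u`, and `complex_u` are illustrative names, and `complex_u` is an invented stand-in; notice that the function cannot run without it, which is the catch in the parenthetical.

```python
def perversity(outcomes, simple_u, complex_u, tol=1e-9):
    """Among outcomes that maximize simple_u, return the one complex_u rates worst.

    Computing this requires complex_u itself.
    """
    best_simple = max(simple_u(o) for o in outcomes)
    # Every outcome tied (within tol) for optimal under the simple mapping.
    optimal = [o for o in outcomes if simple_u(o) >= best_simple - tol]
    worst = min(optimal, key=complex_u)
    return worst, complex_u(worst)

# Hypothetical outcomes; the numbers are illustrative only.
outcomes = [
    {"name": "flourishing", "life_expectancy": 85, "wellbeing": 9},
    {"name": "sedated", "life_expectancy": 85, "wellbeing": 1},
    {"name": "status quo", "life_expectancy": 72, "wellbeing": 6},
]

def simple_u(o):
    # The simple mapping: life expectancy alone.
    return o["life_expectancy"]

def complex_u(o):
    # Hypothetical stand-in for the complex mapping we cannot actually write down.
    return o["life_expectancy"] * o["wellbeing"] / 10

worst, worst_score = perversity(outcomes, simple_u, complex_u)
print(worst["name"], worst_score)  # sedated 8.5
```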
Without simple guidance about what we value, it’s going to be a long and unproductive debate.
I think the MIRI value learning approach is operating at a higher meta-level here. That is, they want to create a robust methodology for learning human values, which starts with figuring out what robustness means. You’ve proposed that we instead try to figure out what values are, but I don’t see any reason to believe that us trying to figure out what values are is going to be robust.