If we can extract utility in a purer fashion, I think we should. At the bare minimum, it would be much more run-time efficient. That said, trying to do so opens up a whole can of worms of really hard problems. This proposal, provided you’re careful about how you set it up, pretty much dodges all of that, as far as I can tell, which means we could implement it faster, should that be necessary. I mean, yes, AGI is still a very hard problem, but I think this reduces the F part of FAI to a manageable level, even given the impoverished understanding we have right now. And, assuming a properly modular code base, it would not be too difficult to swap out ‘get utility by asking questions’ for ‘get utility by analyzing model directly.’ Actually, the thing might even do that itself, since doing so might better maximize its utility function.
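To make the "modular code base" point concrete, here is a minimal sketch of how the utility source might be kept swappable. Everything in it is hypothetical: the `WorldState` dict, its keys (`reported_satisfaction`, `health`, `autonomy`), the weights, and the function names are invented for illustration and are not part of the proposal.

```python
from typing import Callable, Dict

# Hypothetical representation: a predicted outcome is a dict of named features.
WorldState = Dict[str, float]
UtilitySource = Callable[[WorldState], float]


def utility_by_asking(state: WorldState) -> float:
    """Stand-in for 'get utility by asking questions'.

    Assumes the simulated person's answer is stored under the made-up
    key 'reported_satisfaction'.
    """
    return state["reported_satisfaction"]


def utility_by_model(state: WorldState) -> float:
    """Stand-in for 'get utility by analyzing model directly'.

    The features and weights are invented purely for illustration.
    """
    return 0.7 * state["health"] + 0.3 * state["autonomy"]


def choose_action(candidates: Dict[str, WorldState],
                  utility: UtilitySource) -> str:
    """Return the action whose predicted outcome scores highest."""
    return max(candidates, key=lambda action: utility(candidates[action]))


# Illustrative numbers only; they are not drawn from any real data.
outcomes = {
    "intervene": {"reported_satisfaction": 0.6, "health": 0.9, "autonomy": 0.4},
    "do_nothing": {"reported_satisfaction": 0.8, "health": 0.3, "autonomy": 1.0},
}

print(choose_action(outcomes, utility_by_asking))
print(choose_action(outcomes, utility_by_model))
```

The planner (`choose_action`) never cares where the number comes from, so swapping one source for the other is a one-argument change. With these made-up figures the two sources pick different actions, which is a reminder that the swap can change behaviour and is not purely an efficiency gain.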
Well, it replaces it with a more manageable problem, anyway.
More specifically, it replaces the question “what’s best for people?” with the question “what would people choose, given a choice?”
Of course, if I’m concerned that those questions might have different answers, I might be reluctant to replace the former with the latter.
Not quite. It actually replaces it with the problem of maximizing people’s expected reported life satisfaction. If you wanted to choose to try heroin, this system would be able to look ahead, see that that choice will probably drastically reduce your long-term life satisfaction (more than the annoyance at the intervention), and choose to intervene and stop you.
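The look-ahead comparison described above can be sketched as a one-line decision rule. This is only an illustration under stated assumptions: the function name, the parameter names, and the idea that satisfaction and annoyance sit on a single common numeric scale are all hypothetical.

```python
def should_intervene(satisfaction_if_left_alone: float,
                     satisfaction_if_stopped: float,
                     annoyance_cost: float) -> bool:
    """Intervene only if the predicted long-term gain outweighs the annoyance.

    All three inputs are assumed to be expected reported life satisfaction
    (or a cost against it) on the same hypothetical scale.
    """
    return satisfaction_if_stopped - annoyance_cost > satisfaction_if_left_alone
```

For example, `should_intervene(2.0, 7.0, 1.0)` evaluates to `True`, while `should_intervene(6.0, 7.0, 2.0)` evaluates to `False`, because there the annoyance eats up more than the predicted gain.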
I’m not convinced ‘what’s best for people’ with no asterisk is a coherent problem description in the first place.
Sure, I accept the correction.
And, sure, I’m not convinced of that either.