Sorry, I meant something like “whether there is a relatively simple decision algorithm with consistent preferences that we can extrapolate from that mess without discarding too much”. If not, then a superintelligence might be able to extrapolate us, but until then we’ll be stymied in our attempts to think rationally about large unfamiliar decisions.
Fair enough. Note that the superintelligence itself must be a simple decision algorithm for it to be knowably good, if that’s at all possible (at the outset, before starting to process the particular data from observations), which kinda defeats the purpose of your statement. :-)
Well, the code for the seed should be pretty simple, at least. But I don’t see how that defeats the purpose of my statement; it may be that, short of enlisting a superintelligence to help, all current attempts to approximate and extrapolate human preferences in a consistent fashion (e.g. explicit ethical or political theories) are too crude to have any chance of success (by the standard of actual human preferences) in novel scenarios. I don’t believe this will be the case, but it’s a possibility worth keeping an eye on.