Suppose we have a bunch of short natural language descriptions of what we would want the AI to value. Can we simply give the AI a list of these and tell it to maximize all of these values under some kind of equal weighting? It seems to me that, much more than in other areas of superintelligence design, each description we come up with is likely to point at least roughly toward what we want, so aggregating a bunch of these descriptions is more likely to get us what we want than picking any one of them individually. Does it seem like this would work? Is there any way this could go wrong?
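To make "equal weighting" concrete, here is a minimal sketch of what I have in mind (purely illustrative; the function names and the idea of scoring actions against each description are my own assumptions, not a proposal for how the scoring would actually be done):

```python
def aggregate_score(action, value_scorers):
    """Average the scores that each value description's scorer assigns to an action."""
    return sum(score(action) for score in value_scorers) / len(value_scorers)

def choose_action(candidate_actions, value_scorers):
    """Pick the action maximizing the equal-weighted aggregate of all descriptions."""
    return max(candidate_actions, key=lambda a: aggregate_score(a, value_scorers))
```

So the question is whether optimizing this kind of unweighted average over many rough descriptions is safer than optimizing any single description on its own.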