Thanks for explaining; your position makes more sense now. I think I agree with your overall point that there isn’t a “platonic Want” that can be directly inferred from physical state, at least without substantial additional psychology/philosophy investigation (which could, among other things, define bargaining solutions among the different wants).
So, there are at least a few different issues here for contingent wants:
Wants vary over time.
OK, so add a time parameter, and do what I want right now.
People could potentially use different “wanting” models for themselves.
Yes, but some models are better than others. (There’s a discussion of the arbitrariness of models here, which seems relevant.)
In practice the brain is going to use some weighting procedure between them. If this procedure isn’t doing necessary messy work (it’s really not clear if it is), then it can be replaced with an algorithm. If it is, then perhaps the top priority for value learning is “figure out what this thingy is doing and form moral opinions about it”.
“Wanting” models are fallible.
Not necessarily a problem (but see next point); the main thing with AI alignment is to do much better than the “default” policy of having aligned humans continue to take actions, using whatever brain they have, without using AGI assistance. If people manage despite having fallible “wanting” models, then perhaps the machinery they use to manage this can be understood?
“Wanting” models have limited domains of applicability.
This seems like Wei’s partial utility function problem and is related to the ontology identification problem. It’s pretty serious and is also a problem independently of value learning. Solving this problem would require either directly solving the philosophical problem, or doing psychology to figure out what machinery does ontology updates (and form moral opinions about that).