Thank you for the thoughtful reply!
3. I agree with your point, especially that u(chocolate ice cream and vanilla ice cream) ≠ u(chocolate ice cream) + u(vanilla ice cream) should be true.
But I think I can salvage my point by making a further distinction. When I write u(chocolate ice cream), I actually mean u(emb(chocolate ice cream)), where emb is a semantic embedding that takes sentences to vectors. Already at the level of the embedding we probably have emb(chocolate ice cream and vanilla ice cream) ≠ emb(chocolate ice cream) + emb(vanilla ice cream),
and that’s (potentially) a good thing! If we structure our embedding in such a way that emb(chocolate ice cream) + emb(vanilla ice cream) points to something that is actually comparable to the conjunction of the two, then our utility function can just be naively linear in the way I constructed it above: u(emb(chocolate ice cream and vanilla ice cream)) = u(emb(chocolate ice cream)) + u(emb(vanilla ice cream)). I believe that this is what I wanted to gesture at when I said that we need to identify an appropriate basis in an appropriate space (i.e. where emb(chocolate ice cream and vanilla ice cream) = emb(chocolate ice cream) + emb(vanilla ice cream), and whatever else we might want out of the embedding). But I have a large amount of uncertainty around all of this.
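To make the picture concrete, here’s a minimal sketch; the 3-dimensional vectors and the weight vector w are made up for illustration and don’t come from any real encoder. The point is just that if u is linear in the embedding, u(v) = w·v, then additivity of u over conjunctions stands or falls with additivity of emb:

```python
import numpy as np

# Toy 3-dimensional embedding space; w defines a utility that is
# linear in the embedding: u(v) = w . v
w = np.array([1.0, 2.0, -0.5])

def u(v):
    """Utility as a linear functional on embedding vectors."""
    return float(w @ v)

# Hypothetical embeddings of the two descriptions.
emb_choc = np.array([0.9, 0.1, 0.3])
emb_van = np.array([0.2, 0.8, 0.1])

# If the embedding is additive over conjunction, the linear u is
# automatically additive as well.
emb_both = emb_choc + emb_van
assert np.isclose(u(emb_both), u(emb_choc) + u(emb_van))

# But a generic embedding of the conjunction (what current sentence
# encoders actually give us) need not be the sum, and then the two
# sides come apart.
emb_both_generic = np.array([0.5, 0.5, 0.9])
print(u(emb_both_generic), u(emb_choc) + u(emb_van))  # differ in general
```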
I still disagree / am confused. If it’s indeed the case that emb(chocolate ice cream and vanilla ice cream) ≠ emb(chocolate ice cream) + emb(vanilla ice cream), then why would we expect u(emb(chocolate ice cream and vanilla ice cream)) = u(emb(chocolate ice cream)) + u(emb(vanilla ice cream))? (Also, when you spell out what you want from the appropriate basis, it looks like you say the former is an equality.) Furthermore, if the latter equality is true, wouldn’t it imply that the utility we get from [chocolate ice cream and vanilla ice cream] is the sum of the utility from chocolate ice cream and the utility from vanilla ice cream? Isn’t u(emb(X)) supposed to be equal to the utility of X?
My current best attempt to understand/steelman this is to accept emb(chocolate ice cream and vanilla ice cream) ≠ emb(chocolate ice cream) + emb(vanilla ice cream), to reject u(emb(chocolate ice cream and vanilla ice cream)) = u(emb(chocolate ice cream)) + u(emb(vanilla ice cream)), and to try to think of the embedding as something slightly strange. I don’t see a reason to think utility would be linear in current semantic embeddings of natural language or of a programming language, nor do I see another appealing approach to constructing such an embedding. Maybe we could figure out a correct embedding if we had access to lots of data about the agent’s preferences (possibly in addition to some semantic/physical data), but that seems to defeat the purpose of the embedding in the context of this post, namely to be a step that does not yet depend on preference data. Alternatively, if we are fine with using preference data at this step, maybe we could find a cool embedding, but in that case it seems very likely that the same data would also just give us a one-step solution to the entire problem of computing a set of rational preferences for the agent.
A separate attempt to steelman this would be to assume that we have access to a semantic embedding pretrained on preference data from a bunch of other agents, and then to tune the utilities of the basis to best fit the preferences of the agent we are currently dealing with. That seems like a cool idea, although I’m not sure whether it has strayed too far from the spirit of the original problem.
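To gesture at what that tuning could look like, here’s a minimal sketch. Everything in it is made up for illustration: the random table stands in for a frozen pretrained embedding, and the preference pairs are toy data. The fitting rule is just a standard Bradley-Terry-style logistic model on embedding differences, so only the last linear layer (the utility weights w) is adapted to the agent:

```python
import numpy as np

rng = np.random.default_rng(0)
d = 16  # embedding dimension

# Stand-in for a frozen, pretrained semantic embedding: a fixed random
# vector per option; in practice this would be a real sentence encoder.
emb_table = rng.normal(size=(10, d))

def emb(option):
    return emb_table[option]

# Pairwise preference data for the current agent: (a, b) means a ≻ b.
prefs = [(0, 1), (2, 1), (0, 3), (4, 2), (0, 4)]

# Bradley-Terry-style model: P(a ≻ b) = sigmoid(w . (emb(a) - emb(b))),
# so the utility u(x) = w . emb(x) stays linear in the frozen embedding.
# Fit w by gradient ascent on the log-likelihood of the observed pairs.
w = np.zeros(d)
lr = 0.1
for _ in range(500):
    grad = np.zeros(d)
    for a, b in prefs:
        diff = emb(a) - emb(b)
        p = 1.0 / (1.0 + np.exp(-w @ diff))  # model's P(a > b)
        grad += (1.0 - p) * diff             # gradient of log P(a > b)
    w += lr * grad

def u(option):
    """Learned utility: linear in the frozen embedding."""
    return float(w @ emb(option))

# The learned utilities should (mostly) reproduce the training pairs.
for a, b in prefs:
    print(f"u({a}) > u({b}):", u(a) > u(b))
```

The preference data only enters through w; the embedding itself never changes, which is what (I think) would keep the embedding step separate from the preference-fitting step.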