I still disagree / am confused. If it’s indeed the case that emb(chocolate ice cream and vanilla ice cream) ≠ emb(chocolate ice cream) + emb(vanilla ice cream), then why would we expect u(emb(chocolate ice cream and vanilla ice cream)) = u(emb(chocolate ice cream)) + u(emb(vanilla ice cream))? (Also, in the second-to-last sentence of your comment, it looks like you say the former is an equality.) Furthermore, if the latter equality is true, wouldn’t it imply that the utility we get from [chocolate ice cream and vanilla ice cream] is the sum of the utility from chocolate ice cream and the utility from vanilla ice cream? Isn’t u(emb(X)) supposed to be equal to the utility of X?
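To make the worry concrete, here is a tiny numerical sketch (all vectors and weights are made up, not taken from any real embedding model): if emb(chocolate and vanilla) is not the sum of the two individual embeddings, then even a perfectly linear u assigns the combination a utility different from the sum of the two utilities.

```python
import numpy as np

# Toy, made-up embeddings -- purely illustrative, not from any real model.
emb_choc = np.array([1.0, 0.0, 2.0])   # emb(chocolate ice cream)
emb_van  = np.array([0.0, 1.0, 2.0])   # emb(vanilla ice cream)
emb_both = np.array([0.8, 0.9, 2.5])   # emb(chocolate and vanilla ice cream) -- not the vector sum

w = np.array([3.0, 1.0, 0.5])          # a linear utility u(v) = w . v

u_both = w @ emb_both                  # u(emb(chocolate and vanilla))
u_sum  = w @ emb_choc + w @ emb_van    # u(emb(chocolate)) + u(emb(vanilla))

print(u_both, u_sum)                   # 4.55 vs 6.0 -- linearity of u doesn't close the gap
```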
My current best attempt to understand/steelman this is to accept emb(chocolate ice cream and vanilla ice cream) ≠ emb(chocolate ice cream) + emb(vanilla ice cream), to reject u(emb(chocolate ice cream and vanilla ice cream)) = u(emb(chocolate ice cream)) + u(emb(vanilla ice cream)), and to think of the embedding as something slightly strange. I don’t see a reason to think utility would be linear in current semantic embeddings of natural language or of a programming language, nor do I see an appealing alternative way to construct such an embedding. Maybe we could figure out a correct embedding if we had access to lots of data about the agent’s preferences (possibly in addition to some semantic/physical data), but it feels like that might defeat the point of this embedding, in the context of this post, as a step that does not yet depend on preference data. Alternatively, if we are fine with using preference data at this step, maybe we could find a cool embedding, but in that case it seems very likely that it would also just give us a one-step solution to the entire problem of computing a set of rational preferences for the agent.
A separate attempt to steelman this would be to assume that we have access to a semantic embedding pretrained on preference data from a bunch of other agents, and then to tune the utilities of the basis to best fit the preferences of the agent we are currently dealing with. That seems like a cool idea, although I’m not sure whether it has strayed too far from the spirit of the original problem.
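As a rough sketch of what that second steelman could look like (the function name `pretrained_embed` and the toy preference data are hypothetical placeholders, not anything from the post): freeze the pretrained embedding and fit only a linear weight vector to the agent’s pairwise choices, using a Bradley-Terry-style logistic loss on utility differences.

```python
import numpy as np

def fit_linear_utility(emb, preferences, dim, lr=0.1, epochs=200):
    """Fit w so that u(x) = w . emb(x) matches pairwise preferences.

    `preferences` is a list of (winner, loser) option pairs. The embedding
    `emb` is assumed to be pretrained and is kept fixed; only w is learned,
    by gradient ascent on a Bradley-Terry log-likelihood.
    """
    w = np.zeros(dim)
    for _ in range(epochs):
        for winner, loser in preferences:
            diff = emb(winner) - emb(loser)      # direction of the utility difference
            p = 1.0 / (1.0 + np.exp(-(w @ diff)))  # P(winner preferred | w)
            w += lr * (1.0 - p) * diff           # log-likelihood gradient step
    return w

# Hypothetical usage with a stand-in embedding and toy preference data:
# w = fit_linear_utility(pretrained_embed,
#                        [("chocolate ice cream", "vanilla ice cream")],
#                        dim=768)
# u = lambda x: w @ pretrained_embed(x)
```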