Happy to see work to elicit utility functions with LLMs. I think the intersection of utility functions and LLMs is broadly promising.
I want to flag the grandiosity of the title, though. "Utility Engineering" sounds like a pretty significant thing. But from what I understand, almost all of the paper is really about utility elicitation (not control, as is spelled out in the paper itself), and it's really unclear if this represents a breakthrough significant enough for me to feel comfortable with such a name.
I feel like a whole lot of what I see from the Center for AI Safety does this. "Humanity's Last Exam"? "Superhuman Forecasting"?

I assume that CAIS thinks its own work is all pretty groundbreaking and incredibly significant, but going forward I'd kindly encourage names that many other members of the AI safety community would also broadly agree with.