Happy to see work to elicit utility functions with LLMs. I think the intersection of utility functions and LLMs is broadly promising.
I want to flag the grandiosity of the title, though. "Utility Engineering" sounds like a pretty significant thing. But from what I understand, almost all of the paper is really about utility elicitation (not control, as is spelled out in the paper itself), and it's really unclear if this represents a breakthrough significant enough for me to feel comfortable with such a name.
I feel like a whole lot of what I see from the Center for AI Safety does this. "Humanity's Last Exam"? "Superhuman Forecasting"?

I assume that CAIS thinks its own work is all pretty groundbreaking and incredibly significant, but going forward I'd kindly encourage names that many other members of the AI safety community would also broadly agree with.