My internal model of you is that you believe this approach would not be enough because the utility would not be defined on the internal concepts of the agent. Yet I think the utility doesn’t so much need to be defined on these internal concepts itself as to rely on some assumption about these internal concepts.
Yeah, this is an accurate portrayal of my views. I’d also note that the project of mapping internal concepts to mathematical formalisms was the main goal of the whole era of symbolic AI, and failed badly. (Although the analogy is a little loose, so I wouldn’t take it as a decisive objection, but rather a nudge to formulate a good explanation of what they were doing wrong that you will do right.)
I agree more and more with you that the big mistake with using utility functions/reward for thinking about goal-directedness is not so much that they are a bad abstraction, but that they are often used as if every utility function is as meaningful as any other.
I don’t think this is an accurate portrayal of my views. I am trying to say that utility functions are a bad abstraction for reasoning about AGI, for similar reasons to why health points are a bad abstraction for reasoning about livers. (I think I agree with the rest of the paragraph though.)
Yeah, this is an accurate portrayal of my views. I’d also note that the project of mapping internal concepts to mathematical formalisms was the main goal of the whole era of symbolic AI, and failed badly. (Although the analogy is a little loose, so I wouldn’t take it as a decisive objection, but rather a nudge to formulate a good explanation of what they were doing wrong that you will do right.)
My first intuition is that I expect mapping internal concepts to mathematical formalisms to be easier when the end goal is deconfusion and making sense of behaviors, compared to actually improving capabilities. But I’d have to think about it some more. Thanks at least for an interesting test to try to apply to my attempt.
I don’t think this is an accurate portrayal of my views. I am trying to say that utility functions are a bad abstraction for reasoning about AGI, for similar reasons to why health points are a bad abstraction for reasoning about livers. (I think I agree with the rest of the paragraph though.)
Okay, do you mean that you agree with my paragraph but what you are really arguing about is that utility functions don’t care about the low-level/internals of the system, and that’s why they’re bad abstractions? (That’s how I understand your liver and health points example).