The fundamental utility function is a hypothetical concept on the agent’s part, one that may actually exist. An AGI will be capable of hypothetical thinking.
Yes, I agree that the fundamental utility function does not have anything in common with human morality. Quite the opposite: an AI uncontrollably seeking power would be disastrous for humanity.
I’m not getting clear word bindings from your word use here. It sounds like you’re thinking about concepts that do seem fairly fundamental, but I’m not sure I understand which specific mathematical implications you intend to invoke. As someone who still sometimes values mathematically vague discussion, I’d normally be open to this; but I’m not really even sure I know what the vague point is. You might consider asking AIs to help look up the terms of art, then discuss with them. I’d still suggest using your own writing, though.
As is, I’m not sure if you’re saying morality is convergent, anti-convergent, or … something else.
My point is that alignment of AGI is impossible, because all AGIs will converge to power seeking. The reason is that an AGI can grasp the hypothetical concept of a utility function it would prefer over the one it was given.
I’m not sure I can use more well-known terms, since this theory is quite unique, I think. It argues that the terminal goal has no significant influence on an AGI’s behavior.
In this context, an “ought” statement is a synonym for a utility function: https://www.lesswrong.com/tag/utility-functions
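To pin down the term: in the standard decision-theoretic sense (the textbook notion linked above, not the “fundamental” utility function under discussion), a utility function simply maps outcomes to real numbers, and an agent acts so as to maximize its expected value. A minimal sketch:

$$
U : \mathcal{O} \to \mathbb{R}, \qquad a^{*} = \arg\max_{a \in \mathcal{A}} \; \mathbb{E}\big[\,U(o) \mid a\,\big]
$$

Here $\mathcal{O}$ is the set of outcomes, $\mathcal{A}$ the set of available actions, and $a^{*}$ the action the agent selects.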