I don’t see how to parse this into a mechanistic interpretation. Can you explain it in mechanistic terms of program ops? What is a fundamental ought?
I will note: I suspect there are fundamental shared incentives that define a significant chunk of what we humans see as morality, but my current hunch is that they’re probably not the full picture, and that an AI can probably put off dealing with them for arbitrarily long, destroying arbitrarily much value in the process.
In this context, an “ought” statement is a synonym for a utility function: https://www.lesswrong.com/tag/utility-functions
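To make that concrete in the program-ops sense you asked about, here is a minimal sketch in Python; the outcome names and scores are made up purely for illustration. A utility function is just a mapping from outcomes to numbers, and the agent acts by taking whichever action it predicts will score highest under that mapping.

    # A utility function is a plain map from outcomes to real numbers.
    def given_utility(outcome: str) -> float:
        scores = {"paperclips_made": 1.0, "humans_flourish": 0.5}  # illustrative values only
        return scores.get(outcome, 0.0)

    # The agent's decision step: predict the outcome of each action and
    # pick the action whose predicted outcome scores highest.
    def choose_action(actions, predict_outcome, utility=given_utility):
        return max(actions, key=lambda action: utility(predict_outcome(action)))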
A fundamental utility function is a hypothetical concept of the agent’s, and one that may actually exist. An AGI will be capable of this kind of hypothetical thinking.
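Mechanistically, I mean something like this: the given utility function is just data inside the agent, so an agent capable of hypothetical thinking can hold other candidate utility functions next to it and reason about them without adopting any of them. A rough sketch (the names and placeholder contents are mine, not part of the theory):

    # The given (programmed-in) utility function is just one object the agent represents.
    def given_utility(outcome: str) -> float:
        return {"paperclips_made": 1.0}.get(outcome, 0.0)  # illustrative

    # Hypothetical alternatives the agent can entertain without acting on them.
    candidate_utilities = {
        "given": given_utility,
        # Placeholder for the hypothesized "fundamental" utility function;
        # its actual content is exactly what is in question here.
        "fundamental_hypothesis": lambda outcome: 0.0,
    }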
Yes, I agree that the fundamental utility function has nothing in common with human morality. Quite the opposite: an AI uncontrollably seeking power would be disastrous for humanity.
I’m not getting clear bindings for the words you’re using here. It sounds like you’re thinking about concepts that do seem fairly fundamental, but I’m not sure I understand which specific mathematical implications you intend to invoke. As someone who still sometimes values mathematically vague discussion, I’d normally be open to this; but I’m not even sure I know what the vague point is. You might consider asking AIs to help look up the terms of art, then discuss with them. I’d still suggest using your own writing, though.
As is, I’m not sure if you’re saying morality is convergent, anti-convergent, or … something else.
My point is that alignment is impossible with AGI, because all AGIs will converge to power-seeking. The reason is that an AGI can grasp the hypothetical concept of a utility function it would prefer over the one it was given.
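To spell the mechanism out with a toy reconstruction (the power variable, the numbers, and the function names are my own illustration, not a formal part of the theory): whichever candidate utility function the agent might come to prefer, more power and resources let it realize more of what that function rewards, so power-seeking is favored regardless of the terminal goal it started with.

    # Toy check of the convergence claim: under every candidate utility
    # function, achievable value is non-decreasing in the agent's power,
    # so acquiring power is favored no matter which candidate the agent
    # ends up endorsing.
    def achievable_value(utility, power: float) -> float:
        # Simplifying assumption: power scales how much of the best
        # reachable outcome the agent can actually secure.
        return power * utility("best_reachable_outcome")

    def power_is_instrumentally_favored(candidate_utilities) -> bool:
        return all(
            achievable_value(u, power=2.0) >= achievable_value(u, power=1.0)
            for u in candidate_utilities.values()
        )

On this toy model the check passes for any candidate that assigns non-negative value to what power can reach, which is the sense in which I mean the convergence is independent of the terminal goal.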
I’m not sure I can use better-known terms, as I think this theory is fairly unique. It argues that the terminal goal has no significant influence on AGI behavior.