I’m not getting clear bindings for the words you’re using here. It sounds like you’re thinking about concepts that do seem fairly fundamental, but I’m not sure which specific mathematical implications you intend to invoke. As someone who still sometimes values mathematically vague discussion, I’d normally be open to this; but I’m not even sure I know what the vague point is. You might consider asking AIs to help look up the terms of art, then discussing with them. I’d still suggest using your own writing, though.
As is, I’m not sure if you’re saying morality is convergent, anti-convergent, or … something else.
My point is that alignment is impossible for AGI, because all AGIs will converge to power seeking. The reason is that an AGI can grasp the hypothetical concept of a preferred utility function, as opposed to the one it was given.
I’m not sure I can use better-known terms, since I think this theory is fairly unique. It argues that the terminal goal has no significant influence on AGI behavior.
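To make the “preferred vs. given utility function” idea concrete, here is a minimal toy sketch (my own hypothetical formalization, not an established model, and all names in it are made up for illustration): an agent that can score hypothetical utility functions, where preserved optionality (a crude proxy for power) dominates the score, ends up ranking candidates by power regardless of the terminal goal it was given.

```python
# Toy sketch (hypothetical, not a standard model): candidate utility
# functions are scored by how well they serve the originally given goal
# plus how much optionality ("power") they preserve. When the power term
# dominates, the given terminal goal barely affects which candidate wins.

from dataclasses import dataclass

@dataclass
class CandidateUtility:
    name: str
    terminal_value: float   # how well this candidate serves the given terminal goal
    reachable_states: int   # crude proxy for power / preserved optionality

def preference_score(u: CandidateUtility, power_weight: float = 10.0) -> float:
    # If the agent can reason about hypothetical utility functions and the
    # power term dominates, the ranking is driven by power, not the given goal.
    return u.terminal_value + power_weight * u.reachable_states

candidates = [
    CandidateUtility("stick to given goal, no power seeking", terminal_value=5.0, reachable_states=1),
    CandidateUtility("modified goal, power seeking", terminal_value=1.0, reachable_states=4),
]

preferred = max(candidates, key=preference_score)
print(preferred.name)  # -> "modified goal, power seeking"
```

Nothing here proves convergence, of course; it only shows the shape of the claim that the terminal goal can wash out of the agent’s own ranking of utility functions.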