I’d be interested in more investigation into what environments/objective functions select for coherence and to what degree said selection occurs.
Basically, such environments are called “intelligent systems”. I mean that if we train a shard-agent to some lowest-necessary level of superintelligence, it will look at itself and say “Whoa, what a mess of context-dependent decision-making influences I have here! I should refine this into something more like a utility function if I want to get more of what I want”, and I see literally no reason for it not to do so. You can’t draw parallels with actual intelligent systems here, because actual intelligent systems don’t have the ability to self-modify.
(Please, let’s not conduct experiments with self-modifying ML systems to check it?)
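To make the contrast concrete, here is a minimal toy sketch (the shard names, weights, and utility numbers are all made up for illustration, not a claim about any real architecture): a “shard” agent whose choice is a tug-of-war between context-activated influences, next to the same preferences compressed into a single context-independent utility function.

```python
# Toy illustration only: context-dependent "shards" voting on options,
# versus one fixed utility function over the same options.
from typing import Callable

options = ["rest", "gather_food", "explore"]

# Each shard fires with a context-dependent weight and votes for the
# options it favours. No single ranking underlies the combined votes.
shards: dict[str, Callable[[dict], dict[str, float]]] = {
    "hunger":    lambda ctx: {"gather_food": 2.0 * ctx["hunger"]},
    "curiosity": lambda ctx: {"explore": 1.5 * ctx["novelty"]},
    "fatigue":   lambda ctx: {"rest": 1.0 * ctx["tiredness"]},
}

def shard_choice(ctx: dict) -> str:
    votes = {o: 0.0 for o in options}
    for shard in shards.values():
        for option, weight in shard(ctx).items():
            votes[option] += weight
    return max(votes, key=votes.get)

# The "refined" agent: the same behaviour compressed into one utility
# function, applied identically in every context.
def utility(option: str) -> float:
    return {"rest": 0.3, "gather_food": 0.9, "explore": 0.6}[option]

def coherent_choice(_ctx: dict) -> str:
    return max(options, key=utility)

print(shard_choice({"hunger": 0.2, "novelty": 0.9, "tiredness": 0.1}))  # explore
print(shard_choice({"hunger": 0.9, "novelty": 0.1, "tiredness": 0.2}))  # gather_food
print(coherent_choice({}))  # gather_food, regardless of context
```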
Secondly, I disagree with the framing of “strong coherence”, because it seems to imply some qualitative difference in coherence. It’s not that there are “coherent” and “incoherent” systems; there are more and less coherent systems. Fully coherent systems are likely impossible in our world, because full coherence is computationally intractable. Future superintelligent systems can be expected to be incoherent and exploitable from the point of view of platonic unbounded agents; that just doesn’t change anything from our perspective, since we should still treat them as helluva coherent. It makes no difference that a superintelligence will predictably (to platonic unbounded agents) choose a strategy whose probability of turning the universe into paperclips is one bazillionth less than optimal.
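To illustrate what “incoherent and exploitable” means here, and why a bazillionth changes so little, here is a toy money-pump sketch (all numbers hypothetical): an agent with cyclic preferences pays a fee for each trade it prefers, so an unbounded adversary can cycle it forever, yet the drain per cycle can be made arbitrarily tiny.

```python
# Toy money pump (hypothetical numbers): cyclic preferences A > B > C > A
# mean the agent will pay a small fee for each "upgrade" around the cycle.
cyclic_prefs = {("A", "B"), ("B", "C"), ("C", "A")}  # left is preferred to right

def accepts_trade(offered: str, held: str) -> bool:
    return (offered, held) in cyclic_prefs

fee = 1e-9        # what the agent pays per swap it prefers
wealth = 1.0
held = "A"

for offered in ["C", "B", "A"] * 3:   # three full trips around the cycle
    if accepts_trade(offered, held):
        wealth -= fee
        held = offered

print(held, wealth)   # back to "A", having paid 9 fees for nothing
```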
I can agree that the direct result of the SGD run that produces the first superintelligent system may not be much more coherent than a human. I don’t see any reason for it to stay at that level of coherence.
Basically, such environments are called “intelligent systems”. I mean that if we train a shard-agent to some lowest-necessary level of superintelligence, it will look at itself and say “Whoa, what a mess of context-dependent decision-making influences I have here! I should refine this into something more like a utility function if I want to get more of what I want”, and I see literally no reason for it not to do so. You can’t draw parallels with actual intelligent systems here, because actual intelligent systems don’t have the ability to self-modify.
Ridiculously strong levels of assuming your conclusion and failing to engage with the argument. This does not engage at all with the core contention that strong coherence may be anti-natural to generally intelligent systems in our universe.
Why did actual intelligent systems not develop as maximisers of a simple unitary utility function if that was actually optimal?
Why did evolution converge on agents that developed terminal values for drives that were instrumental to raising inclusive genetic fitness in their environment of evolutionary adaptedness?
I don’t see any reason for it to stay at that level of coherence.
You have not actually justified the assumptions that it would self-modify into strong coherence, that strong coherence is optimal, or that systems with malleable values would necessarily want to self-modify into immutable terminal goals.
I would NOT take a pill to turn myself into an expected utility maximiser.