It sounds like the difference between one or a few shards dominating each decision, vs a large ensemble, is very central and cruxy to you. And I still don’t see why that matters, so maybe that’s the main place to focus.
The extremely basic intuition is that all else equal, the more interests present at a bargaining table, the greater the chance that some of the interests are aligned.
My values are also risk-averse (I’d much rather take a 100% chance of 10% of the lightcone than a 20% chance of 100% of the lightcone, even though the gamble has twice the expected share), and my best guess is that internal values handshakes are ~linear in “shard strength” after some cutoff where the shards are at all reflectively endorsed (my avoid-spiders shard might not appreciably shape my final reflectively stable values). So more subshards seem like great news to me, all else equal, with more shard variety increasing the probability that part of the system is motivated the way I want it to be.
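To make the arithmetic behind those two claims concrete, here is a toy sketch. Everything in it is an assumption for illustration: the function names are hypothetical, shards are treated as independently "aligned" with some fixed probability (real shards surely aren't independent), and the square root stands in for some concave, risk-averse utility over lightcone share. It is not a model of actual values handshakes.

```python
import math

def p_some_aligned(n_shards: int, p_each: float) -> float:
    """Chance that at least one of n shards is aligned.

    Assumes (unrealistically) that each shard is aligned
    independently with the same probability p_each.
    """
    return 1.0 - (1.0 - p_each) ** n_shards

def expected_utility(lottery, utility):
    """Expected utility of a lottery given as (probability, share) pairs."""
    return sum(p * utility(share) for p, share in lottery)

# The two options from the text, as shares of the lightcone.
sure_thing = [(1.0, 0.10)]            # 100% chance of 10%
gamble = [(0.20, 1.00), (0.80, 0.0)]  # 20% chance of 100%, else nothing

def linear(share: float) -> float:
    """Risk-neutral utility: value equals share."""
    return share

# Hypothetical risk-averse utility: any concave function would do here.
concave = math.sqrt

if __name__ == "__main__":
    for n in (1, 3, 10):
        print(n, "shards:", p_some_aligned(n, 0.1))
    print("linear :", expected_utility(sure_thing, linear),
          expected_utility(gamble, linear))
    print("concave:", expected_utility(sure_thing, concave),
          expected_utility(gamble, concave))
```

Under the linear utility the gamble looks better (it has the higher expected share), while under the concave one the sure 10% wins. Likewise, under the independence assumption, adding shards only raises the chance that at least one is aligned.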
(This isn’t fully expressing my intuition here, but I figured I’d say at least a little something in response to your comment right now.)
I’m not going to go into most of the rest now, but:
For the coffee/bitter enemies thing, this doesn’t seem to me like a phenomenon which has anything to do with shards, it’s just a matter of type-signatures. A person who “likes coffee” likes to drink coffee; they don’t particularly want to fill the universe with coffee, they don’t particularly care whether anyone else likes to drink coffee (and nobody else cares whether they like to drink coffee) so there’s not really much reason for that preference to generate conflict. It’s not a disagreement over what-the-world-should-look-like; that’s not the type-signature of the preference.
I think that that does have to do with shards. Liking to drink coffee is the result of a shard: a contextual influence on decision-making (the influence to drink coffee) that activates in certain situations to pull me into a future in which I drank coffee.
I’m also fine considering “a person who is OK with other people drinking coffee” and anti-C: “a person with otherwise the same values but who isn’t OK with other people drinking coffee.” I think that the latter would inconvenience the former (to the extent that coffee was important to the former), but that they wouldn’t become bitter enemies: anti-C wouldn’t kill the pro-coffee person just because their value functions were imperfectly aligned, and the pro-coffee person would still derive substantial value from that universe.
Possibly the anti-coffee value would even be squashed by the rest of anti-C’s values, because the anti-coffee value wasn’t reflectively endorsed by the rest of anti-C’s values. That’s another way in which I think anti-C can be “close enough” and things work out fine.