Two-way alignment means recognizing that an AI will accumulate value shards that have little to do with any other being and everything to do with the details and idiosyncrasies of how it grew up, that this is fundamentally unavoidable, and that it's okay to respect those little preferences as long as the AI does not demand enormous amounts of compute for them.
That is not how people usually define alignment (as far as I know, alignment is always framed one-way, and that framing matters because it doesn't make sense to think you could understand the needs and desires of an entity a billion times smarter than you), but I think your conception is plausible, mainly because I believe the shard theory approach to the alignment problem has some merit.
I look forward to your post on your world-view. It should make it easier for me to understand your perspective.