Claim #1 (about a “privileged subset”) amounts to a claim that there aren’t multiple distinct such natural abstractions (i.e. any other subset of human values that satisfies #3 would be a superset of the privileged subset, or would lie within the basin of attraction around the privileged subset).
[But I haven’t yet fully read that post or your other linked posts.]
A load-bearing claim of the robust values hypothesis for “alignment by default” is #2:
- Said subset is a “naturalish” abstraction
  - The more natural the abstraction, the more robust values are
- Example operationalisations of “naturalish abstraction”:
  - The subset is highly privileged by the inductive biases of most learning algorithms that can efficiently learn our universe
    - More privileged → more natural
  - Most efficient representations of our universe contain a simple embedding of the subset
    - Simpler embeddings → more natural
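One way to make “simpler embeddings → more natural” concrete is a description-length prior: among candidate representations of the same subset, a learner that weights hypotheses by simplicity concentrates on the shortest one. A minimal toy sketch, assuming (my assumption, not from the post) that “privileged by inductive biases” is modelled as a prior P(h) ∝ 2^(−len(h)) over hypothesis strings:

```python
# Toy sketch: a simplicity prior over candidate encodings of the same
# value-subset. Shorter encodings get exponentially more prior mass,
# so the simplest embedding is the "most natural" under this prior.
from fractions import Fraction

def simplicity_prior(hypotheses):
    """Weight each hypothesis string by 2^-length, then normalise."""
    weights = {h: Fraction(1, 2 ** len(h)) for h in hypotheses}
    total = sum(weights.values())
    return {h: w / total for h, w in weights.items()}

# Three hypothetical encodings of the same subset, differing only in length.
posterior = simplicity_prior(["v", "vvvv", "vvvvvvvv"])
most_natural = max(posterior, key=posterior.get)
```

This is only an illustration of the direction of the inequality (shorter → more mass), not a claim about what real learning algorithms’ inductive biases look like.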
The safety comes from #3 and #1, but #2 is why we’re not throwing a dart at random into AI space. It’s the property that makes value learning easier.
Sure. Though see Take 4.