Two things I’d especially like to highlight in this post:
Fundamentally, structural constraints give us back some of the guarantees of the main epistemic strategies of Science and Engineering that get lost in alignment: we don’t have the technology yet, but we have some ideas of how it will work.
This is possibly the best one-sentence summary I’ve seen of how these sorts of theorems would be useful.
One corollary of recovering (some of) the usual science-and-engineering strategies is that selection theorems would open the door to a lot of empirical work on alignment and agency. Thus the importance of this section:
Proving structural selection theorem
Choose a selection mechanism to investigate.
Find a structural constraint that should be favored by the mechanism.
Prove the theorem.
Show that agents with these structural constraints are easier to find.
Show that many agents without the structural constraints can be easily found by the selection pressure.
Show that agents with structural constraints are a majority.
Show that there isn’t a majority of selected-for agents with structural constraints.
Show that agents with structural constraints are easier to sample.
Argue that the set of selected-for agents is different from the one used in the work, and that for the actual set, sampling agents without structural constraints becomes simpler.
Propose a sampling of agents and show it results in structural constraints with high probability.
Show that the proposed sampling disagrees with what the selection pressure actually finds (showing that the probabilities are different, or that one can sample agents that the other can’t).
Checking that the selection theorem applies
Check that the selection exists.
For a mechanism, check that it fits with how selection happens.
Show that the actual selection works differently than the mechanism described, and that these differences massively influence what is selected in the end.
These are all potential ways to empirically test various kinds of selection theorems.
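To make that concrete, here is a minimal toy sketch of what one such empirical check could look like. Everything in it is my own assumption rather than anything from the post: the "agents" are strict pairwise preferences over three outcomes, the structural constraint is transitivity, and the selection mechanism is a crude money pump (the names `all_agents`, `is_transitive`, `survives_money_pump` and `run_experiment` are hypothetical). It compares how common the constraint is before and after selection.

```python
from itertools import combinations, permutations, product

# Toy setup (all assumptions for illustration, not from the post).
OUTCOMES = ("A", "B", "C")
PAIRS = list(combinations(OUTCOMES, 2))


def all_agents():
    """Enumerate every strict pairwise preference over OUTCOMES (8 agents)."""
    for bits in product([True, False], repeat=len(PAIRS)):
        prefers = {}
        for (x, y), x_wins in zip(PAIRS, bits):
            prefers[(x, y)] = x_wins      # True means the agent prefers x to y
            prefers[(y, x)] = not x_wins
        yield prefers


def is_transitive(prefers):
    """Structural constraint: no preference cycle among the outcomes."""
    for x, y, z in permutations(OUTCOMES, 3):
        if prefers[(x, y)] and prefers[(y, z)] and not prefers[(x, z)]:
            return False
    return True


def survives_money_pump(prefers, wealth=5, fee=1, rounds=20):
    """Selection mechanism: the agent pays `fee` for each swap to an outcome
    it prefers to its current holding, and is eliminated once it is broke."""
    holding = OUTCOMES[0]
    for _ in range(rounds):
        offers = [o for o in OUTCOMES if o != holding and prefers[(o, holding)]]
        if not offers:
            return True  # already holds its favorite; accepts no more trades
        holding = offers[0]
        wealth -= fee
        if wealth <= 0:
            return False
    return True


def run_experiment():
    """Return the fraction of transitive agents before and after selection."""
    agents = list(all_agents())
    survivors = [a for a in agents if survives_money_pump(a)]
    prior = sum(is_transitive(a) for a in agents) / len(agents)
    posterior = sum(is_transitive(a) for a in survivors) / len(survivors)
    return prior, posterior


if __name__ == "__main__":
    prior, posterior = run_experiment()
    print(f"transitive before selection: {prior:.2f}")
    print(f"transitive after selection:  {posterior:.2f}")
```

In this toy case the two cyclic agents get pumped dry, so selection takes the constraint from 6 of 8 agents to all survivors. The "break it" items from the list map onto this directly: change `survives_money_pump` to something closer to the real selection pressure, or change the agent space, and see whether the result still holds.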