Note that this doesn’t undermine the post; its thesis only gets stronger if we assume that more of evolution’s alignment attempts, like romantic love or altruism, generalized, because that would imply that control or alignment generalizes surprisingly easily, even when the aligner is far less intelligent than the alignee.
This suggests that scalable oversight is either a non-problem, or a problem only at extreme levels of capability disparity, and that alignment does generalize quite far.
This, together with my belief that current alignment designers have far more tools in their toolkit than evolution had, makes me extremely optimistic that alignment is likely to be solved before dangerous AI.
…and eating, and breastfeeding…