Yep! Or rather, arguing that from a broadly RL-y + broadly Darwinian point of view, ‘self-consistent ethics’ are likely to be natural enough that we can instill them, sticky enough to self-maintain, and capabilities-friendly enough to be practical and/or survive capabilities-optimization pressures in training.