General comment on this list: the mental image behind most of the items seems to be roughly “break the system into parts, which are individually interpretable and nonmalicious”. Intermediates expressed in natural language, decomposition into interpretable subtasks, myopia, minimal hidden state, lack of situational awareness, legibility, process-based, and composition of narrow tools all tie into that general pattern.
These all share similar shortcomings: interpretability/tool-ness/corrigibility/etc are not composable, and also (depending on which versions of these ideas one imagines) the not-yet-well-written-up problem of “you don’t get to choose the ontology”.
That’s not really a problem for any of these properties individually: they’re each still potentially worthwhile properties which make AI relatively safer. But in aggregate, bear in mind that there are decreasing marginal returns to properties which all pursue roughly-similar upsides and have roughly-similar shortcomings. If this list is your starting point, then there’s probably a lot more marginal value in adding properties which address e.g. the composability problem, rather than additional similar properties to those listed.