A lot of the example concepts that you list already belong to established scientific fields: math, logic, probability, causal inference, ontology, semantics, physics, information theory, computer science, learning theory, and so on. These concepts don’t need philosophical re-definition. Respecting these field boundaries, and the ways that fields are connected to each other via other fields (e.g., math and ontology are connected to information theory/CS/learning theory via semantics), is also, I think, a good practice on net: it’s better to focus attention on the fields that are actually most proto-scientific and philosophically confusing: intelligence, sentience, psychology, consciousness, agency, decision making, boundaries, safety, utility, value (axiology), and ethics[1].
Then, to make the overall idea solid, I think it’s necessary to do a couple of extra things (you may already mention these in the post, but I semi-skimmed it and may have missed them).
First, specify the concepts in this fuzzy proto-scientific area of intelligence, agency, and ethics not in terms of each other, but in terms of (or in a clearly specified connection with) the other scientific fields/ontologies that are already established, enumerated above. For example, a theory of agency should be compatible with (or connected to, or specified in terms of) causal inference and learning theories. A theory of boundaries and ethics should be based on physics, information theory, semantics, and learning theory, among other things (cf. scale-free axiology and ethics).
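To make the flavor of this concrete, here is a minimal, hypothetical sketch (not something proposed in the post; the environment and all names are my own illustrative assumptions) of what “specifying agency in terms of an established field” can look like: one-step empowerment, an information-theoretic quantity sometimes used as a proxy for an agent’s causal influence over its environment, computed for a toy deterministic grid world.

```python
# Toy illustration: "agency" operationalized via one-step empowerment,
# i.e., the channel capacity between the agent's actions and its next state.
# Everything here (grid world, action set) is a made-up stand-in.

from math import log2

ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]  # left, right, down, up

def step(state, action, width=4, height=4):
    """Deterministic toy dynamics: move on a bounded grid; walls clip movement."""
    x, y = state
    dx, dy = action
    return (max(0, min(width - 1, x + dx)), max(0, min(height - 1, y + dy)))

def one_step_empowerment(state):
    """For deterministic dynamics, max_{p(a)} I(A; S') = log2(#distinct next states)."""
    reachable = {step(state, a) for a in ACTIONS}
    return log2(len(reachable))

if __name__ == "__main__":
    print(one_step_empowerment((2, 2)))  # open space: 4 reachable states -> 2.0 bits
    print(one_step_empowerment((0, 0)))  # corner: walls reduce options -> ~1.58 bits
```

The point is not that empowerment is the right formalization of agency, only that such a candidate definition bottoms out in information theory and the environment’s dynamics rather than in other proto-scientific concepts.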
Second, establish feedback loops that test these “proposed” theories of agency (and of psychology, decision-making, and ethics) both in simulated environments (e.g., with LLM-based agents embodying these proposed theories acting in Minecraft- or Sims-like worlds) and in (constrained) real-life settings or environments. Note that the obligatory connection to physics, information theory, causal inference, and learning theory will ensure that these tests themselves can be counted as scientific.
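As an illustration only (again, nothing here is from the post; the agent, environment, and “theory” are stand-ins I made up), such a feedback loop can be reduced to: derive a falsifiable behavioural prediction from the proposed theory, let an agent act in a simulated environment, and score the prediction against the observed behaviour. A real setup would replace the toy policy with an LLM-based agent and the corridor with a Minecraft- or Sims-like world.

```python
# Minimal sketch of a theory-testing feedback loop in a simulated environment.
import random

class ToyGridWorld:
    """1-D corridor: the agent starts at 0 and is rewarded for reaching the goal."""
    def __init__(self, length=10, goal=9):
        self.length, self.goal, self.pos = length, goal, 0

    def step(self, action):  # action: -1 (left) or +1 (right)
        self.pos = max(0, min(self.length - 1, self.pos + action))
        return self.pos, 1.0 if self.pos == self.goal else 0.0

def agent_policy(pos, goal):
    """Stand-in for an agent embodying the theory under test (e.g., an LLM policy)."""
    return 1 if pos < goal else -1

def theory_prediction(pos, goal):
    """The 'proposed theory' reduced to a falsifiable claim about the next action."""
    return 1 if pos < goal else -1  # claim: the agent acts to reduce distance to goal

def run_feedback_loop(episodes=20, steps=15, noise=0.1):
    agreement, total = 0, 0
    for _ in range(episodes):
        env = ToyGridWorld()
        for _ in range(steps):
            predicted = theory_prediction(env.pos, env.goal)
            # Inject noise so the theory is actually tested rather than trivially true.
            action = agent_policy(env.pos, env.goal) if random.random() > noise else random.choice([-1, 1])
            agreement += (predicted == action)
            total += 1
            env.step(action)
    return agreement / total  # empirical support for the theory's behavioural claim

if __name__ == "__main__":
    print(f"prediction accuracy of the proposed theory: {run_feedback_loop():.2f}")
```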
The good news is that there are now sufficient (or almost sufficient) affordances to build AI agents that can embody sufficiently realistic and rich versions of these theories, both in realistic simulated environments and in real life. And I think an actual R&D agenda proposal should be written about this and submitted as an application for a Superalignment grant.
There’s an instinct to “ground” or “found” concepts. But there’s no globally privileged direction of “more grounded” in the space of possible concepts. We have to settle for a reductholistic pluralism, or better, learn to think rightly, which will, as a side effect, make reductholism not seem like settling.
I disagree with the last sentence: “reductholism” should be the settling, as I argue in “For alignment, we should simultaneously use multiple theories of cognition and value”. (Note that this view itself is based largely on quantum information theory: see “Information flow in context-dependent hierarchical Bayesian inference”.)
[1] A counterargument could be made here: although logic, causal inference, ontology, semantics, physics, information theory, CS, learning theory, and so on are fairly established and all have SoTA, mature theories that look solid, these are probably not the final theories in all or many of these fields, philosophical poking could highlight the problems with them, and perhaps this will actually be the key to “solving alignment”. I agree that this is in principle a possible chain of events, but it looks like quite a low-expected-impact path to me from the “hermeneutic nets” perspective, so this agenda is still better off focusing on the “core confusing” fields (intelligence, agency, ethics, etc.) and treating the established fields and the concepts therein “as given”.