Context: (1) Motivations for fostering EA-relevant interdisciplinary research; (2) “domain scanning” and “epistemic translation” as a way of thinking about interdisciplinary research
[cross-posted to the EA forum in shortform]
List of fields/questions for interdisciplinary AI alignment research
The following list of fields and leading questions could be interesting for interdisciplinary AI alignment research. I started compiling this list to provide an anchor for evaluating the value of interdisciplinary research for EA causes, specifically AI alignment.
Some comments on the list:
Some of these domains are likely already very much on the radar of some people; others are more speculative.
In some cases I have a decent idea of concrete lines of questioning that might be interesting; in other cases all I do is gesture very broadly that “something here might be of interest”.
I don’t mean this list to be comprehensive or authoritative. On the contrary, this list is definitely skewed by domains I happened to have come across and found myself interested in.
While this list is specific to AI alignment (/safety/governance), I think the same rationale applies to other EA-relevant domains and I’d be excited for other people to compile similar lists relevant to their area of interest/expertise.
Very interested in hearing thoughts on the below!
Target domain: AI alignment/safety/governance
Evolutionary biology
Evolutionary biology seems to have a lot of potentially interesting things to say about AI alignment. Just a few examples include:
The relationship between environment, agent, and evolutionary paths (which e.g. relates to the role of training environments)
Niche construction as an angle on embedded agency
The nature of intelligence
Linguistics and Philosophy of language
Lots of things here are relevant to better understanding the nature and origin of (general) intelligence.
Sub-domains such as semiotics could, for example, have relevant insights on topics like delegation and interpretability.
Cognitive science and neuroscience
Examples include Minsky’s Society of Mind (“The power of intelligence stems from our vast diversity, not from any single, perfect principle”), Hawkins’s A Thousand Brains (the role of reference frames for general intelligence), Friston et al.’s Predictive Coding/Predictive Processing (in its most ambitious versions, a near-universal theory of all things cognition, perception, comprehension and agency), and many more.
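To make the predictive-processing framing a bit more concrete, here is a minimal toy sketch in Python (my own illustration with invented variable names, not any canonical formulation): a model predicts its input, compares the prediction against what actually arrives, and nudges its belief in proportion to the prediction error.

```python
import numpy as np

def predictive_coding_step(belief, observation, learning_rate=0.1):
    """One toy predictive-coding update: move the belief toward the
    observation in proportion to the prediction error."""
    prediction = belief                      # simplest possible generative model
    prediction_error = observation - prediction
    return belief + learning_rate * prediction_error

# Toy usage: a hidden quantity sits at 5.0; the agent starts with a belief of 0.0
# and repeatedly updates on noisy observations of it.
rng = np.random.default_rng(0)
belief = 0.0
for _ in range(200):
    observation = 5.0 + rng.normal(scale=0.5)
    belief = predictive_coding_step(belief, observation)
print(round(belief, 2))  # settles near 5.0
```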
Information theory
Information theory is hardly news to the AI alignment idea space. However, there might still be value on the table from deeper dives or more out-of-the-ordinary applications of its insights. One example of this might be this paper on The Information Theory of Individuality.
Cybernetics/Control Systems
Cybernetics seems straightforwardly relevant to AI alignment. Personally, I’d love to have a piece of writing synthesising the most exciting intellectual developments in cybernetics, written by someone with awareness of where the AI alignment field currently stands.
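For readers less familiar with the control-systems framing, below is a minimal sketch (my own toy example, not drawn from any particular alignment proposal) of the basic pattern cybernetics generalises: a negative-feedback loop that steers a system toward a setpoint by acting on the observed error.

```python
def proportional_controller(setpoint, measurement, gain=0.5):
    """Classic negative feedback: the control signal is proportional to the error."""
    return gain * (setpoint - measurement)

# Toy usage: drive a simple system's state toward a target of 10.0.
state = 0.0
for _ in range(50):
    control = proportional_controller(setpoint=10.0, measurement=state)
    state += control      # the system responds directly to the control signal
print(round(state, 2))    # approaches 10.0
```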
Complex systems studies
What does the study of complex systems have to say about robustness, interoperability, and emergent alignment? It also offers insights into, and methodology for approaching, self-organization and collective intelligence, which is particularly interesting in multi-multi scenarios.
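As a toy illustration of self-organization (a sketch of my own with made-up parameters, not a model taken from the complex-systems literature): agents on a ring repeatedly adopt the local majority opinion, and coherent blocks of agreement emerge from purely local rules, without any central coordinator.

```python
import random

def majority_step(opinions):
    """Each agent adopts the majority opinion among itself and its two ring neighbours."""
    n = len(opinions)
    return [
        1 if opinions[i - 1] + opinions[i] + opinions[(i + 1) % n] >= 2 else 0
        for i in range(n)
    ]

random.seed(1)
opinions = [random.randint(0, 1) for _ in range(30)]
for _ in range(30):
    opinions = majority_step(opinions)
# Isolated dissenters have been absorbed by their neighbours; what remains are
# locally aligned blocks, reached without any central coordination.
print(opinions)
```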
Heterodox schools of economic thinking
Various schools of thought are trying to reimagine the economy/capitalism and (political) organization, e.g. through decentralization and self-organization, by working on antitrust, by trying to understand the potentially radical implications of digitalization for the fabric of the economy, etc. Complexity economics, for example, can help us understand the out-of-equilibrium dynamics that shape much of our economy and lives.
Political economy
An interesting framework for thinking about AI alignment as a socio-technical challenge. Particularly relevant from a multi-multi perspective, or for thinking along the lines of cooperative AI. Pointer: Mapping the Political Economy of Reinforcement Learning Systems: The Case of Autonomous Vehicles
Political theory
The richness of the history of political thought is astonishing; the most obvious examples might be ideas related to social choice or principles of governance. (A dense yet high-quality overview is offered by the podcast series History Of Ideas.) The crux in making the depth of political thought available and relevant to AI alignment is formalization, which seems extremely undersupplied in current academia for very similar reasons to those I’ve argued above.
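To give a flavour of what formalization can look like here, below is a minimal sketch (my own toy example, not taken from the podcast or any specific proposal) of one of the simplest social choice rules, the Borda count, which turns ranked preferences into a collective ordering.

```python
from collections import defaultdict

def borda_count(ballots):
    """Aggregate ranked ballots: a candidate ranked i-th out of n gets n - 1 - i points."""
    scores = defaultdict(int)
    for ranking in ballots:
        n = len(ranking)
        for position, candidate in enumerate(ranking):
            scores[candidate] += n - 1 - position
    return sorted(scores.items(), key=lambda item: -item[1])

# Toy usage: three voters rank three options.
ballots = [["A", "B", "C"], ["B", "A", "C"], ["B", "C", "A"]]
print(borda_count(ballots))  # [('B', 5), ('A', 3), ('C', 1)]
```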
Management and organizational theory, Institutional economics and Institutional design
Has things to say about e.g. interfaces (read this to get a gist of why I think interfaces are interesting for AI alignment); delegation (e.g. Organizations and Markets by Herbert Simon); and (potentially) the ontology of forms and (the relevant) agent boundaries (e.g. The secret to social forms has been in institutional economics all along?).
It also talks, for example, about desiderata for institutions such as robustness (e.g. here), and about how to understand and deal with institutional path dependencies (e.g. here).
Pragmatically reliable alignment
[taken from On purpose (footnotes); sharing this here because I want to be able to link to this extract specifically]
AI safety-relevant side note: The idea that translations of meaning need only be sufficiently reliable in order to be reliably useful might provide an interesting avenue for AI safety research.
Language works, as evidenced by the striking success of human civilisations made possible through advanced coordination, which in turn requires advanced communication. (Sure, humans miscommunicate what feels like a whole lot, but in the bigger scheme of things, we still appear to be pretty damn good at this communication thing.)
Notably, language works without there being theoretically air-tight proofs mapping meanings onto words.
Right there, we have an empirical case study of a symbolic system that functions under a (merely) pragmatically reliable regime. We can use it to inform our priors on how well this regime might work in other systems, such as AI, and on how and why it tends to fail.
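As a toy illustration of such a (merely) pragmatically reliable regime (a sketch of my own with invented noise rates, not a model from the literature): a speaker and a listener share an imperfect meaning-to-word mapping, and we simply measure how often the intended meaning gets across. The point is only that a mapping can be far from air-tight and still be reliable enough to coordinate on most of the time.

```python
import random

# For simplicity, words are just the meaning labels themselves.
MEANINGS = ["request_help", "share_info", "warn", "greet"]

def speak(meaning, noise=0.1):
    """Map a meaning to a word; with some probability the wrong word comes out."""
    return random.choice(MEANINGS) if random.random() < noise else meaning

def listen(word, noise=0.05):
    """Decode a word back to a meaning; interpretation is also imperfect."""
    return random.choice(MEANINGS) if random.random() < noise else word

random.seed(0)
trials = 10_000
successes = sum(listen(speak(m)) == m for m in random.choices(MEANINGS, k=trials))
print(successes / trials)  # roughly 0.89: imperfect, yet reliable enough to coordinate on
```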
One might argue that pragmatically reliable alignment isn’t enough, not given the sheer optimization power of the systems we are talking about. Maybe that is true; maybe we do need more certainty than pragmatism can provide. Nevertheless, I believe there are sufficient reasons why this is an avenue worth exploring further.