A systematic way to classify AI safety work could use a matrix, where one dimension is the system level:
A monolithic AI system, e.g., a conversational LLM
A cyborg, human + AI(s)
A system of AIs with emergent qualities (e.g., https://numer.ai/; in the future, we may see more systems like this, operating at a larger scope, up to a fully automatic AI economy, or a swarm of CoEms automating science)
A human+AI group, community, or society (scale-free consideration, supports arbitrary fractal nestedness): collective intelligence
The whole civilisation, e.g., Open Agency Architecture
Another dimension is the “time” of consideration:
Design time: research into how the corresponding system should be designed (engineered, organised): considering its functional properties (“capability”, quality of decisions), adversarial robustness (= misuse safety, memetic-virus security), and security.
Manufacturing and deployment time: research into how to create the desired designs of systems successfully and safely:
AI training and monitoring of training runs.
Offline alignment of AIs during (or after) training.
AI strategy (= research into how to transition to the desirable civilisational state, i.e., the design).
Designing upskilling and educational programs for people to become cyborgs also belongs here (= designing efficient procedures for manufacturing cyborgs out of people and AIs).
Operations time: ongoing (online) alignment of systems on all levels to each other, ongoing monitoring, inspection, anomaly detection, and governance.
Evolutionary time: research into how the (evolutionary lineages of) systems at the given level evolve long-term:
How the human psyche evolves when it is in a cyborg
How humans will evolve over generations as cyborgs
How groups, communities, and society evolve.
Designing feedback systems that don’t let systems “drift” into undesired states over evolutionary time.
Considering a system property: flexibility of values (i.e., the property opposite to value lock-in; Riedel (2021)).
IMO, it sometimes makes sense to think about this separately from alignment per se. Systems could be perfectly aligned with each other yet drift into undesirable states, and not even notice, if they lack proper feedback loops and procedures for reflection.
There would be 5 × 4 = 20 slots in this matrix; almost all of them contain something interesting to research and design, and none of them is “too early” to consider.
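The matrix above can be sketched as a simple data structure; the slot names below are illustrative shorthands for the levels and “times” described in the text, not canonical labels:

```python
from itertools import product

# The five system levels (one dimension of the matrix)
levels = [
    "monolithic AI system",
    "cyborg (human + AIs)",
    "system of AIs",
    "human+AI collective",
    "whole civilisation",
]

# The four "times" of consideration (the other dimension)
times = [
    "design",
    "manufacturing/deployment",
    "operations",
    "evolutionary",
]

# Each (level, time) pair is a slot that can hold research directions
matrix = {(level, time): [] for level, time in product(levels, times)}

print(len(matrix))  # 20 slots
```

Work that doesn’t fit a single cell (such as the AI-lab examples below) would simply live outside this dictionary.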
There is still some AI safety work (research) that doesn’t fit this matrix, e.g., the org design, infosec, alignment, etc. of AI labs (= the systems that design, manufacture, operate, and evolve monolithic AI systems and systems of AIs).