Updates thread for this post
Removed the following text because it’s no longer necessary and it’s definitely not the only way to use boundaries.
Recap
If you’re unfamiliar with «boundaries», here’s a recap:
What I consider to be the main hypothesis for the agenda of (directly) applying «boundaries» to AI safety: most (if not all) instances of active harm from AI can be formally described as forceful violation of the ~objective (or ~intersubjective) causal separation between humans[1] and their environment. (For example, someone being murdered would be a violation of their physical ‘boundary’, and someone being unilaterally mind-controlled would be a violation of their informational ‘boundary’.)
What I consider to be the ultimate goal: To create safety by formally and ~objectively specifying «boundaries» and respect of «boundaries» as an outer alignment safety goal. I.e.: have AI systems respect the boundaries of humans[1].
What I consider to be the main premise: there exists some meaningful causal separation between humans[1] and their environment that can be observed externally.
Work by other researchers: Davidad is optimistic about this idea and hopes to use it in his Open Agency Architecture (OAA) safety paradigm. Prior work on the topic has also been done in the «Boundaries» sequence (Andrew Critch) and Cartesian Frames (Scott Garrabrant). For more, see «Boundaries/Membranes» and AI safety compilation or the Boundaries / Membranes tag.
Simplified some wording
Moved the section on Abram's criticism to a comment