Updates thread for this post
Removed the following text because it’s no longer necessary and it’s definitely not the only way to use boundaries.
Recap
If you’re unfamiliar with «boundaries», here’s a recap:
What I consider to be the main hypothesis for the agenda of (directly) applying «boundaries» to AI safety: most (if not all) instances of active harm from AI can be formally described as forceful violation of the ~objective (or ~intersubjective) causal separation between humans[1] and their environment. (For example, someone being murdered would be a violation of their physical ‘boundary’, and someone being unilaterally mind-controlled would be a violation of their informational ‘boundary’.)
What I consider to be the ultimate goal: To create safety by formally and ~objectively specifying «boundaries» and respect of «boundaries» as an outer alignment safety goal. I.e.: have AI systems respect the boundaries of humans[1].
What I consider to be the main premise: there exists some meaningful causal separation between humans[1] and their environment that can be observed externally.
Work by other researchers: Davidad is optimistic about this idea and hopes to use it in his Open Agency Architecture (OAA) safety paradigm. Prior work on the topic has also been done in the «Boundaries» sequence (Andrew Critch) and Cartesian Frames (Scott Garrabrant). For more, see «Boundaries/Membranes» and AI safety compilation or the Boundaries / Membranes tag.
Simplified some wording
Moved the section on Abram's criticism to a comment