Thank you for the post. It got me thinking about human values not only as something inherent to a single human being, but as a property of humans interacting within a shared larger system (with quantities one could call, e.g., “capital”). A while back, I was thinking about classes of alignment solutions built around leaving the human brain untouched, but that doesn’t work, since an AGI could still intervene on the environment in many ways, so I discarded that line of thought.
Now I have revisited the idea and tried to generalize the notion of capital. Specifically, I was searching for a conserved or quasi-conserved quantity related to human value, or, equivalently (in the Noether sense), a symmetry of the joint system. I have not found anything good yet, but I wanted to share the idea so that others can comment on whether the direction is worth pursuing.
One idea was to look at causal chains (in the sense of Natural Abstraction: A can reach B) that run from humans to humans, treating a suitably bounded human as a node. If we can quantify the width of these causal chains (in terms of bandwidth, delay, or some other measure), then we could limit the AGI to actions that don’t reduce these quantities. A toy sketch of what I mean is below.
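To make the “width” idea slightly more concrete, here is a minimal sketch in Python, using max-flow over a toy causal graph as a stand-in for the bandwidth of human-to-human causal chains. Everything here is an assumption for illustration: the graph, the capacities, and the `is_action_allowed` check are all made up, and they dodge the real question of what the right conserved quantity would be.

```python
import networkx as nx

def build_world_model():
    """Toy causal graph: nodes are humans or environment variables;
    an edge's 'capacity' stands in for causal bandwidth (how much
    influence can flow along that edge)."""
    g = nx.DiGraph()
    # Human A influences human B directly and via a shared environment node.
    g.add_edge("human_A", "environment", capacity=3.0)  # A acts on the shared environment
    g.add_edge("environment", "human_B", capacity=2.0)  # the environment feeds back to B
    g.add_edge("human_A", "human_B", capacity=1.0)      # direct channel (e.g. conversation)
    return g

def human_to_human_bandwidth(g, source, sink):
    """Width of the causal chain from source to sink, operationalized
    here as max-flow over the toy capacities."""
    value, _ = nx.maximum_flow(g, source, sink, capacity="capacity")
    return value

def is_action_allowed(g, action, source="human_A", sink="human_B"):
    """An 'action' is modelled as a dict of capacity changes on edges.
    It is allowed only if it does not shrink human-to-human bandwidth."""
    before = human_to_human_bandwidth(g, source, sink)
    g_after = g.copy()
    for (u, v), new_capacity in action.items():
        g_after[u][v]["capacity"] = new_capacity
    after = human_to_human_bandwidth(g_after, source, sink)
    return after >= before

if __name__ == "__main__":
    world = build_world_model()
    # Hypothetical AGI action: it monopolizes part of the environment,
    # cutting the capacity of the environment -> human_B edge.
    action = {("environment", "human_B"): 0.5}
    print(human_to_human_bandwidth(world, "human_A", "human_B"))  # 3.0
    print(is_action_allowed(world, action))                       # False
```

Obviously the hard parts are exactly what this sketch assumes away: how to carve out “a suitably bounded human” as a node, what the edges and capacities are in the real world, and whether bandwidth is even the right notion of width (delay and other measures might matter just as much).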