If I understand this paper correctly, I can see parallels between the concepts it proposes (Safety Specifications, World Model and Verifier) and the way the different religions of the world have functioned as frameworks to align humans.
The parallels here could be as follows...
Safety Specifications ~ Ethical values that the religion espouses
World Model ~ Ontological philosophy of the religion
Verifier ~ Societal mechanisms (Law, Judiciary, Policing etc) based on the above.
While the Ontological philosophies and Ethical values of the different religions are generally well stated, the Societal mechanisms that verify adherence to these values can be complex if the societies are secular or comprise multi-religious populations. In such situations, one religion usually dominates the verification systems.
Also, based on the comment quoted below, it would appear that I am not too far off with this analogy. If this is a possible outcome of the ideas presented in the paper, then again this seems like a shadow of how societies treat a “non-aligned” human by “transitioning them to safe mode” (judicial custody, for example) and “disabling” them (prison or isolation, for example).
“You could also monitor the environment of the AI at runtime to look for signs that the world model is inaccurate in a certain situation, and if such signs are detected, transition the AI to a safe mode where it can be disabled.”
I am curious whether the authors think this analogy is valid or if it's too far off.
If this analogy stands, then perhaps the various religions of the world already provide a set of “Safety Specifications” and “World Models” that can help test this thesis.