I don’t think humans (or anything else with bounded rationality) can have cleanly separated values, world models, and decision theory.
In the idealized picture, an independent decision theory combines our values and our world models to guide our actions, and we use our epistemology to update our world models on new information. In this view, all these parts are nice, independent, consistent things.
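To make that idealized separation concrete, here is a minimal Python sketch of an agent factored into independent modules for epistemology, values, and decision theory. Every name in it is hypothetical, not from this post; the rest of the post argues that nothing with bounded rationality can actually be factored this cleanly.

```python
from typing import Callable, Dict, Iterable

# Hypothetical illustration of the idealized factoring: each part is an
# independent, swappable module with no leakage between them.
WorldModel = Dict[str, float]  # hypothesis -> probability

def bayes_update(model: WorldModel,
                 likelihood: Callable[[str], float]) -> WorldModel:
    """Epistemology: reweight every hypothesis by the evidence,
    independently of values and decision theory."""
    unnorm = {h: p * likelihood(h) for h, p in model.items()}
    z = sum(unnorm.values())
    return {h: p / z for h, p in unnorm.items()}

def choose(model: WorldModel,
           utility: Callable[[str, str], float],
           actions: Iterable[str]) -> str:
    """Decision theory: maximize expected utility, treating the world
    model and the values (utility) as independent black boxes."""
    return max(actions,
               key=lambda a: sum(p * utility(h, a) for h, p in model.items()))

# Example: evidence favors "rain", so the agent takes the umbrella.
beliefs = bayes_update({"rain": 0.5, "dry": 0.5},
                       likelihood=lambda h: 0.9 if h == "rain" else 0.2)
action = choose(beliefs,
                utility=lambda h, a: 1.0 if (a == "umbrella") == (h == "rain") else 0.0,
                actions=["umbrella", "no umbrella"])
```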
You can only update world models on evidence independently if you have enough cognitive capacity to enumerate every conceivable world model and update each one on the evidence. The combinatorial explosion of interdependent parts within each of those models makes this physically impossible. Failing that, we can only make updates via non-independent heuristics and try to avoid known systematic failures.
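A back-of-envelope sketch of why (the numbers here are my own illustrative assumptions, not from the post): even under the generous simplification that a world model is just an assignment of truth values to n independent binary facts, the space of candidate models grows as 2^n, which outruns any physically possible updater almost immediately.

```python
# Illustrative arithmetic only: exact, independent updating must touch
# every distinguishable world model. With n independent binary facts,
# there are 2**n candidate models to reweight on each piece of evidence.

n = 300
candidate_models = 2 ** n
atoms_in_observable_universe = 10 ** 80  # commonly cited rough estimate

print(f"n = {n} binary facts -> {candidate_models:.3e} candidate models")
print("more models than atoms in the observable universe:",
      candidate_models > atoms_in_observable_universe)
```

Real world models are far worse than this toy case, since their parts are interdependent rather than independent binary facts.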
Every increment of reducing these failure modes is expensive: the failures are often difficult to recognize, better heuristics to replace them are difficult to find, and even known improvements are difficult to put into actual practice. Many of the heuristics known to fail have no replacement that is demonstrably better.
What’s worse, there seem to be underlying psychological attractors pulling us toward known-bad processes. Some of these bad epistemic attractors appear to be genuinely good in other ways that we can’t clearly identify or quantify, and certainly can’t yet adequately improve on.
We individually, as a community, and as a species don’t know nearly enough yet to do much better. I’m not sure we can do much better without technological self-modification, and that would carry enormous risks of its own.