Three thoughts:
If you set up the system like that, you may run into the mentioned problems. It might be possible to wrap both into a single model that is trained together.
An advanced system may reason about the joint effect, e.g. by employing fixed-point theorems and Logical Induction.
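To make the fixed-point idea concrete, here is a minimal toy sketch (my own illustration, not from any of the cited work): a prediction that influences the outcome it predicts, resolved by iterating until the prediction is self-consistent. The `outcome` function and its constants are hypothetical.

```python
def outcome(prediction: float) -> float:
    # Hypothetical environment: announcing `prediction` shifts the
    # actual outcome partway toward it (a self-fulfilling pressure).
    base = 0.2
    influence = 0.5
    return base + influence * prediction

def fixed_point(f, x0=0.0, tol=1e-9, max_iter=1000):
    """Iterate x <- f(x) until it stops changing.

    Converges when f is a contraction; a self-consistent
    prediction p satisfies p == f(p).
    """
    x = x0
    for _ in range(max_iter):
        nx = f(x)
        if abs(nx - x) < tol:
            return nx
        x = nx
    raise RuntimeError("fixed-point iteration did not converge")

p = fixed_point(outcome)
# Self-consistency: p == outcome(p), i.e. p = 0.2 / (1 - 0.5) = 0.4
```

Reasoning about the joint effect means finding the prediction that remains correct once its own influence on the world is taken into account, which is exactly this fixed point.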
Steven Byrnes’s [Intro to brain-like-AGI safety] 6. Big picture of motivation, decision-making, and RL models humans as having three components:
a world model that is mainly trained by prediction error
a steering system that encodes preferences over world states
a system that learns how world model predictions relate to steering system feedback
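The three components above can be sketched as a toy program (my own illustrative framing, not code from Byrnes's post; the dynamics, learning rates, and the `steering_system` preference are all assumptions):

```python
import random

random.seed(0)  # reproducible toy run

class WorldModel:
    """Component 1: predicts the next state; trained by prediction error."""
    def __init__(self):
        self.weight = 0.0  # guesses next_state = weight * state

    def predict(self, state):
        return self.weight * state

    def update(self, state, next_state, lr=0.1):
        error = next_state - self.predict(state)  # prediction error
        self.weight += lr * error * state
        return error

def steering_system(state):
    """Component 2: innate preferences over world states (fixed, not learned)."""
    return -abs(state - 1.0)  # prefers states near 1.0

class ValueLearner:
    """Component 3: learns how world-model predictions relate
    to steering-system feedback."""
    def __init__(self):
        self.value = {}

    def update(self, predicted_state, feedback, lr=0.2):
        key = round(predicted_state, 1)  # coarse bucketing of predictions
        old = self.value.get(key, 0.0)
        self.value[key] = old + lr * (feedback - old)

wm, vl = WorldModel(), ValueLearner()
for _ in range(200):
    s = random.uniform(0.0, 2.0)
    true_next = 0.8 * s            # hidden dynamics the world model must learn
    pred = wm.predict(s)
    wm.update(s, true_next)        # trained only on prediction error
    vl.update(pred, steering_system(true_next))  # links predictions to feedback
```

Note that only the third component ever sees steering feedback; the world model is trained purely on prediction error, which is the separation that makes the joint training of all three an interesting design question.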