An idealized or fully correct agent’s behavior is too hard to predict (=implement) in a complex world. That’s why you introduce the heuristics: they are easier to calculate. Can’t that be used to also make them easier to predict by a third party?
Separately from this, the agent might learn or self-modify to acquire new heuristics. But what does the word “heuristic” mean here? What’s special about it that doesn’t apply to all self-modifications and all learned models, if you can’t predict their behavior without actually running them?
Possibly — but we’d need to be closer to the implementation to say.