A lot of what you write here seems related to my notion of Turing Reinforcement Learning. In Turing RL we consider an AI consisting of a “core” RL agent and an “envelope”, which is a computer on which the core can run programs (somewhat similarly to neural Turing machines). From the point of view of the core, the envelope is a component of its environment (in addition to its usual I/O), about which it has somewhat stronger priors than about the rest. Such a system learns how to make optimal use of the envelope’s computing resources. Your “boundary” corresponds to the core, which is the immutable part of the algorithm that produces everything else.

Regarding the “justification” of why a particular core algorithm is correct: the justification should come from regret bounds we prove about this algorithm w.r.t. some prior over incomplete models. Incomplete models are the solution to “even if you could obtain a perfect model of your world and beings like you, you wouldn’t be able to fit it inside your own head”. Instead of obtaining a perfect model, the agent learns all the patterns (incomplete models) in the world that it can fit into its head, and exploits these patterns for gain. More precisely, in Turing RL the agent starts with some small class of patterns that the core can fit into its head, and bootstraps from those to a larger class of patterns, guided by a cost-benefit analysis of resource use. This way, the regret bound satisfied by the core algorithm should lead to even stronger guarantees for the system as a whole (for example this).
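To make the core/envelope picture a bit more concrete, here is a minimal toy sketch of the loop I have in mind. It is not the actual formalism: all the names (Pattern, Envelope, CoreAgent and their methods) are illustrative stand-ins, the observation and action types are left abstract, and the cost-benefit test is deliberately crude (it assumes the compute cost has already been converted into the same units as expected reward).

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Pattern:
    """An incomplete model: a cheap, partial predictor the core can fit in its head."""
    predict_reward: Callable[[object], float]     # expected reward from exploiting it
    recommend_action: Callable[[object], object]  # action it suggests for an observation
    refine_cost: float                            # envelope compute needed to sharpen it
    refine: Callable[[object], "Pattern"]         # program that produces a sharper pattern


@dataclass
class Envelope:
    """A computer the core can run programs on. From the core's point of view it
    is simply a well-understood component of the environment."""
    budget: float

    def run(self, program: Callable, arg, cost: float):
        self.budget -= cost  # resource accounting
        return program(arg)


@dataclass
class CoreAgent:
    """The immutable core: starts with a small class of patterns and bootstraps
    to a larger class when the expected benefit exceeds the compute cost."""
    patterns: List[Pattern] = field(default_factory=list)

    def step(self, observation, envelope: Envelope):
        # Toy cost-benefit analysis: refine the most promising pattern only when
        # the expected gain (crudely proxied by its predicted reward) outweighs
        # the envelope resources that refining it would consume.
        for p in sorted(self.patterns, key=lambda q: -q.predict_reward(observation)):
            if p.predict_reward(observation) > p.refine_cost and envelope.budget >= p.refine_cost:
                self.patterns.append(envelope.run(p.refine, observation, p.refine_cost))
                break
        # Exploit the best pattern currently in the class.
        best = max(self.patterns, key=lambda q: q.predict_reward(observation))
        return best.recommend_action(observation)
```

The essential point of the sketch is only that the core is fixed, the envelope appears to it as a resource-accounting part of the environment, and new patterns enter the class only when refining them pays for the compute they consume.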