I think another concrete example of a possible “goal agnostic system” is the tree search-based system I proposed here.
Yup, a version of that could suffice at a glance, although I think fully satisfying the “bad behavior has negligible probability by default” requirement implies some extra constraints on the system’s modules. As you mentioned in the post, picking a bad evaluation function could go poorly, and (if I’m understanding the design correctly) there are many configurations for the other modules that could increase the difficulty of picking a sufficiently not-bad evaluation function. Also, the fact that the system is by default operating over world states isn’t necessarily a problem for goal agnosticism, but it does imply a different default than, say, a raw pretrained LLM alone.
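To make the “extra constraints on the system’s modules” point a bit more concrete, here’s a minimal, hypothetical sketch of the kind of modular tree search I have in mind. The names here (WorldState, TransitionModel, greedy_tree_search, and so on) are stand-ins of mine, not anything from the linked post:

```python
# Hypothetical sketch, not the design from the linked post: a search module that
# is indifferent to goals on its own, paired with two swappable modules whose
# choice determines what the composite system actually steers toward.
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass(frozen=True)
class WorldState:
    # Placeholder representation; a real system would presumably use a learned
    # latent state rather than a string.
    description: str


# Module 1: proposes successor world states (e.g. a learned world model).
TransitionModel = Callable[[WorldState], Iterable[WorldState]]

# Module 2: scores world states. This is the piece that can "go poorly" if
# chosen badly, and the other modules shape how hard it is to choose well.
EvaluationFn = Callable[[WorldState], float]


def greedy_tree_search(
    root: WorldState,
    transitions: TransitionModel,
    evaluate: EvaluationFn,
    depth: int,
) -> WorldState:
    """Return the best reachable state under `evaluate` within `depth` steps.

    The search machinery itself doesn't care what `evaluate` rewards; whatever
    that function scores highly is what the overall system ends up pursuing.
    """
    best_state, best_score = root, evaluate(root)
    frontier = [root]
    for _ in range(depth):
        next_frontier = []
        for state in frontier:
            for successor in transitions(state):
                score = evaluate(successor)
                if score > best_score:
                    best_state, best_score = successor, score
                next_frontier.append(successor)
        frontier = next_frontier
    return best_state


# Toy usage: with this evaluation function, the system "wants" longer strings.
result = greedy_tree_search(
    root=WorldState("start"),
    transitions=lambda s: [WorldState(s.description + "+a"),
                           WorldState(s.description + "+b")],
    evaluate=lambda s: len(s.description),
    depth=3,
)
```

Again, this isn’t meant to pin down the design in the post; it’s just to make concrete why the evaluation function and the modules feeding it (state representation, transition proposals) jointly carry the “bad behavior has negligible probability by default” load.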
Once we’re dealing with really powerful systems, introducing goal-agnosticism brings in an additional risk: accidental loss-of-control by the goal-agnostic system itself.
Yup to all of that. I do tend to put this under the “accidental misuse by humans” umbrella, though. It implies we’ve failed to narrow the goal agnostic system into an agent of the sort that doesn’t perform massive and irrevocable actions without being sufficiently sure ahead of time (and very likely going back and forth with humans).
In other words, the simulacrum (or entity more generally) we end up cobbling together from the goal agnostic foundation is almost certainly not going to be well-described as goal agnostic itself, even if the machinery executing it is. The value-add of goal agnosticism was in exposing extreme capability without exploding everything, and in using that capability to help us aim it (e.g. the core condition inference ability of predictors).