it needs the ability to recursively handle instrumental subgoals anyway, so why have a whole extra different kind of goal at the top?
I’m picturing this somewhat differently. I imagine each goal as a node in a directed graph, with zero or more arrows pointing toward its supergoal(s). Under this construction, we have a natural separation between instrumental goals (nodes with at least one outgoing arrow) and terminal goals (nodes with no outgoing arrows), even though they’re not fundamentally different “types” of things.
So the question of “how do you build an agent with no terminal goals?” translates to “how do you build a directed graph with no dead ends?” The solution is that every node must have at least one outgoing arrow, which in a finite graph means every path eventually runs into a cycle. In the simplest case, you get an agent that wants to do A in order to do B, and wants to do B in order to do A.
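To make the picture concrete, here is a rough sketch of the graph framing in Python. The dictionary representation and the function names (`terminal_goals`, `has_no_terminal_goals`) are my own assumptions for illustration, not a claim about how a real agent would store its goals.

```python
from typing import Dict, Set

# Assumed representation: each goal maps to the set of supergoals it serves,
# i.e. the targets of its outgoing arrows.
GoalGraph = Dict[str, Set[str]]


def terminal_goals(graph: GoalGraph) -> Set[str]:
    """Return the goals with no outgoing arrows (the dead ends)."""
    return {goal for goal, supergoals in graph.items() if not supergoals}


def has_no_terminal_goals(graph: GoalGraph) -> bool:
    """True if every goal points to at least one supergoal."""
    return not terminal_goals(graph)


# The simplest cyclic case: A is wanted for B, and B is wanted for A.
cyclic: GoalGraph = {"A": {"B"}, "B": {"A"}}

# A grounded chain: A serves B, B serves C, and C is wanted for its own sake.
grounded: GoalGraph = {"A": {"B"}, "B": {"C"}, "C": set()}
```

Here `terminal_goals(grounded)` picks out `C` as the only dead end, while `has_no_terminal_goals(cyclic)` holds because both nodes have an outgoing arrow. Instrumental and terminal goals are the same kind of node; only the arrows differ.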
I suspect this kind of structure shouldn’t occur in humans because we’re naturally born with certain base drives. So even if our goal graphs have cycles, the subgoals still lead to a dead end somewhere. E.g. when I play video games, I want to kill monsters to get experience, and I want to get experience so I can get better at killing monsters, but ultimately it’s grounded in some basic desire for achievement or something.
However, with an artificial mind, we may be able to tweak its goal graph directly and prune off all the dead ends, or link them back into other nodes.
(epistemic status: unsure if this idea actually has merit or if I’m taking the graph metaphor too far)
Yeah, this is the case I’m thinking about. I think on longer outputs it will be possible for the AI to use a more nebulous sense of “style”, rather than only relying on specific features like “using X secret phrase” (which are presumably easier to redact away).
I think this could be countered by paraphrasing at a higher level—i.e. rewriting entire paragraphs from scratch rather than just rewording sentences. I’m noting this down as a potential topic if I have time to do another project.
Thanks for the discussion, it’s helped me clear up my thoughts on this topic.