A problem is that
we don’t know the specific goal representation (the actual string in place of “A”),
we don’t know how to evaluate LLM output (in particular, how to check whether a suggested plan actually works for the goal),
we have a large (presumably infinite, non-enumerable) set of behaviors B we want to avoid,
we have explicit representations for only some items in B, mentally understand a few more, and don’t understand or even know about the remaining unwanted behaviors.
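As a rough illustration (a minimal Python sketch with made-up behavior strings, not anyone’s actual method), an evaluator built only from the explicitly represented part of B can at best act as a partial filter: it rejects the unwanted behaviors we managed to write down, says nothing about the rest of B, and cannot confirm that the plan achieves the goal “A”, which we also cannot represent.

```python
# Hypothetical sketch: the explicitly known subset of B.
# Everything else in the (non-enumerable) set goes unchecked.
KNOWN_BAD_BEHAVIORS = {
    "delete user data",
    "exfiltrate credentials",
    # ... only the items of B we can write down explicitly
}

def evaluate_plan(plan: str) -> bool:
    """Return True if the plan passes our (incomplete) checks.

    This only rejects behaviors we explicitly listed; it says nothing
    about the rest of B, and it cannot verify that the plan actually
    achieves the goal "A", which we also cannot represent.
    """
    lowered = plan.lower()
    return not any(bad in lowered for bad in KNOWN_BAD_BEHAVIORS)
```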