Mh, I thought perhaps you were going in this direction. A world where there’s a many-to-one mapping between prompts and output is plausibly a world where the visible mappings are just the tip of the iceberg. And if you then have the opportunity to iteratively filter out seemingly dangerous behaviour, that’s likely to just push the entire behaviour under the surface—still present, just not legible.
Mh, I thought perhaps you were going in this direction. A world where there’s a many-to-one mapping between prompts and output is plausibly a world where the visible mappings are just the tip of the iceberg. And if you then have the opportunity to iteratively filter out seemingly dangerous behaviour, that’s likely to just push the entire behaviour under the surface—still present, just not legible.