Alex Lawsen comments on alexrjl’s Shortform

Alex Lawsen 8 Sep 2022 6:07 UTC
2 points
0
I think currently nothing (which is why I ended up writing that I regretted the sensationalist framing). However, I expect the very strong default is that any method which uses chain of thought to monitor/steer/interpret systems ends up providing exactly that selection pressure, and I’m skeptical that this can be prevented.
- Emrik 8 Sep 2022 18:09 UTC
  1 point
  0
  Mh, I thought perhaps you were going in this direction. A world where there’s a many-to-one mapping between prompts and output is plausibly a world where the visible mappings are just the tip of the iceberg. And if you then have the opportunity to iteratively filter out seemingly dangerous behaviour, that’s likely to just push the entire behaviour under the surface—still present, just not legible.