If the AGI is substantially smarter than the interpretability tools, then it will probably have an easier time outmaneuvering them than it would outmaneuvering humans.
Close calls, e.g. catching an AGI just before it's too late, are possible. But that relies on luck, and at some point you'll need some really, really good tools anyway, such as tools that are smarter than the AGI (while somehow not being a significantly bigger threat themselves).
Why wouldn’t people (and maybe even AIs, at least up to a point) be applying these ever-advancing AI capabilities to developing better and better interpretability tools as well? I.e., what reason is there to expect an “interpretability gap” to develop (unless you believe interpretability is a fundamentally unsolvable problem, in which case no amount of AI power is going to help)?