Noosphere89 comments on Counterarguments to the basic AI x-risk case

Noosphere89 16 Oct 2022 17:35 UTC
3 points
1
I agree that boxing is at least a first step, so that the AI doesn’t acquire more compute or, worse, FOOM.

The tricky problem is that we need to be able to train deception out of an AI, or forbid it entirely, without merely pushing the deception into an obfuscated form that only looks trained away.

This is why we need to move beyond the black box paradigm, and why strong interpretability tools are necessary.