I don’t think I’ve grokked the “wrapper” concept sufficiently to have specific comments, but I find this idea very interesting and would really like to see a concrete description of an AI system that lacks the wrapper (or of a way to build one). Ideally in the form of a training story (per https://www.alignmentforum.org/posts/FDJnZt8Ks2djouQTZ/how-do-we-become-confident-in-the-safety-of-a-machine).