I’m surprised to hear you say that, since you write
Upfront, I want to clarify: I don’t believe or wish to claim that GSAI is a full or general panacea to AI risk.
I kinda think anything which is not a panacea is swiss cheese, i.e. that those are the only two options.
It's a matter of what sort of portfolio can lay down slices of swiss cheese at what rate and with what degree of uncorrelation. And I think in this way GSAI is antifragile to next year's language models, which is why I can agree mostly with Zac's talk and still work on GSAI (I don't think he talks about my cruxes).
Specifically, I think the guarantees of each module and the guarantees of each pipe (connecting the modules) isolate/restrict the error to the world-model gap or the world-spec gap, and I think the engineering problems of getting those guarantees are straightforward / not conceptual problems. Furthermore, I think the conceptual problems with reducing the world-spec gap below some threshold presented by Safeguarded’s TA1 are easier than the conceptual problems in alignment/safety/control.
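To gesture at the compositional point, here's a toy sketch (entirely my own illustration, not anything from actual GSAI / Safeguarded AI tooling; the runtime asserts stand in for what would be machine-checked proof obligations):

```python
from dataclasses import dataclass
from typing import Callable, Generic, TypeVar

A = TypeVar("A")
B = TypeVar("B")
C = TypeVar("C")

@dataclass
class Verified(Generic[A, B]):
    """A module bundled with a checkable contract on its input/output behaviour.

    The runtime assert below is a stand-in for a machine-checked guarantee.
    """
    run: Callable[[A], B]
    guarantee: Callable[[A, B], bool]

    def __call__(self, x: A) -> B:
        y = self.run(x)
        assert self.guarantee(x, y), "module violated its contract"
        return y

def pipe(f: Verified[A, B], g: Verified[B, C]) -> Verified[A, C]:
    """A pipe between modules inherits both contracts, so the seam adds no new
    failure mode; residual risk lives in whether the contracts capture what we
    actually want (world-spec gap) and whether the modelled environment matches
    reality (world-model gap)."""
    def composite_guarantee(x: A, z: C) -> bool:
        y = f.run(x)  # assumes f is deterministic; fine for a toy
        return f.guarantee(x, y) and g.guarantee(y, z)
    return Verified(run=lambda x: g.run(f.run(x)), guarantee=composite_guarantee)

# Toy usage: two trivial modules, each with its own contract.
double = Verified(run=lambda n: 2 * n, guarantee=lambda n, m: m == 2 * n)
inc = Verified(run=lambda n: n + 1, guarantee=lambda n, m: m == n + 1)
assert pipe(double, inc)(3) == 7
```

The point of the toy is just that the glue introduces nothing new: all the real difficulty is pushed into writing guarantees that mean what we want and world-models that track the world, which is where I claim the remaining problems are engineering rather than conceptual (modulo TA1).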
I understood Nora as saying that GSAI in itself is not a swiss cheese approach. This is different from saying that [the overall portfolio of AI derisking approaches, one of which is GSAI] is not a swiss cheese approach.