Bouncing off a comment upthread, I wrote a post here with some general thoughts on the security of powerful systems. I also gave an argument that marginal transparency doesn’t translate into marginal security. I mostly haven’t figured out how that relates to the OP, and below I’ll just say so in a few more words. Note that I’m a layman when it comes to the details of ML, so I’m speaking at roughly the level of a 101 undergraduate.
It sounds to me like the transparency work being published at Distill, and some of its longer-term visions (as outlined in the OP), will, if successful, substantially increase both how well we understand ML systems and how useful they are.
From the perspective of security that I discuss in the linked post, I don’t personally have a good enough understanding of what Chris and others take their work to be to tell whether the transparency work being published in Distill will (as it progresses) make systems secure in the way I described there. Plausibly, getting that kind of security is a much further effort on top. Evan mentions the idea of turning an ML system into an understandable codebase the size of the Linux kernel, which sounds breathtakingly ambitious and incredibly useful. That said, it’s typically very hard to make a codebase secure when you’ve been handed a system that didn’t have security built in.
Relatedly, I don’t feel I have a good understanding of how the folks at Clarity model why (or whether) AI will not go well by default (to pick a concrete possibility: if we reach AGI by largely continuing the current trajectory of work in the field of ML), where any big risks come from (to pick some clear possibilities: whether it’s broadly similar to Paul’s first model, second model, or neither), and what sort of technical work is required to prevent the bad outcomes. I’d be interested if Chris or anyone else working alongside him on Clarity feels they can offer a crisp claim about how optimisation enters the system, and about what features of a transparent system make that optimisation findable and removable with an appropriate amount of resources (though I don’t think that’s an easy ask or anything).