I’m relatively skeptical of the importance of this kind of boxing, because I don’t think side-channel attacks by incredibly sophisticated AI are likely to be an important part of the problem. Instead I expect AI systems to be broadly deployed in ways that let them cause lots of trouble (or used internally within the AI firm in similarly troublesome ways) well before an AI used in such an incredibly limited way would be transformative.
Even if we were doing this kind of boxing, it seems very unlikely that the overhead of FHE would be acceptable. Even a factor of 10 seems like pushing it a whole lot, and current FHE schemes typically impose slowdowns of many orders of magnitude. I expect the action would be in much more mundane forms of protection against side channels with orders of magnitude lower cost (and e.g. conservatism about what you do with AI outputs and how much you allow your AI to pursue incomprehensible plans).
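To make the overhead point concrete, here is a minimal sketch, not drawn from the original discussion: a toy *additively* homomorphic Paillier-style scheme (much weaker than FHE, and with deliberately tiny Mersenne primes that offer no real security), timed against plain addition. The scheme, parameters, and timing loop are all illustrative assumptions; real FHE for large neural networks is far more expensive than even this.

```python
# A minimal sketch: toy Paillier-style additively homomorphic encryption
# (NOT FHE, NOT secure), used only to illustrate the overhead gap between
# a plaintext addition and the "same" addition done under encryption.
import random
import time
from math import gcd

# Toy key generation: small Mersenne primes, far too small for real security.
p, q = 2**13 - 1, 2**17 - 1
n = p * q
n_sq = n * n
g = n + 1
lam = (p - 1) * (q - 1) // gcd(p - 1, q - 1)   # lcm(p-1, q-1)

def L(x):
    return (x - 1) // n

mu = pow(L(pow(g, lam, n_sq)), -1, n)          # decryption constant

def encrypt(m):
    r = random.randrange(1, n)
    while gcd(r, n) != 1:
        r = random.randrange(1, n)
    return (pow(g, m, n_sq) * pow(r, n, n_sq)) % n_sq

def decrypt(c):
    return (L(pow(c, lam, n_sq)) * mu) % n

def homomorphic_add(c1, c2):
    # Multiplying ciphertexts adds the underlying plaintexts (mod n).
    return (c1 * c2) % n_sq

# Sanity check: Enc(3) "plus" Enc(4) decrypts to 7.
assert decrypt(homomorphic_add(encrypt(3), encrypt(4))) == 7

# Rough timing: plaintext adds vs. encrypted adds (including encryption cost).
N = 2000
t0 = time.perf_counter()
for i in range(N):
    _ = i + i
t_plain = time.perf_counter() - t0

t0 = time.perf_counter()
for i in range(N):
    _ = homomorphic_add(encrypt(i), encrypt(i))
t_enc = time.perf_counter() - t0

print(f"plaintext adds: {t_plain:.6f}s, encrypted adds: {t_enc:.6f}s, "
      f"ratio ~{t_enc / t_plain:.0f}x")
```

Even this toy scheme, doing only additions on tiny numbers, lands well beyond a 10x slowdown on my reading of the arithmetic involved; actual FHE evaluating large circuits is much worse.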
I think unconditional security for FHE is wildly out of reach; that’s more ambitious than resolving P vs NP, which I’d hope is significantly harder than alignment. I think that “FHE that is applicable to massive compute-intensive applications” is also extremely difficult, but less totally crazy.
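To spell out why this is at least as hard as P vs NP (a sketch, reading “unconditional” as “proved secure against all polynomial-time adversaries with no unproven assumptions”): any such proof would in particular establish that secure encryption, and hence one-way functions, exist, which already separates P from NP.

```latex
% Standard implication chain (not from the original comment):
\[
  \text{unconditionally secure FHE}
  \;\Longrightarrow\; \text{one-way functions exist}
  \;\Longrightarrow\; \mathsf{P} \neq \mathsf{NP}.
\]
```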
Overall my guess would be that random tangentially relevant questions have a better cost-benefit ratio (e.g.: what’s the complexity of games between two teams of non-communicating provers?), but hopefully we could give prizes for work that’s much more on target than either of these.
I’d guess that random investment in security or cryptography, even in ways totally undirected at AI, is also likely better bang for your buck.
Also it’s worth noting that the LW post in question is quite old; I was mostly amused by the thought of encryption protecting us from the data instead of the other way around, but this was before I’d really gotten into AI safety.
This and this seem relevant.