Pretty much! Expanding your explanation a little:
1. A computes msg_1 = Encrypt(A_source, A_key) and sends it to B.
2. B wants to run Validate(source) = Sign(Check_trustworthy(source), B_key) on A_source, but can’t do that directly because B only has an encrypted version.
2.1. So B runs Validate under FHE on msg_1, producing msg_2 = Encrypt(Validate(A_source), A_key), and sends that to A.
3. A decrypts msg_2, producing msg_3 = Validate(A_source) = Sign(Check_trustworthy(A_source), B_key), and sends that back to B (if it meets the agreed-on format).
4. B now has a claim that A’s source is trustworthy, signed with B’s key. A doesn’t have that key, so the claim must have been produced by B’s program.
Step 2.1 is where the magic happens.
(I should have just put this in the post!)
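For concreteness, here is a minimal Python sketch of that message flow. It is a toy under loud assumptions: `ToyFHE`, `Ciphertext` and `fhe_eval` are hypothetical mocks that only imitate the interface of an FHE scheme (nothing is actually encrypted), `check_trustworthy` is a placeholder predicate, and an HMAC stands in for Sign since only B ever verifies it.

```python
import hashlib
import hmac
import os


class Ciphertext:
    """Opaque box standing in for an FHE ciphertext (toy: nothing is really encrypted)."""

    def __init__(self, key_id, plaintext):
        self._key_id = key_id
        self._plaintext = plaintext


class ToyFHE:
    """Hypothetical mock of A's key; only the holder can decrypt."""

    def __init__(self):
        self._key_id = os.urandom(8)

    def encrypt(self, plaintext):
        return Ciphertext(self._key_id, plaintext)

    def decrypt(self, ct):
        if ct._key_id != self._key_id:
            raise ValueError("wrong key")
        return ct._plaintext


def fhe_eval(func, ct):
    """Mock homomorphic evaluation: apply func to the hidden plaintext without the key."""
    return Ciphertext(ct._key_id, func(ct._plaintext))


B_KEY = os.urandom(32)  # known only to B


def sign(bit, key):
    # HMAC stands in for a signature here, since only B needs to verify it.
    return bit, hmac.new(key, bytes([bit]), hashlib.sha256).digest()


def check_trustworthy(source):
    # Placeholder predicate; the real check is unspecified in the thread.
    return int("defect" not in source)


def validate(source):
    return sign(check_trustworthy(source), B_KEY)


def run_protocol(a_source):
    a = ToyFHE()
    msg_1 = a.encrypt(a_source)        # A encrypts its source and sends it to B
    msg_2 = fhe_eval(validate, msg_1)  # B runs Validate under (mock) FHE
    msg_3 = a.decrypt(msg_2)           # A decrypts the result...
    bit, tag = msg_3
    if bit not in (0, 1) or len(tag) != 32:
        return None                    # ...and withholds it if the format is off
    expected = sign(bit, B_KEY)[1]     # B checks its own signature
    return bit if hmac.compare_digest(tag, expected) else None
```

Calling `run_protocol("always cooperate")` gives B a verified 1 without B ever holding the plaintext source; a real deployment would need an actual FHE library and a real signature scheme in place of these mocks.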
I’m not sure I’m completely solid on how FHE works, so perhaps this won’t work, but here’s an idea of how B can exploit this approach:
Let’s imagine that Check_trustworthy(A_source) = 1. After step 3 of the parent comment, B would know E1 = Encrypt(1, A_key). If Check_trustworthy(A_source) returned 0, B would instead know E0 = Encrypt(0, A_key), and the following steps work similarly. B knows which one it is by looking at msg_3.
B has another program, Check_blackmail(X, source), that simulates the behaviour of an agent with the given source code in situation X and returns 1 if it would be blackmailable or 0 if not.
B knows Encrypt(A_source, A_key), so they can compute F(X) = Encrypt(Check_blackmail(X, A_source), A_key) for any X using the homomorphic properties of the encryption scheme.
Let’s define W(X) = if(F(X) = E1, 1, 0). It’s easy to see that W(X) = Check_blackmail(X, A_source), so now B can compute that for any X.
Profit?
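A rough Python sketch of this attack, assuming (and this is the crucial assumption) that the scheme is deterministic, so equal plaintexts under the same key give equal ciphertexts. `DetCiphertext` and `fhe_eval` are hypothetical mocks, not a real FHE API, and both predicates are placeholders.

```python
import hashlib
import os


class DetCiphertext:
    """Toy deterministic ciphertext: same key and plaintext give the same blob."""

    def __init__(self, key, plaintext):
        self._key = key              # B could not read this in a real scheme
        self._plaintext = plaintext  # B could not read this in a real scheme
        self.blob = hashlib.sha256(key + repr(plaintext).encode()).digest()


def fhe_eval(func, ct):
    """Mock homomorphic evaluation: B applies func without seeing the plaintext."""
    return DetCiphertext(ct._key, func(ct._plaintext))


def check_trustworthy(source):
    # Placeholder predicate, chosen so that it returns 1 for the source below.
    return int("agent" in source)


def check_blackmail(x, source):
    # Placeholder predicate: the agent is blackmailable in situation x.
    return int(f"pay_if:{x}" in source)


a_key = os.urandom(32)
msg_1 = DetCiphertext(a_key, "agent v1; pay_if:threat_7")

# From the honest run, B holds a ciphertext it knows to encrypt 1.
e1 = fhe_eval(check_trustworthy, msg_1)


def w(x):
    """W(X): compare blobs only; B never decrypts anything."""
    f_x = fhe_eval(lambda src: check_blackmail(x, src), msg_1)
    return 1 if f_x.blob == e1.blob else 0
```

Here `w("threat_7")` and `w("threat_9")` recover Check_blackmail for each situation purely by comparing ciphertext bytes, which is exactly what a deterministic scheme would allow.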
I think your example won’t work, but it depends on the implementation of FHE. If there’s a nonce involved (which there really should be), then you’ll get different encrypted data for the output of the two programs you run, even though the underlying data is the same.
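To illustrate the nonce point, here is a toy randomised bit encryption in Python. It is neither FHE nor secure; `encrypt_bit` and `decrypt_bit` are hypothetical helpers that only show why comparing ciphertexts stops working once a fresh nonce goes into every encryption.

```python
import hashlib
import os


def encrypt_bit(bit, key):
    """Toy randomised encryption: a fresh nonce masks the bit differently each time."""
    nonce = os.urandom(16)
    pad = hashlib.sha256(key + nonce).digest()[0] & 1
    return nonce + bytes([pad ^ bit])


def decrypt_bit(ct, key):
    nonce, body = ct[:16], ct[16]
    pad = hashlib.sha256(key + nonce).digest()[0] & 1
    return body ^ pad


a_key = os.urandom(32)
e1 = encrypt_bit(1, a_key)   # ciphertext B already knows to be "1"
f_x = encrypt_bit(1, a_key)  # a second ciphertext of the same plaintext

same_plaintext = decrypt_bit(e1, a_key) == decrypt_bit(f_x, a_key)
same_ciphertext = e1 == f_x
```

Both ciphertexts decrypt to 1, but they differ as byte strings, so a test like F(X) = E1 tells B nothing without A’s key.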
But you don’t actually need to do that. The protocol lets B exfiltrate one bit of data, whatever bit they like. A doesn’t get to validate the program that B runs, they can only validate the output. So any program that produces 0 or 1 will satisfy A and they’ll even decrypt the output for you.
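A small Python sketch of that one-bit leak, again with mocks: `Box`, `Alice` and `fhe_eval` are hypothetical stand-ins for the FHE machinery, the signature is a placeholder string, and both predicates are invented for illustration. The point is only that A’s format check cannot tell the two programs apart.

```python
class Box:
    """Toy stand-in for a ciphertext under A's key (nothing is really encrypted)."""

    def __init__(self, owner, value):
        self._owner = owner
        self._value = value


def fhe_eval(func, box):
    """Mock homomorphic evaluation: B applies func without seeing the contents."""
    return Box(box._owner, func(box._value))


class Alice:
    def __init__(self, source):
        self._source = source

    def send_source(self):
        return Box(self, self._source)

    def decrypt_and_return(self, box):
        out = box._value if box._owner is self else None
        # A can only check the agreed format: (bit, signature).
        if isinstance(out, tuple) and len(out) == 2 and out[0] in (0, 1):
            return out
        return None


def sign(bit):
    return bit, "signed-by-B"  # placeholder for Sign(bit, B_key)


def check_trustworthy(source):
    return int("cooperate" in source)  # placeholder predicate


def check_blackmail(source):
    return int("pays_under_threat" in source)  # placeholder predicate


def run(source, program):
    """B runs `program` under mock FHE; A decrypts whatever passes the format check."""
    a = Alice(source)
    reply = a.decrypt_and_return(fhe_eval(lambda s: sign(program(s)), a.send_source()))
    return None if reply is None else reply[0]
```

With the source `"cooperate; pays_under_threat"`, both `run(src, check_trustworthy)` and `run(src, check_blackmail)` come back as well-formed signed bits, so A decrypts B’s bit of choice without being able to tell which question was asked.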
That does indeed mean that B can find out if A is blackmailable, or something, so exposing your source code is still risky. What would be really cool would be a way to let A also be sure what program has been run on their source by B, but I couldn’t think of a way to do this such that both A and B are sure that the program was the one that actually got run.