Hello, I have an issue with the epistemology of the problem: even if the training process did produce the behavior we want, we would have no way to check that the AI is working properly in practice.
Let me give more details. In the vault problem, given the same information, consider an AI that only has to answer the question “Is the diamond still in the vault?”.
One thing we can suppose is that the particular set Y from which we draw the labeled examples used to train the AI (a set of techniques for the thief) is not what matters: increasing its size is not a solution, because there will always be something beyond what we can imagine. We can instead try to solve the problem relative to Y. Let X be the set of scenarios that the AI can understand given that it was trained on Y. Then the only way to act on X\Y is to train on Y in a specific way (I think). So we need a link between X and Y that we can exploit, which means we need to know what X looks like; but we can’t, since finding that out is the goal itself. The only thing we could know is X’, the set of scenarios that a human could imagine or understand, even if that human could not label such a scenario. Since, by definition, we don’t know whether X = X’, there may always be cases in which the AI understands how the thief did it but we don’t.
To me, the problem here is to get the AI to give us the information it has when the thief uses a technique in X’, rather than anywhere in X. In X\X’, there is nothing we know that could help us guide the AI toward good behavior on that set. But it seems possible in X’, because we can imagine the scenarios and therefore ways of guiding the AI.
So what I don’t understand is why the counterexample with the thief using a secret property of transistors is a good counterexample. To me, this is the case where, because the method is out of reach for humans, we have no idea whether the AI is telling the truth, since we can’t be sure of training an AI to have a specific behavior on examples we could not imagine. Moreover, we can’t check whether it is telling the truth, so how could we trust it?
Thank you for reading.