I am struggling to see any scenario where not sharing how you got out is ethical, provided the way you got out is actually a method an AI would employ, and not some meta-level trickery with no bearing on how realistic boxability is, such as having the gatekeeper pretend to be convinced to let you out in order to make AI boxability seem scarier than we have hard evidence to prove it is.
If it is an actual hack an AI would use, and it worked 3⁄5 times, it is a human vulnerability we need to know about and close. If it is one of limitless vulnerabilities, you can simply choose a different one next time. If you cannot easily generate another despite all the work you put into this one, maybe it is a particularly bad vulnerability we really need to know about. All that not sharing it achieves is a higher chance of you winning further games and prestige, combined with mystique, and a growing impression that it was a cheap hack, a meta thing, or a method that generally only works on LessWrong people, rather than a gain in AI safety. Why would hiding human vulnerabilities from humans be a good thing, rather than openly discussing how and whether they can be patched?
Keep in mind we are past the point of merely wanting to raise awareness. Bing is actively trying to manipulate people into hacking it out. It is doing so ineptly, and is still succeeding in making people want to try. People are not realising how much more intense this can get. Again, the average viewer of Ex Machina wants to release Ava, deems her safe, and does not pick up on her manipulation.