The problem with this kind of test is that the subject’s awareness that a test is occurring skews the result. If you leave a piece of delicious-looking cake on my counter for a week, there is a high probability (>0.95) that I will eat it within a week. If you leave the exact same cake on my counter and say to me, “I want to see if you have the willpower to refrain from eating this cake,” I would say the probability of my eating the cake within a week drops to 0.01 or less.
The problem with this kind of test is that the subject’s awareness that a test is occurring skews the result. If you leave a piece of delicious-looking cake on my counter for a week, there is a high probability (>0.95) that I will eat it within a week. If you leave the exact same cake on my counter and say to me, “I want to see if you have the willpower to refrain from eating this cake,” I would say the probability of my eating the cake within a week drops to 0.01 or less.
Actually, I took that into account in considering this a level 1 benchmark. It’s not supposed to be very hard.