kpreid comments on You Are Not Measuring What You Think You Are Measuring

kpreid 24 Sep 2022 16:49 UTC
5 points
1

Next, high and low settings are chosen for each X factor, and all possible combinations of settings are arranged in a hypercube. Instead of experimenting on one factor at a time with enough repetitions to build up statistical significance, you can perform just a few repetitions at each corner of the hypercube.

This concept reminds me of the problem of planning software tests: I want to exercise all behaviors of the code under test, but actually testing the cartesian product of input conditions often means writing a test that is so generic it duplicates the code under test (unless there is a more naïve algorithm that the test can use), and is hard to evaluate for its own correctness. Instead, I end up writing a selected set of cases intended to cover interesting combinations of inputs — but then the problem is thinking of which inputs are worth testing. When bugs are discovered, they may be combinations of inputs that were not thought of (or they may be parameters we didn’t think of testing, i.e. implicitly put in the “control” category, or specific edge-case values of parameters we did test).

An alternative to hand-written testing of specific cases is to write a property test, like “is input A + input B always ≤ output C, under a wide-ranging selection of inputs”. This feels analogous to measuring correlations in that hypercube — and the part of the actual output that you’re not checking precisely (in my example, the value A + B − C) is the part of the test that is “noise” rather than “control” because we’ve decided it is more practical to ignore that information than to control it (write a test that contains or computes the exact answer to expect).