Thanks for giving us this puzzle, abstractapplic.
My answer (possibly to be refined later, but I’ll check others’ responses and aphyer’s posts after posting this):
IDs: 96286, 9344, 107278, 68204, 905, 23565, 8415, 83512, 62718, 42742, 16423, 94304
observations and approach used:
After some initial exploration I considered only a single combination of qualitative traits (No/Mint/Adequate/[‘Eerie Silence’], though I think it wouldn’t have mattered if I chose something else) in order to study the quantitative variables without distractions.
Since Murphy’s constant had the biggest effect, I first chose an approximation for the effect of Murphy’s Constant (initially a parabola), then divided the ZPPG data by my prediction for Murphy’s constant to get the effects of another variable (in this case, the local value of pi) to show up better. And so on, going back to refine my previously guessed functions as the noise from other variables cleared up.
As it turned out, this approach was unreasonably effective as the large majority of the variation (at least for the traits I ended up studying - see below) seems to be accounted for by multiplicative factors, each factor only taking into account one of the traits or variables.
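The divide-out procedure described above can be sketched roughly as follows. This is not the code actually used for the analysis: the data is synthetic, the value ranges, the scale of 100 and the function names are assumptions made for illustration, and the two factor shapes are borrowed from the update at the end of this comment purely as stand-ins.

```python
import random

# Stand-in factors, used ONLY to generate synthetic data for this sketch.
def murphy_factor(m):
    return 1 - 0.004 * m ** 3

def pi_factor(p):
    return 1 - 10 * abs(p - 3.15)

def make_rows(n=500, seed=0):
    """Synthetic (murphy, local_pi, output) rows; ranges and scale are assumed."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        m = rng.uniform(0.0, 5.0)      # assumed range of Murphy's constant
        p = rng.uniform(3.10, 3.20)    # assumed range of the local value of pi
        rows.append((m, p, 100.0 * murphy_factor(m) * pi_factor(p)))
    return rows

def divide_out(rows, guess):
    """Divide the output by a guessed Murphy factor, leaving the other effects."""
    return [(m, p, z / guess(m)) for m, p, z in rows]

def mean_spread_by_pi(rows, n_bins=50, lo=3.10, hi=3.20):
    """Average of (max/min - 1) of the output within narrow bins of local pi."""
    bins = [[] for _ in range(n_bins)]
    for _, p, z in rows:
        i = min(int((p - lo) / (hi - lo) * n_bins), n_bins - 1)
        bins[i].append(z)
    spreads = [max(b) / min(b) - 1 for b in bins if len(b) >= 2]
    return sum(spreads) / len(spreads)

rows = make_rows()
raw = mean_spread_by_pi(rows)                                             # nothing divided out
crude = mean_spread_by_pi(divide_out(rows, lambda m: 1 - 0.02 * m ** 2))  # first-guess parabola
refined = mean_spread_by_pi(divide_out(rows, murphy_factor))              # refined cubic
print(raw, crude, refined)
```

The smaller the spread within each pi bin, the more clearly the pi effect stands out, which is the point of dividing out the largest effect first and then going back to refine it.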
Murphy’s constant:
Cubic. (I tried to fit some kind of exponential, or even a logistic function, because I had a headcanon explanation along the lines of: a higher value causes problems at a higher rate, and the individual problems multiply together before subtracting from nominal. (Or something.) But a cubic fits better.)
It visually looks like it’s inflecting near the extreme values of the data (not checked quantitatively), so maybe it’s a (cubic) spline.
Local Value of Pi:
Piecewise linear, peaking around 3.15, same slope on either side I think. I tried to fit a sine to it first, similar reasons as with Murphy and exponentials.
Latitude:
Piecewise constant, lower value if between −36 and 36.
Longitude:
This one seems to be a sine, though not literally sin(x) - displaced vertically and horizontally. I briefly experimented to see if I could get a better fit substituting the local value of pi for our boring old conventional value, didn’t seem to work, but maybe I implemented that wrong.
Shortitude:
Another piecewise constant. Lower value if greater than 45. Unlike latitude, this one is not symmetrical—it only penalizes in the positive direction.
Deltitude:
I found no effect.
Traits:
I only considered traits that seemed relatively promising from my initial exploration (really just what their max value was and how many tries they needed to get it): No or EXTREMELY; Mint, Burning or Copper; any Feng Shui; and [‘Eerie Silence’] or [‘Otherworldly Skittering’].
All traits tested seemed to me to have a constant multiplier.
Values in my current predictor (may not have been tested on all the relevant data, and significant digits shown are not justified):
Extremely (relative to No): 0.94301
Burning, Copper (relative to Mint): 1.0429, 0.9224
Exceptional, Disharmonious (relative to Adequate): 1.0508, 0.8403 - edit: I think these may actually be 1.05, 0.84 exactly.
Skittering (relative to Silence): 0.960248
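As a rough illustration of how constant trait multipliers combine, here is a minimal sketch. The multiplier values are the ones listed above; keying the table by trait value, and the names `TRAIT_MULTIPLIERS` and `trait_factor`, are assumptions made for this example rather than anything taken from the data set.

```python
# Multipliers quoted above, each relative to its baseline trait (baseline = 1.0).
TRAIT_MULTIPLIERS = {
    "No": 1.0, "EXTREMELY": 0.94301,
    "Mint": 1.0, "Burning": 1.0429, "Copper": 0.9224,
    # Fitted values; the comment suspects these are exactly 1.05 and 0.84.
    "Adequate": 1.0, "Exceptional": 1.0508, "Disharmonious": 0.8403,
    "Eerie Silence": 1.0, "Otherworldly Skittering": 0.960248,
}

def trait_factor(traits):
    """Product of the constant multipliers for one site's trait values."""
    factor = 1.0
    for trait in traits:
        factor *= TRAIT_MULTIPLIERS[trait]
    return factor

# The baseline combination used for studying the quantitative variables.
print(trait_factor(["No", "Mint", "Adequate", "Eerie Silence"]))
# A non-baseline combination.
print(trait_factor(["EXTREMELY", "Burning", "Exceptional", "Otherworldly Skittering"]))
```

Traits that were not studied are deliberately missing from the table, so looking one up raises a `KeyError` instead of silently treating it as neutral.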
Residual errors typically within 1%, relatively rarely above 1.5%. There could be other things I missed (e.g. non-multiplicative interactions) to account for the rest, or afaik it could be random. Since I haven’t studied other traits than the ones listed, clues could also be lurking in those traits.
Using my overall predictor, my expected values for the 12 sites listed above are about:
96286: 112.3, 9344: 110.0, 107278: 109.3, 68204: 109.2, 905: 109.0, 23565: 108.1, 8415: 106.5, 83512: 106.0, 62718: 105.9, 42742: 105.7, 16423: 105.4, 94304: 105.2
Given my error bars in the (part that I actually used of the) data set I’m pretty comfortable with this selection (in terms of building instead of folding, not necessarily that these are the best choices), though I should maybe check to see if any is right next to one of those cutoffs (latitude/shortitude) and I should also maybe be wary of extrapolating to very low values of Murphy’s Constant. (e.g. 94304, 23565, 96286)
edited to add: aphyer’s third post (which preceded this comment) has the same sort of conclusion and some similar approximations (though mine seem to be more precise), and unnamed also mentioned that it appears to be a bunch of things multiplied together. All of aphyer’s posts have a lot of interesting general findings as well.
edited to also add: the second derivative of a cubic is a linear function. The cubic having zero second derivative at two different points is thus impossible unless the linear function is zero, which happens only when the first two coefficients of the cubic are zero (so the cubic is linear). So my mumbling about inflection points at both ends is complete nonsense… however, it does have close to zero second derivative near 0, so maybe it is a spline where we are seeing one end of it where the second derivative is set to 0 at that end. todo: see what happens if I actually set that to 0
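The argument about the second derivative can be written out explicitly. The symbols a, b, c, d, x_1 and x_2 are introduced here just for this derivation:

```latex
\begin{align*}
f(x) &= a x^3 + b x^2 + c x + d, \\
f''(x) &= 6 a x + 2 b, \\
f''(x_1) = f''(x_2) = 0,\ x_1 \neq x_2
  &\;\Rightarrow\; 6 a (x_1 - x_2) = 0
  \;\Rightarrow\; a = 0
  \;\Rightarrow\; b = 0 .
\end{align*}
```

For a single cubic with a ≠ 0, the only inflection point is at x = -b/(3a), which sits at x = 0 exactly when the quadratic coefficient b is zero.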
edited again: see below comment—can actually set both linear and quadratic terms to 0
update:
On Murphy:
I think that the overall multiplication factor from Murphy’s constant is 1-0.004*(Murphy’s constant)^3 - this appears close enough, I don’t think I need linear or quadratic terms.
On Pi:
I think the multiplication factor is probably 1-10*abs((local Value of Pi)-3.15) - again, appears close enough, and I don’t think I need a quadratic term.
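Written as code, the two proposed factors look like the sketch below. The function and variable names are made up for this example. It also works out where each formula would hit zero, which gives a rough sense of how far they could be extrapolated at all.

```python
def murphy_factor(murphy):
    """Multiplier from Murphy's constant, as proposed above."""
    return 1 - 0.004 * murphy ** 3

def pi_factor(local_pi):
    """Multiplier from the local value of pi: a symmetric tent peaking at 3.15."""
    return 1 - 10 * abs(local_pi - 3.15)

# Where each formula would reach zero, i.e. where it cannot hold any further.
murphy_zero = (1 / 0.004) ** (1 / 3)                # cube root of 250, about 6.3
pi_zero_low, pi_zero_high = 3.15 - 0.1, 3.15 + 0.1  # 3.05 and 3.25

for m in (0.0, 2.0, 4.0):
    print(f"Murphy {m}: factor {murphy_factor(m):.3f}")
for p in (3.13, 3.15, 3.17):
    print(f"pi {p}: factor {pi_factor(p):.3f}")
```

Both factors equal 1 at their best value (Murphy's constant 0, local pi 3.15), so they only scale the output down from there.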
Regarding aphyer saying cubic doesn’t fit Murphy’s, and both unnamed and aphyer saying Pi needs a quadratic term, I am beginning to suspect that maybe they are modeling these multipliers in a somewhat different way, perhaps 1/x from the way I am modeling it? (I am modeling each function as a multiplicative factor that multiplies together with the others to get the end result.)
edited to add: aphyer’s formulas predict the log; my formulas predict the output, then I take the log after if I want to (e.g. to set a scaling factor). I think this is likely the source of the discrepancy. If predicting the log, put each of these formulas in a log (e.g. log(1-10*abs((local Value of Pi)-3.15))).
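One way to see why a model fitted to the log could appear to need a quadratic term: log(1 - u) expands as -u - u^2/2 - ..., so a factor that is exactly linear in the distance from 3.15 is no longer linear once logged. The sketch below compares the exact log of the pi factor with its linear and quadratic approximations. The function names are made up for this example, and this is only one possible reading of the discrepancy.

```python
import math

def pi_factor(local_pi):
    """Multiplicative pi factor proposed above."""
    return 1 - 10 * abs(local_pi - 3.15)

def log_pi_linear(local_pi):
    """First-order approximation of log(pi_factor): log(1 - u) ~ -u."""
    u = 10 * abs(local_pi - 3.15)
    return -u

def log_pi_quadratic(local_pi):
    """Second-order approximation: log(1 - u) ~ -u - u^2 / 2."""
    u = 10 * abs(local_pi - 3.15)
    return -u - u * u / 2

for p in (3.15, 3.16, 3.18, 3.19):
    exact = math.log(pi_factor(p))
    print(f"pi {p}: exact {exact:.4f}, "
          f"linear {log_pi_linear(p):.4f}, quadratic {log_pi_quadratic(p):.4f}")

# Multiplying factors in output space is the same as adding their logs.
a, b = pi_factor(3.16), pi_factor(3.18)
print(math.isclose(math.log(a * b), math.log(a) + math.log(b)))
```

The further the local value of pi is from 3.15, the worse the purely linear log-space approximation gets, which is consistent with a log-space fit picking up a quadratic term even if the underlying multiplier is piecewise linear.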