Thanks abstractapplic! Initial analysis:
Initial stuff that hasn’t turned out to be very important:
My immediate thought was that there are likely to be different types of entities we are classifying, so my initial approach was to look at the distributions to try to find clumps.
All 5 characteristics (Corporeality, Sliminess, Intellect, Hostility, Grotesqueness) have bimodal distributions, with one peak around 15-30 and the other around 65-85 (exact positions vary by characteristic). Overall, the shapes look very similar. The trough between the peaks is not very deep: there are plenty of intermediate values.
All of these characteristics are correlated with each other.
Looking at sizes of bins for pairs of characteristics, again there appear to be two humps, and only two, in the 2d plot. That is, there is a high/high hump and a low/low hump, but notably there does not appear to be, for example, a high-Sliminess peak when restricting to low-Corporeality data points.
Again, the shape varies a bit between characteristic pairs but overall looks very similar.
Adding all characteristics together gets a deeper trough between the peaks, though still no clean separation.
Overall, it looks to me like there are two types, one with high values of all characteristics, and another with low values of all characteristics, but I don’t see any clear evidence for any other groupings so far.
Eyeballing the plots, it looks compatible with no relation between characteristics other than the high/low groupings. Have not checked this with actual math.
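One rough way to check this with actual math would be to compare the pooled correlations with the correlations inside each type. The sketch below runs on synthetic stand-in data, not the real dataset; the cutoff of 250 on the total and the helper names are my own assumptions for illustration.

```python
import numpy as np

def within_type_correlations(X, cutoff=250.0):
    """Trait correlation matrices on each side of a total-score cutoff.

    X is an (n_points, n_traits) array. The cutoff on the row sum is an
    assumed, hypothetical split point (midway on a 0-500 total).
    """
    total = X.sum(axis=1)
    low, high = X[total < cutoff], X[total >= cutoff]
    return np.corrcoef(low, rowvar=False), np.corrcoef(high, rowvar=False)

def max_offdiag(corr):
    """Largest absolute off-diagonal entry of a correlation matrix."""
    mask = ~np.eye(corr.shape[0], dtype=bool)
    return np.abs(corr[mask]).max()

# Synthetic stand-in data: two types, traits independent given the type.
rng = np.random.default_rng(0)
is_high = rng.random(2000) < 0.5
centers = np.where(is_high, 75.0, 25.0)[:, None]
X_demo = np.clip(rng.normal(centers, 10.0, size=(2000, 5)), 0, 100)

corr_all = np.corrcoef(X_demo, rowvar=False)
corr_low, corr_high = within_type_correlations(X_demo)
print("pooled:", max_offdiag(corr_all))
print("low:", max_offdiag(corr_low), "high:", max_offdiag(corr_high))
```

If the only relation between characteristics is the high/low grouping, the within-type correlations should be close to zero while the pooled ones are large. Splitting on the total itself induces a slight negative correlation inside each group, so small negative values would not by themselves be evidence of extra structure.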
In order to get a cleaner separation between the high/low types, I used the following procedure to get a probability estimate for each data point being in the high/low type:
For each characteristic, sum up all the other characteristics (that is, subtract that characteristic from the total).
For each characteristic, classify each data point as pretty clearly low (<100 total), pretty clearly high (>300 total), or unclear, based on the sum of all the other characteristics.
Obtain the frequency distribution of the characteristic’s values for the points classified as clearly low and clearly high in the previous step.
Smooth in an ad hoc manner.
Obtain an odds ratio from the ratio of the high and low distributions, with an ad hoc adjustment for distortions caused by the ad hoc smoothing.
Multiply the odds ratios obtained for each characteristic and convert the resulting odds into a probability.
I think this gives cleaner separation, but it is still not great: most points are 99%+ likely to be in one type or the other, yet 2057 (out of 34374) still fall between 0.1 and 0.9 in my ad hoc estimator. Todo: look for some function to fit to the frequency distributions and redo with that function instead of the ad hoc approach.
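The procedure above could be sketched roughly as follows. This is not my exact code: add-one pseudocounts on a fixed histogram stand in for the ad hoc smoothing, no separate adjustment step is included, equal prior odds are assumed, and the function name, bin count and synthetic data are made up for illustration.

```python
import numpy as np

def high_type_probability(X, low_cut=100.0, high_cut=300.0, bins=20, pseudo=1.0):
    """Estimate P(high type) per row of X, an (n_points, n_traits) array in [0, 100].

    For each trait, the sum of the *other* traits picks out clearly-low and
    clearly-high points; their smoothed histograms give a per-trait odds ratio.
    """
    total = X.sum(axis=1)
    edges = np.linspace(0.0, 100.0, bins + 1)
    log_odds = np.zeros(X.shape[0])
    for j in range(X.shape[1]):
        rest = total - X[:, j]                      # sum of all other traits
        lo_vals = X[rest < low_cut, j]              # clearly low by the others
        hi_vals = X[rest > high_cut, j]             # clearly high by the others
        # Pseudocounts replace the ad hoc smoothing (an assumption of this sketch).
        p_lo = np.histogram(lo_vals, edges)[0] + pseudo
        p_hi = np.histogram(hi_vals, edges)[0] + pseudo
        p_lo, p_hi = p_lo / p_lo.sum(), p_hi / p_hi.sum()
        idx = np.clip(np.digitize(X[:, j], edges) - 1, 0, bins - 1)
        # Adding log odds ratios is the same as multiplying odds ratios.
        log_odds += np.log(p_hi[idx]) - np.log(p_lo[idx])
    log_odds = np.clip(log_odds, -50.0, 50.0)       # avoid overflow in exp
    return 1.0 / (1.0 + np.exp(-log_odds))

# Synthetic stand-in data (not the real dataset): 500 low points, then 500 high.
rng = np.random.default_rng(1)
X_demo = np.clip(np.vstack([rng.normal(25, 10, (500, 5)),
                            rng.normal(75, 10, (500, 5))]), 0, 100)
p_demo = high_type_probability(X_demo)
print("ambiguous points:", int(np.sum((p_demo > 0.1) & (p_demo < 0.9))))
```

Treating the per-characteristic odds ratios as independent is what makes multiplying them legitimate, so this only makes sense if the characteristics really are unrelated apart from the high/low type.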
Likely classifications of our mansion’s ghosts:
low: A,B,D,E,G,H,I,J,M,N,O,Q,S,U,V,W
high: C,F,K,L,P,R,T
To actually solve the problem, I split the data by exorcist group. Expecting the high/low type to be relevant, I split the DD points by likely type (50% cutoff), then tried some things for DD low, including a linear regression. I plotted a couple of graphs for the characteristics that seemed to matter (Grotesqueness and Hostility in this case) to confirm the effects looked linear. I then tried linear regression for DD high and got the same coefficients, within error bars. If the linear coefficients are the same in both cases, I could probably have gotten them from the combined DD data without separating into high and low, and indeed linear regression on the combined DD data gave more or less the same coefficients.
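The comparison described above can be sketched as an ordinary least squares fit run three times: on the low subset, the high subset, and the combined data. The data here are synthetic, and the coefficients 12 and 4, the intercept and the noise level are invented purely for illustration; they are not the fitted DD values.

```python
import numpy as np

def ols(X, y):
    """Ordinary least squares with intercept; returns (coefficients, standard errors)."""
    A = np.column_stack([np.ones(len(X)), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    dof = len(y) - A.shape[1]
    sigma2 = resid @ resid / dof                    # residual variance estimate
    se = np.sqrt(sigma2 * np.diag(np.linalg.inv(A.T @ A)))
    return beta, se

# Synthetic stand-in: two traits, the same linear cost rule for both types.
rng = np.random.default_rng(2)
n = 4000
is_high = rng.random(n) < 0.3
centers = np.where(is_high, 75.0, 25.0)[:, None]
X_demo = rng.normal(centers, 10.0, size=(n, 2))
y_demo = 1000 + 12 * X_demo[:, 0] + 4 * X_demo[:, 1] + rng.normal(0, 20, n)

b_low, se_low = ols(X_demo[~is_high], y_demo[~is_high])
b_high, se_high = ols(X_demo[is_high], y_demo[is_high])
b_all, se_all = ols(X_demo, y_demo)
for name, b, se in [("low", b_low, se_low), ("high", b_high, se_high), ("all", b_all, se_all)]:
    print(name, np.round(b, 2), "+/-", np.round(se, 2))
```

If the slopes from the two subsets agree within a couple of standard errors, fitting the combined data loses nothing. A real difference between the types would show up as slopes that disagree by much more than that.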
Actually finding the answer:
So, then I did regression for each exorcist group without splitting based on high/low type. (I did split afterwards to check whether it mattered.)
Results:
DD cost depends on Grotesqueness and to a lesser extent Hostility.
EE cost depends on all characteristics slightly, Sliminess then Intellect/Grotesqueness being the most important. Note: Grotesqueness less important, perhaps zero effect, for “high” type.
MM cost actually very slightly declines for higher values of all characteristics. (note: less effect for “high” type, possibly zero effect)
PP cost depends mainly on Sliminess. However, slight decline in cost with more Corporeality and increase with more of everything else.
SS cost depends primarily on Intellect. However, slight decline with Hostility and increase with Sliminess.
WW cost depends primarily on Hostility. However, everything else also has at least a slight effect, especially Sliminess and Grotesqueness.
Provisionally, I’m OK with just using the linear regression coefficients without the high/low split, though I will want to verify later whether this causes a problem. I also need to verify linearity, which I only checked for DD low, and only for Grotesqueness and Hostility separately, not both together.
Results:
Ghost | group with lowest estimate | estimated cost for that group
A | Spectre Slayers | 1926.301885259
B | Wraith Wranglers | 1929.72034133793
C | Mundanifying Mystics | 2862.35739392631
D | Demon Destroyers | 1807.30638053037 (next lowest: Wraith Wranglers, 1951.91410462716)
E | Wraith Wranglers | 2154.47901124028
F | Mundanifying Mystics | 2842.62070661731
G | Demon Destroyers | 1352.86163670857 (next lowest: Phantom Pummelers, 1688.45809434935)
H | Phantom Pummelers | 1923.30132492753
I | Wraith Wranglers | 2125.87216703498
J | Demon Destroyers | 1915.0299245701 (next lowest: Wraith Wranglers, 2162.49691339282)
K | Mundanifying Mystics | 2842.16499046146
L | Mundanifying Mystics | 2783.55221244497
M | Spectre Slayers | 1849.71986735069
N | Phantom Pummelers | 1784.8259008802
O | Wraith Wranglers | 2269.45361189797
P | Mundanifying Mystics | 2775.89249612121
Q | Wraith Wranglers | 1748.56167086623
R | Mundanifying Mystics | 2940.5652346428
S | Spectre Slayers | 1666.64380523907
T | Mundanifying Mystics | 2821.89307084084
U | Phantom Pummelers | 1792.3319145455
V | Demon Destroyers | 1472.45641559628 (next lowest: Spectre Slayers, 1670.68911559919)
W | Demon Destroyers | 1833.86462523462 (next lowest: Wraith Wranglers, 2229.1901870478)
So that’s my provisional solution, and I will pay the extra 400sp one time fee so that Demon Destroyers can deal with ghosts D, G, J, V, W.
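As a quick arithmetic check on whether the 400sp fee is worth it, the sketch below adds up the savings from using the Demon Destroyers instead of the next-lowest group for each of those five ghosts. The costs are the estimates from the table, rounded to two decimals; the variable names are just for illustration.

```python
# (Demon Destroyers estimate, next-lowest estimate), rounded from the table.
dd_ghosts = {
    "D": (1807.31, 1951.91),
    "G": (1352.86, 1688.46),
    "J": (1915.03, 2162.50),
    "V": (1472.46, 1670.69),
    "W": (1833.86, 2229.19),
}
DD_FEE = 400.0  # one-time fee for hiring the Demon Destroyers at all

# Saving per ghost relative to its next-cheapest group.
savings = {ghost: other - dd for ghost, (dd, other) in dd_ghosts.items()}
total_saving = sum(savings.values())
net_benefit = total_saving - DD_FEE

for ghost, saved in savings.items():
    print(f"{ghost}: saves {saved:.2f}")
print(f"total saving {total_saving:.2f}, net of fee {net_benefit:.2f}")
```

On these estimates the combined saving comfortably exceeds the fee. Note that the next-lowest group for G is the Phantom Pummelers, who are already used three times, so G's true fallback would likely cost more and the real saving would be somewhat larger.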
--Edit: whoops, missed most of this paragraph (other than the Demon Destroyers):
“Bad news! In addition to their (literally and figuratively) arcane rules about territory and prices, several of the exorcist groups have all-too-human arbitrary constraints: the Spectre Slayers and the Entity Eliminators hate each other to the point that hiring one will cause the other to refuse to work for you, the Poltergeist Pummelers are too busy to perform more than three exorcisms for you before the start of the social season, and the Demon Destroyers are from far enough away that – unless you eschew using them at all – they’ll charge a one-time 400sp fee just for showing up.”
Will edit to fix! Post edit: actually, my initial result is still compatible with that paragraph: it doesn’t involve the Entity Eliminators, and only uses the Phantom Pummelers 3 times. --
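That compatibility check can be written down mechanically. The sketch below encodes the assignment from the table and tests it against the three constraints in the quoted paragraph; the function name and the way the rules are encoded are my own, and only the Demon Destroyers fee is treated as a cost rather than a hard constraint.

```python
from collections import Counter

# Cheapest-estimate group per ghost, copied from the results table.
assignment = {
    "A": "Spectre Slayers", "B": "Wraith Wranglers", "C": "Mundanifying Mystics",
    "D": "Demon Destroyers", "E": "Wraith Wranglers", "F": "Mundanifying Mystics",
    "G": "Demon Destroyers", "H": "Phantom Pummelers", "I": "Wraith Wranglers",
    "J": "Demon Destroyers", "K": "Mundanifying Mystics", "L": "Mundanifying Mystics",
    "M": "Spectre Slayers", "N": "Phantom Pummelers", "O": "Wraith Wranglers",
    "P": "Mundanifying Mystics", "Q": "Wraith Wranglers", "R": "Mundanifying Mystics",
    "S": "Spectre Slayers", "T": "Mundanifying Mystics", "U": "Phantom Pummelers",
    "V": "Demon Destroyers", "W": "Demon Destroyers",
}

def check_constraints(assignment, pummeler_limit=3):
    """Return a list of violated constraints (empty if the plan is allowed)."""
    usage = Counter(assignment.values())
    problems = []
    if usage["Spectre Slayers"] and usage["Entity Eliminators"]:
        problems.append("Spectre Slayers and Entity Eliminators both hired")
    if usage["Phantom Pummelers"] > pummeler_limit:
        problems.append("Phantom Pummelers used more than the limit")
    return problems

usage = Counter(assignment.values())
violations = check_constraints(assignment)
fee = 400 if usage["Demon Destroyers"] else 0   # one-time fee if DD used at all
print(dict(usage), violations, fee)
```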
Not very confident in my solution (see things to verify above), and if it is indeed this simple it is an easier problem than I expected.
Further edit (late July 15 2024): I haven’t gotten around to checking those things. Also, where I did check linearity, I binned the data, which could be hiding all sorts of patterns.