I took an analytic approach and picked some reasonable choices based on that. I’ll almost certainly try throwing ML at this problem at some point, but for now I want to note down what a me-who-can’t-use-XGBoost would do.
Findings:
There are at least some fingerprintable gladiators who keep gladiating, and who need to be Accounted For (the presence of such people makes all archetypery suspect: are Dwarven Knights really that Good, or are there just a handful of super-prolific Dwarven Knights who give everyone an unfairly good impression?). This includes a Level 7 Elven Ninja, almost certainly Cadagal’s Champion, who inexplicably insists on always wearing black (even though it doesn’t seem to make a difference to how well ninjas ninj).
Level 4 Boots and Level 4 Gauntlets are super rare in the dataset. The Gauntlets are always worn by a pair of hypercompetent Level 7 Dwarven Monks; the Boots are always worn by the Level 7 Elven Ninja.
Despite this, Cadagal’s Champion is facing us with Level 2 Boots.
We have some Level 4 Boots.
. . . we robbed this guy, didn’t we? And if we wear the boots—our most powerful equipment—he’ll flip out and set his House against us whether we win or lose? Dammit . . .
Who fights whom?
A is a Human Warrior. Warriors lose to Fencers, Humans lose to Fencers, Humans lose to Elves. We have an Elven Fencer on call; send Y.
B is a Human Knight. Rangers are best vs Knights, so send W. (Not super confident in this one)
C is an Elven Ninja. Ninjas are super weak against Knights. Send Z, the Elven Knight. (Slightly concerned by how underrepresented Elves are in the sample of gladiators who managed to beat this guy but I’m assuming that’s either noise or an effect which Z will be able to shrug off with the Power of Friendship and/or Urgency)
D is a Dwarven Monk. Monks are weak to Ninjas; send U.
Who wears what?
I haven’t managed to figure out how equipment works beyond “higher number good”; if there’s specific synergies with/against specific classes/races/whatever they elude me. For that reason:
Y and Z are my best shots. I’ll have them both wear what their opponents are wearing, to reduce the effects of uncertainty and turn those fights into “who wore it better?” contests. (So +3 Boots and +1 Gauntlets for Y, +2 Boots and +3 Gauntlets for Z.)
U vs D looks pretty solid so I’ll give him the remaining +2 Gauntlets and +1 Boots.
W vs B is my most tenuous guess; I hope she won’t hold a grudge after I send her out unequipped to boost everyone else’s chances.
Took an ML approach, got radically different results which I’m choosing to trust.
Fit a LightGBM model to the raw data, and to the data transformed by simon’s stats-to-strength-and-speed model. Simon’s version got slightly better results on an outsample despite having many fewer degrees of freedom and fewer chances to ‘cheat’ by fingerprinting exceptional fighters; I therefore used that going forward. (I also tried tweaking some of the arbitrary constants in simon’s model: this invariably lowered performance, reassuring me that he got all the multipliers right.)
Iterated all possible matchups, then all possible loadouts (modulo not using the +4 boots), looking for max EV of total count of wins.
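For anyone following along, the matchups-then-loadouts search described above can be sketched roughly like this. It is a minimal sketch, not the code actually used: `win_prob(fighter, opponent, boots, gauntlets)` is a hypothetical stand-in for whatever the fitted model predicts, and the function names are made up for illustration. Leaving the +4 Boots out is just a matter of not passing them in `boot_items`.

```python
from itertools import permutations

def loadouts(items, slots):
    """Every way to hand out `items` to `slots` fighters, at most one per
    fighter. 0 means "no item", and items are allowed to go unused."""
    padded = list(items) + [0] * slots
    return set(permutations(padded, slots))

def expected_wins(matchup, boots, gauntlets, win_prob):
    """Sum of per-fight win probabilities, i.e. the EV of the win count."""
    return sum(
        win_prob(fighter, opponent, b, g)
        for (fighter, opponent), b, g in zip(matchup, boots, gauntlets)
    )

def two_stage_search(fighters, opponents, boot_items, gauntlet_items, win_prob):
    n = len(opponents)
    bare = (0,) * n
    # Stage 1: pick who fights whom, scoring every assignment with no items.
    team = max(
        permutations(fighters, n),
        key=lambda t: expected_wins(list(zip(t, opponents)), bare, bare, win_prob),
    )
    matchup = list(zip(team, opponents))
    # Stage 2: with the matchup fixed, try every boots/gauntlets allocation.
    boots, gauntlets = max(
        ((b, g) for b in loadouts(boot_items, n) for g in loadouts(gauntlet_items, n)),
        key=lambda bg: expected_wins(matchup, bg[0], bg[1], win_prob),
    )
    return matchup, boots, gauntlets, expected_wins(matchup, boots, gauntlets, win_prob)
```

The search space is small enough that brute force is fine; the weak point is that stage 1 never sees the items.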
New strategy goes like this:
Against A, send U, with +3 Boots
Against B, send X, with +2 Boots and +1 Gauntlets
Against C, send V, with +3 Gauntlets
Against D, send Y, with +1 Boots and +2 Gauntlets
Notes:
The machines say this gives me ~2.6 expected victories but I’m selecting for things they liked so realistically I expect my EV somewhere in the 2-2.5 range.
If I was doing this IRL I’d move the Gauntlets from V to U, lowering EV but (almost) guaranteeing me at least one win.
My best guess about why my solution works (assuming it does) is that the “going faster than your opponent” bonus hits sharply diminishing returns around +4 speed. But that’s just post hoc confabulation.
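“My best guess about why my solution works (assuming it does) is that the “going faster than your opponent” bonus hits sharply diminishing returns around +4 speed”
I don’t think this is correct: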
In my model, there is a sharp threshold at +1 speed, so returns should sharply diminish after +1 speed. In fact, in the updated version of my model, there is no effect of speed beyond the threshold (the speed effect depends only on sign(speed difference)).
I think the discrepancy might possibly relate to this:
“Iterated all possible matchups, then all possible loadouts (modulo not using the +4 boots), looking for max EV of total count of wins.”
because if you consider only the matchups with no items, the model needs to assign the matchups assuming no boots, so it sends your characters against opponents over which they have a speed advantage without boots (except the C-V matchup, as there is no possibility of beating C on speed). So an optimal allocation needs to take into account the fact that your boots can allow you to use slower and stronger characters, and so can’t be found by choosing the matchups first without items.
So I predict that your model might predict a higher EV for my solution.
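To make the objection concrete, here is a toy illustration of how choosing matchups first and items second can miss the joint optimum. Everything in it is invented for the example: the three fighters, the two opponents, the single pair of +2 boots, the assumption that boots add directly to speed, and the logistic link with its scale of 4. Only the "strength diff plus 8 times sign(speed diff)" score comes from this thread.

```python
import math
from itertools import permutations

def win_prob(fighter, opponent, boots=0):
    """Toy stand-in: logistic of strength diff + 8 * sign(speed diff)."""
    strength_diff = fighter["str"] - opponent["str"]
    speed_diff = fighter["spd"] + boots - opponent["spd"]
    sign = (speed_diff > 0) - (speed_diff < 0)
    return 1 / (1 + math.exp(-(strength_diff + 8 * sign) / 4))

def boot_options(items, slots):
    """All allocations of boots to slots; 0 means no boots."""
    return set(permutations(list(items) + [0] * slots, slots))

def ev(team, opponents, boots):
    return sum(win_prob(f, o, b) for f, o, b in zip(team, opponents, boots))

def staged_search(pool, opponents, boot_items):
    """Pick the team with no items, then allocate boots to that team."""
    n = len(opponents)
    team = max(permutations(pool, n), key=lambda t: ev(t, opponents, (0,) * n))
    boots = max(boot_options(boot_items, n), key=lambda b: ev(team, opponents, b))
    return team, boots

def joint_search(pool, opponents, boot_items):
    """Score every (team, boots) combination together."""
    n = len(opponents)
    return max(
        ((t, b) for t in permutations(pool, n) for b in boot_options(boot_items, n)),
        key=lambda tb: ev(tb[0], opponents, tb[1]),
    )

# Made-up data: Brute is strong but one point too slow for A without boots.
pool = [
    {"name": "Quick", "str": 9, "spd": 6},
    {"name": "Brute", "str": 14, "spd": 4},
    {"name": "Mid", "str": 12, "spd": 3},
]
opponents = [
    {"name": "A", "str": 10, "spd": 5},
    {"name": "B", "str": 10, "spd": 2},
]
```

On this data the staged search fields Quick against A and Brute against B, and the boots then change nothing. The joint search instead puts the boots on Brute so he can outpace A, and fields Mid against B, for a higher EV.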
Regarding my strategic approach: I agree pick-characters-then-equipment has the limitation you describe (I’m still not sure about the B-vs-X matchup in particular), but I eyeballed some possible outcomes and they seem close enough to optimal that I’m not going to write any more code for this.
I put your solution into my ML model and it seems to think that your A and C matchups are pretty good (though A could be made slightly better by benching Willow and letting Uzben do her job with the same gear), but B and D have <50% success odds.
However, I didn’t do much hyperparameter tuning and I’m working with a new model type, so it might have more epicycles than warranted. And “My model says the solution my model says is best is better than another solution” isn’t terribly reassuring.
. . . regardless, I’m sticking with my choices.
One last note:
I don’t actually think there’s a strict +4 speed benefit cutoff (if I did, I’d reallocate the +1 Boots from Y to V), but I suspect there’s some emergent property that kindasorta does the same thing in some high-level fights maybe.
Update:
I tried fitting my ML model without access to speed variables other than sign(speed diff) and got slightly but non-negligibly worse metrics on an outsample. This suggests that sign(speed diff) tells you most of the information you need about speed but if you rely solely on it you’re still missing useful and relevant information.
(. . . either that or my code has another error, I guess. Looking forward to finding out in seven days.)
Very interesting; this would certainly cast doubt on my simplified model. But so far I haven’t been noticing any effects not accounted for by it.
After reading your comments I’ve been getting Claude to write up an XGBoost implementation for me. I should have made this reply comment when I started, but will post my results under my own comment chain.
I have not tried (but should try) to duplicate (or fail to duplicate) your findings; I haven’t been testing quite the same thing.
I tried fitting a model with only “Strength diff plus 8 times sign(speed diff)” as an explanatory variable, got (impressively, only moderately!) worse results. My best guess is that your model is underfitting, and over-attaching to the (good!) approximation you fed it, because it doesn’t have enough Total Learning to do anything better . . . in which case you might see different outcomes if you increased your number of trees and/or your learning rate.
Alternatively, I might just have screwed up my code somehow. Still . . . I’m sticking with my choices for now.
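For reference, the three speed encodings being compared in this exchange can be written down as feature rows. This is only a sketch: the helper name `feature_sets` is made up, and `strength_diff` and `speed_diff` are assumed to have already been computed by the stats-to-strength-and-speed transform, which isn't reproduced here. The weight of 8 is the one quoted above.

```python
def sign(x):
    """Return -1, 0 or 1 according to the sign of x."""
    return (x > 0) - (x < 0)

def feature_sets(strength_diff, speed_diff, speed_weight=8):
    """Build one feature row per model variant discussed in the thread."""
    return {
        # Model sees the raw speed difference.
        "full": [strength_diff, speed_diff],
        # Model sees only which side is faster.
        "sign_only": [strength_diff, sign(speed_diff)],
        # Single explanatory variable: strength diff + 8 * sign(speed diff).
        "combined": [strength_diff + speed_weight * sign(speed_diff)],
    }
```

Fitting the same booster on each variant and comparing outsample metrics is the experiment described above; if the "full" variant stays ahead after more trees or a higher learning rate, that points at real information in the size of the speed gap rather than at underfitting.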
You may well be right, I’ll look into my hyperparameters. I looked at the code Claude had generated with my interference and that greatly lowered my confidence in them, lol (see edit to this comment).