D&D Sci Coliseum: Arena of Data
This is an entry in the ‘Dungeons & Data Science’ series, a set of puzzles where players are given a dataset to analyze and an objective to pursue using information from that dataset.
Estimated Complexity: 4⁄5 (this is a guess, I will update based on feedback/seeing how the scenario goes)
STORY
The Demon King rises in his distant Demon Castle. Across the free lands of the world, his legions spread, leaving chaos and death in their wake. The only one who can challenge him is the Summoned Hero, brought by the Goddess Herself from a distant world to aid this one in its time of need. The Summoned Hero must call together all the free peoples of the world under their banner, to triumph united where they would surely fall separately.
And what is the Summoned Hero doing now?
Well, right now you are staring in disbelief at your companions’ explanation of the politics of the Sunset Coast.
Apparently, little things like a Demon King attempting to subjugate the world are not enough to shake them from their traditions. If you want them to listen to you, being the Summoned Hero is not going to suffice. Instead, they conduct all their politics based on gladiatorial combat in the Arena of Dusk.
The good news is that the Four Great Houses of the Sunset Coast will gladly listen to you, and maybe even join you against the Demon King, if you can defeat their Champions in gladiatorial combat.
The bad news is that you are...not really suited to gladiatorial combat. Neither your class nor your isekai cheat powers[1] are especially good at physical fights.
The good news is that you have accumulated by now a large retinue of vagabonds and misfits...ahem, loyal party members who will gladly fight on your behalf.
The bad news is that even your party members who are good at fighting still seem somewhat outclassed by the Champions.
The good news is that, as any adventuring party should, you have accumulated various magical items, wholly legitimately looted from various places: dungeons, bandits who made the mistake of being your random encounter on a trip between cities, buildings that looked like they might be thieves’ guilds, manifestly corrupt local governors who attempted to have you arrested for no legitimate reason at all...ahem. In any case, you have accumulated various magical items to equip your party members with.
The bad news is that the Four Great Houses have more magic items to equip their Champions with.
The good news is that you’ve gotten your hands on a dataset containing the history of combats in the Arena. With this, you’re hopeful that you can choose how to assign and equip your party members for the best possible odds against the Champions!
The bad news is that it sounds like this will require a lot of work... The even better news is that it sounds like this will give you the opportunity to do a lot of fun Data Science! Hooray!
DATA & OBJECTIVES
Your adventuring party has the following martial party members:
Uzben Grimblade, a Level 5 Dwarf Ninja.[2]
Varina Dourstone, a Level 5 Dwarf Warrior.
Willow Brown, a Level 5 Human Ranger.
Xerxes III of Calantha, a Level 5 Human Monk.
Yalathinel Leafstrider, a Level 5 Elf Fencer.
Zelaya Sunwalker, a Level 6 Elf Knight.
You also have some magical items to distribute among them. You have seven magical items total, one each of:
+1, +2, +3 and +4 Boots of Speed
+1, +2 and +3 Gauntlets of Power
You need to choose who will fight each of the four opposing champions:
House Adelon’s champion is a Level 6 Human Warrior with +3 Boots of Speed and +1 Gauntlets of Power.
House Bauchard’s champion is a Level 6 Human Knight with +3 Boots of Speed and +2 Gauntlets of Power.
House Cadagal’s champion is a Level 7 Elf Ninja with +2 Boots of Speed and +3 Gauntlets of Power.
House Deepwrack’s champion is a Level 6 Dwarf Monk with +3 Boots of Speed and +2 Gauntlets of Power.
Your goal is to maximize the number of champions you defeat.
For each opposing champion, you need to choose and equip one of your party members to fight them. You cannot send the same party member to fight two champions, nor can you equip the same item to two party members.
For example, a solution could be:
Give Uzben the +4 Boots of Speed and the +3 Gauntlets of Power and send him to fight House Adelon’s champion.
Give Varina the +3 Boots of Speed and the +2 Gauntlets of Power and send her to fight House Bauchard’s champion.
Give Willow the +2 Boots of Speed and the +1 Gauntlets of Power and send her to fight House Cadagal’s champion.
Give Xerxes the +1 Boots of Speed and send him to fight House Deepwrack’s champion.
Do not send Yalathinel or Zelaya to fight at all.
To assist in this, you have a dataset with the records of past fights in the Arena. Each record shows the two fighters that took part, what their levels/races/classes/magical items were, and which one won.
SECRET BONUS OBJECTIVE?
A strange piece of paper appears out of nowhere and falls into your hands. You try to read it, but most of it is damaged beyond recognition. You get a sudden feeling, though, that what it says is very important. Did it come from one of your isekai cheat powers? Was it revealed to you by Enlightenment, or sent from the future by Temporal Distortion? Or is the Goddess putting another finger on the scales?
If you ??? ??? ?? ????? ?? ?????? ??? ???? ??????? ???? ??????? ???? ??? ???? responsible ??? ????? ?????? ????? ???? ??? House. ??? ???? ???? ??? lasting enmity, ??? ?????? ???? ???????? ???? ??? ???? ?? ??? ???? ??????? ?? ?? ????? ?? ????????? ?? ??? ???? your honor ?? ????????? ???? ?? ???? ??? ???? ??? ??? friendship ???? ?? ??? ???? ?? ??? ?????? ??????? ?? ?? ?????
I’ll aim to post the ruleset and results on October 28th (giving one week and both weekends for players). If you find yourself wanting extra time, because you found this scenario late and want a chance to attempt it yourself, or just because you end up a bit rushed/busy with other commitments and would be happier to have an extra week, comment below and I can push this deadline back.
As usual, working together is allowed, but for the sake of anyone who wants to work alone, please spoiler parts of your answers that contain information or questions about the dataset. To spoiler answers on a PC, type a ‘>’ followed by a ‘!’ at the start of a line to open a spoiler block—to spoiler answers on mobile, type a ‘:::spoiler’ at the start of a line and then a ‘:::’ at the end to spoiler the line.
I took an analytic approach and picked some reasonable choices based on that. I’ll almost certainly try throwing ML at this problem at some point, but for now I want to note down what a me-who-can’t-use-XGBoost would do.
Findings:
There are at least some fingerprintable gladiators who keep gladiating, and who need to be Accounted For (the presence of such people makes all archetypery suspect: are Dwarven Knights really that Good, or are there just a handful of super-prolific Dwarven Knights who give everyone an unfairly good impression?). This includes a Level 7 Elven Ninja, almost certainly Cadagal’s Champion, who inexplicably insists on always wearing black (even though it doesn’t seem to make a difference to how well ninjas ninj).
Level 4 Boots and Level 4 Gauntlets are super rare in the dataset. The Gauntlets are always worn by a pair of hypercompetent Level 7 Dwarven Monks; the Boots are always worn by the Level 7 Elven Ninja.
Despite this, Cadagal’s Champion is facing us with Level 2 Boots.
We have some Level 4 Boots.
. . . we robbed this guy, didn’t we? And if we wear the boots—our most powerful equipment—he’ll flip out and set his House against us whether we win or lose? Dammit . . .
Who fights whom?
A is a Human Warrior. Warriors lose to Fencers, Humans lose to Fencers, Humans lose to Elves. We have an Elven Fencer on call; send Y.
B is a Human Knight. Rangers are best vs Knights, so send W. (Not super confident in this one)
C is an Elven Ninja. Ninjas are super weak against Knights. Send Z, the Elven Knight. (Slightly concerned by how underrepresented Elves are in the sample of gladiators who managed to beat this guy but I’m assuming that’s either noise or an effect which Z will be able to shrug off with the Power of Friendship and/or Urgency)
D is a Dwarven Monk. Monks are weak to Ninjas; send U.
Who wears what?
I haven’t managed to figure out how equipment works beyond “higher number good”; if there’s specific synergies with/against specific classes/races/whatever they elude me. For that reason:
Y and Z are my best shots. I’ll have them both wear what their opponents are wearing, to reduce the effects of uncertainty and turn those fights into “who wore it better?” contests. (So +3 Boots and +1 Gauntlets for Y, +2 Boots and +3 Gauntlets for Z.)
U vs D looks pretty solid so I’ll give him the remaining +2 Gauntlets and +1 Boots.
W vs B is my most tenuous guess, I hope she won’t hold a grudge after I send her out unequipped to boost everyone else’s chances.
Took an ML approach, got radically different results which I’m choosing to trust.
Fit a LightGBM model to the raw data, and to the data transformed by simon’s stats-to-strength-and-speed model. Simon’s version got slightly better results on an outsample despite having many fewer degrees of freedom and fewer chances to ‘cheat’ by fingerprinting exceptional fighters; I therefore used that going forward. (I also tried tweaking some of the arbitrary constants in simon’s model: this invariably lowered performance, reassuring me that he got all the multipliers right.)
Iterated all possible matchups, then all possible loadouts (modulo not using the +4 boots), looking for max EV of total count of wins.
New strategy goes like this:
Against A, send U, with +3 Boots
Against B, send X, with +2 Boots and +1 Gauntlets
Against C, send V, with +3 Gauntlets
Against D, send Y, with +1 Boots and +2 Gauntlets
Notes:
The machines say this gives me ~2.6 expected victories but I’m selecting for things they liked so realistically I expect my EV somewhere in the 2-2.5 range.
If I was doing this IRL I’d move the Gauntlets from V to U, lowering EV but (almost) guaranteeing me at least one win.
My best guess about why my solution works (assuming it does) is that the “going faster than your opponent” bonus hits sharply diminishing returns around +4 speed. But that’s just post hoc confabulation.
I don’t think this is correct:
In my model
There is a sharp threshold at +1 speed, so returns should sharply diminish after +1 speed
in fact in the updated version of my model
There is no effect of speed beyond the threshold (speed effect depends only on sign(speed difference))
I think the discrepancy might possibly relate to this:
because
If you consider only the matchups with no items, the model needs to assign the matchups assuming no boots, so it sends your characters against opponents over which they have a speed advantage without boots (except the C-V matchup as there is no possibility of beating C on speed).
so an optimal allocation
needs to take into account the fact that your boots can allow you to use slower and stronger characters, so can’t be done by choosing the matchups first without items.
so I predict that your model might predict
a higher EV for my solution
Regarding my strategic approach
I agree pick-characters-then-equipment has the limitation you describe—I’m still not sure about the B-vs-X matchup in particular—but I eyeballed some possible outcomes and they seem close enough to optimal that I’m not going to write any more code for this.
I put your solution into my ML model and it seems to think
That your A and C matchups are pretty good (though A could be made slightly better by benching Willow and letting Uzben do her job with the same gear), but B and D have <50% success odds.
However
I didn’t do much hyperparameter tuning and I’m working with a new model type, so it might have more epicycles than warranted.
And
“My model says the solution my model says is best is better than another solution” isn’t terribly reassuring.
. . . regardless, I’m sticking with my choices.
One last note:
I don’t actually think there’s a strict +4 speed benefit cutoff—if I did I’d reallocate the +1 Boots from Y to V—but I suspect there’s some emergent property that kindasorta does the same thing in some highlevel fights maybe.
Update:
I tried fitting my ML model without access to speed variables other than sign(speed diff) and got slightly but non-negligibly worse metrics on an outsample. This suggests that sign(speed diff) tells you most of the information you need about speed but if you rely solely on it you’re still missing useful and relevant information.
(. . . either that or my code has another error, I guess. Looking forward to finding out in seven days.)
Very interesting, this would certainly cast doubt on
my simplified model
But so far I haven’t been noticing
any effects not accounted for by it.
After reading your comments I’ve been getting Claude to write up an XGBoost implementation for me, I should have made this reply comment when I started, but will post my results under my own comment chain.
I have not (but should) try to duplicate (or fail to do so) your findings—I haven’t been quite testing the same thing.
I tried fitting a model with only “Strength diff plus 8 times sign(speed diff)” as an explanatory variable, got (impressively, only moderately!) worse results. My best guess is that your model is underfitting, and over-attaching to the (good!) approximation you fed it, because it doesn’t have enough Total Learning to do anything better . . . in which case you might see different outcomes if you increased your number of trees and/or your learning rate.
Alternatively
I might just have screwed up my code somehow.
Still . . .
I’m sticking with my choices for now.
You may well be right, I’ll look into my hyperparameters. I looked at the code Claude had generated with my interference and that greatly lowered my confidence in them, lol (see edit to this comment).
Thanks aphyer. My analysis so far and proposed strategy:
After initial observations that e.g. higher numbers are correlated with winning, I switched to mainly focus on race and class, ignoring the numerical aspects.
I found major class-race interactions.
It seems that for matchups within the same class, Elves are great, tending to beat Dwarves consistently across all classes and Humans even harder, while Humans also beat Dwarves pretty hard in same-class matchups.
Within same-race matchups there are also fairly consistent patterns: Fencers tend to beat Rangers, Monks and Warriors, Knights beat Ninjas, Monks beat Warriors, Rangers and Knights, Ninjas beat Monks, Fencers and Rangers, Rangers beat Knights and Warriors, and Warriors beat Knights.
If the race and class are both different though… things can be different. For example, a same-class Elf will tend to beat a same-class Dwarf. And a same-race Fencer will tend to beat a same-race Warrior. But if an Elf Fencer faces a Dwarf Warrior, the Dwarf Warrior will most likely win. Another example with Fencers and Warriors: same-class Elves tend to beat Humans—but not only will a Human Warrior tend to beat an Elf Fencer, but also a Human Fencer will tend to beat an Elf Warrior by a larger ratio than for a same-race Fencer/Warrior matchup???
If you look at similarities between different classes in terms of combo win rates, there seems to be a chain of similar classes:
Knight—Warrior—Ranger—Monk—Fencer—Ninja
(I expected a cycle underpinned by multiple parameters. But Ninja is not similar to Knight. This led me to consider that perhaps there is only a single underlying parameter, or a trade-off between two (e.g. strength/agility … or … Speed and Power)).
And going back to the patterns seen before, this seems compatible with races also having speed/power tradeoffs:
Dwarf—Human—Elf
Where speed has a threshold effect but power is more gradual (so something with slightly higher speed beats something with slightly higher power, but something with much higher power beats something with much higher speed).
Putting the Class-race combos on the same spectrum based on similarity/trends in results, I get the following ordering:
Elf Ninja > Elf Fencer > Human Ninja > Elf Monk > Human Fencer > Dwarf Ninja >~ Elf Ranger > Human Monk > Elf Warrior > Dwarf Fencer > Human Ranger > Dwarf Monk >~ Elf Knight > Human Warrior > Dwarf Ranger > Human Knight > Dwarf Warrior > Dwarf Knight
So, it seems a step in the race sequence is about equal to 1.5 steps in the class sequence. On the basis of pretty much just that, I guessed that race steps are a 3 speed vs power tradeoff, class steps are a 2 speed and power tradeoff, levels give 1 speed and power each, and items give what they say on the label.
I have not verified this as much as I would like. (But on the surface it seems to work, e.g. speed threshold seems to be there). One thing that concerns me is that it seems that higher speed differences actually reduce success chances holding power differences constant (could be an artifact, e.g., of it not just depending on the differences between stat values; edit: see further edit below).
But, for now, assuming that I have it correct, speed/power of the house champions (with the lowest race and class in a stat assumed to have 0 in that stat):
House Adelon: Level 6 Human Warrior +3 Boots +1 Gauntlets − 14 speed 18 power
House Bauchard: Level 6 Human Knight +3 Boots +2 Gauntlets − 12 speed 21 power
House Cadagal: Level 7 Elf Ninja +2 Boots +3 Gauntlets − 25 speed 10 power
House Deepwrack: Level 6 Dwarf Monk +3 Boots +2 Gauntlets − 15 speed 18 power
Whereas the party’s champions, ignoring items, have:
Uzben Grimblade, a Level 5 Dwarf Ninja − 15 speed 11 power
Varina Dourstone, a Level 5 Dwarf Warrior − 7 speed 19 power
Willow Brown, a Level 5 Human Ranger − 12 speed 14 power
Xerxes III of Calantha, a Level 5 Human Monk − 14 speed 12 power
Yalathinel Leafstrider, a Level 5 Elf Fencer − 19 speed 7 power
Zelaya Sunwalker, a Level 6 Elf Knight − 12 speed 16 power
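The numbers in the two lists above can be reproduced with a few lines of Python. This is only a sketch of the guessed model (race steps worth 3, class steps worth 2, levels and items worth their face value); the function name `stats` and the short fighter labels are made up for illustration, and nothing here is confirmed by the scenario's actual rules.

```python
# Sketch of the guessed stat model (a hypothesis from this comment, not confirmed rules).
RACES = ["Dwarf", "Human", "Elf"]  # ordered from slow/strong to fast/weak
CLASSES = ["Knight", "Warrior", "Ranger", "Monk", "Fencer", "Ninja"]  # same ordering
RACE_STEP = 3   # speed gained / power lost per step along RACES
CLASS_STEP = 2  # speed gained / power lost per step along CLASSES


def stats(level, race, cls, boots=0, gauntlets=0):
    """Return (speed, power); the slowest/weakest race and class contribute 0."""
    r = RACES.index(race)
    c = CLASSES.index(cls)
    speed = level + boots + RACE_STEP * r + CLASS_STEP * c
    power = (level + gauntlets
             + RACE_STEP * (len(RACES) - 1 - r)
             + CLASS_STEP * (len(CLASSES) - 1 - c))
    return speed, power


# (level, race, class, boots, gauntlets); party members are listed unequipped.
fighters = {
    "House Adelon": (6, "Human", "Warrior", 3, 1),
    "House Bauchard": (6, "Human", "Knight", 3, 2),
    "House Cadagal": (7, "Elf", "Ninja", 2, 3),
    "House Deepwrack": (6, "Dwarf", "Monk", 3, 2),
    "Uzben": (5, "Dwarf", "Ninja"),
    "Varina": (5, "Dwarf", "Warrior"),
    "Willow": (5, "Human", "Ranger"),
    "Xerxes": (5, "Human", "Monk"),
    "Yalathinel": (5, "Elf", "Fencer"),
    "Zelaya": (6, "Elf", "Knight"),
}

for name, args in fighters.items():
    speed, power = stats(*args)
    print(f"{name}: {speed} speed, {power} power")
```

Keeping the step sizes as named constants makes it easy to try other guesses (say, race steps of 2) and see how the ordering changes.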
For my proposed strategy (subject to change as I find new info, or find my assumptions off, e.g. such that my attempts to just barely beat the opponents on speed are disastrously slightly wrong):
I will send Willow Brown, with +3 boots and no gauntlets, against House Adelon’s champion (1 speed advantage, 4 power deficit).
I will send Zelaya Sunwalker, with +1 boots and +1 gauntlets, against House Bauchard’s champion (1 speed advantage, 4 power deficit).
I will send Xerxes III of Calantha, with +2 boots and +2 gauntlets, against House Deepwrack’s champion (1 speed advantage, 4 power deficit).
And I will send Varina Dourstone, with +3 gauntlets, to overwhelm House Cadagal’s Elf Ninja with sheer power (18 speed deficit, 12 power advantage).
(Item allocation revised from my first version, in which Willow, Zelaya and Xerxes had +1, +2 and +3 gauntlets and Varina had no items; see the further edit below.)
And in fact, I will gift the +4 boots of speed to House Cadagal’s Elf Ninja in advance of the fight, making it a 20 speed deficit.
Why? Because I noticed that +4 boots of speed are very rare items that have only been worn by Elf Ninjas in the past. So maybe that’s what the bonus objective is talking about. Of course, another interpretation is that sending a character 2 levels lower without any items, and gifting a powerful item in advance, would be itself a grave insult. Someone please decipher the bonus objective to save me from this foolishness!
Edited to add: It occurs to me that I really have no reason to believe the power calculation is accurate, beyond that symmetry is nice. I’d better look into that.
further edit: it turns out that I was leaving out the class contribution to the power difference when calculating the power difference for determining the effects of power and speed. It looks like this was causing the effect of higher speed differences seeming to reduce win rates. With this fixed the effects look much cleaner (e.g. there’s a hard threshold where if you have a speed deficit you must have at least 3 power advantage to have any chance to win at all), increasing my confidence that effects on power and speed being symmetric is actually correct. This does have the practical effect of making me adjust my item distribution: it looks like a 4 deficit in power is still enough for >90% win rate with a speed advantage, while getting similar win rates with a speed disadvantage will require more than just the 9 power difference, so I shifted the items to boost Varina’s power advantage. Indeed, with the cleaner effects, it appears that I can reasonably model the effect of a speed advantage/disadvantage as equivalent to a power difference of 8, so with the item shift all characters will have an effective +4 power advantage taking this into account.
Noting that I read this (and that therefore you get partial credit for any solution I come up with from here on out): your model and the strategies it implies are both very interesting. I should be able to investigate them with ML alongside everything else, when/if I get around to doing that.
Regarding the Bonus Objective:
I can’t figure out whether offering that guy we unknowingly robbed his shoes back is the best or the worst diplomatic approach our character could take, but yeah I’m pretty sure we both located the problem and roughly what it implies for the scenario.
On the bonus objective:
I didn’t realize that the level 7 Elf Ninjas were all one person or that the boots +4 were always with a level 7 (as opposed to any level) Elf Ninja. It seems you are correct as there are 311 cases of which the first 299 all have the boots of speed 4 and gauntlets 3 with only the last 12 having boots 2 and gauntlets 3 (likely post-theft). It seems to me that they appear both as red and black, though.
>only the last 12 having boots 2 and gauntlets 3 (likely post-theft)
Didn’t notice that but it confirms my theory, nice.
>It seems to me that they appear both as red and black, though.
Ah, I see where the error in my code was that made me think otherwise. Strange coincidence: I thought “oh yeah a powerful wealthy elf ninja who pointedly wears black when assigned red clothes, what a neat but oddly specific 8-bit theater reference” and then it turned out to be a glitch.
updated model for win chance:
I am currently modeling the win ratio as dependent on a single number, the effective power difference. The effective power difference is the power difference plus 8*sign(speed difference).
Power and speed are calculated as:
Power = level + gauntlet number + race power + class power
Speed = level + boots number + race speed + class speed
where race speed and power contributions are determined by each increment on the spectrum:
Dwarf—Human—Elf
increasing speed by 3 and lowering power by 3
and class speed and power contributions are determined by each increment on the spectrum:
Knight—Warrior—Ranger—Monk—Fencer—Ninja
increasing speed by 2 and lower power by 2.
So, assuming this is correct, what function of the effective power determines the win rate? I don’t have a plausible exact formula yet, but:
If the effective power difference is 6 or greater, victory is guaranteed.
If the effective power difference is low, it seems a not-terrible fit that the odds of winning are about exponential in the effective power difference (each +1 effective power just under doubling odds of winning)
It looks like it is trending faster than exponential as the effective power difference increases. At an effective power difference of 4, the odds of the higher effective power character winning are around 17 to 1.
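To make the description above concrete, here is a rough sketch of the effective power difference and a very crude win-probability curve. The doubling base of 2 is a round-number assumption (the fit described above is "just under doubling" at low differences and faster later), and treating a difference of -6 or worse as a certain loss is assumed by symmetry; the function names are hypothetical.

```python
def sign(x):
    """Return -1, 0 or 1 according to the sign of x."""
    return (x > 0) - (x < 0)


def effective_power_diff(speed_a, power_a, speed_b, power_b, speed_bonus=8):
    """Power difference plus a flat bonus for whoever is faster (from A's side)."""
    return (power_a - power_b) + speed_bonus * sign(speed_a - speed_b)


def rough_win_probability(diff, base=2.0, sure_at=6):
    """Crude curve: odds of base**diff, certain once |diff| reaches sure_at.

    ASSUMPTION: base=2 is only a round figure; no exact formula is known.
    """
    if diff >= sure_at:
        return 1.0
    if diff <= -sure_at:
        return 0.0
    odds = base ** diff
    return odds / (1.0 + odds)


# Example: a 15 speed / 14 power fighter against a 14 speed / 18 power one.
d = effective_power_diff(15, 14, 14, 18)
print(d, round(rough_win_probability(d), 3))
```

With base 2 the odds at a difference of 4 come out at 16 to 1, close to the roughly 17 to 1 seen in the data, which is the main reason for picking that base here.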
edit: it looks like there is a level dependence when holding effective power difference constant at non-zero values (lower/higher level → winrate imbalance lower/higher than implied by effective power difference). Since I don’t see this at 0 effective power difference, it is presumably not due to an error in the effective power calculation, but an interaction with the effective power difference to determine the final winrate. Our fights are likely “high level” for this purpose implying better odds of winning than the 17 to 1 in each fight mentioned above. Todo: find out more about this effect quantitatively.
edit2: whoops that wasn’t a real effect, just me doing the wrong test to look for one.
Inspired by abstractapplic’s machine learning and wanting to get some experience in Julia, I got Claude (3.5 Sonnet) to write me an XGBoost implementation in Julia. Took a long time, especially with some bugfixing (it took a long time to find that a feature matrix was the wrong shape, a problem with insufficient type explicitness, I think). Still way way faster than doing it myself! Not sure I’m learning all that much Julia, but am learning how to get Claude to write it for me, I hope.
Anyway, I used a simple model that
only takes into account 8 * sign(speed difference) + power difference, as in the comment this is a reply to
and a full model that
takes into account all the available features including the base data, the number the simple model uses, and intermediate steps in the calculation of that number (that would be, iirc: power (for each), speed (for each), speed difference, power difference, sign(speed difference))
Results:
Rank 1
Full model scores: Red: 94.0%, Black: 94.9%
Combined full model score: 94.4%
Simple model scores: Red: 94.3%, Black: 94.6%
Combined simple model score: 94.5%
Matchups:
Varina Dourstone (+0 boots, +3 gauntlets) vs House Cadagal Champion
Willow Brown (+3 boots, +0 gauntlets) vs House Adelon Champion
Xerxes III of Calantha (+2 boots, +2 gauntlets) vs House Deepwrack Champion
Zelaya Sunwalker (+1 boots, +1 gauntlets) vs House Bauchard Champion
This is the top scoring result with either the simplified model or the full model. It was found by a full search of every valid item and hero combination available against the house champions.
It is also my previously posted, found w/o machine learning, proposal for the solution. Which is reassuring. (Though, I suppose there is some chance that my feeding the models this predictor, if it’s good enough, might make them glom on to it while they don’t find some hard-to learn additional pattern.)
My theory though is that giving the models the useful metric mostly just helps them—they don’t need to learn the metric from the data, and I mostly think that if there was a significant additional pattern the full model would do better.
(for Cadagal, I haven’t changed the champion’s boots to +4, though I don’t expect that to make a significant difference)
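For anyone wanting to try this kind of full search without the XGBoost models, here is a minimal Python sketch. It is not the Julia code used for the results above: it scores each plan by the worst effective power difference across the four fights (ties broken by the sum) instead of by model-predicted win rates, it leaves out the +4 boots, and it assumes that items never hurt under the guessed model, so every plan hands out all three remaining boots and all three gauntlets. All names are hypothetical.

```python
from itertools import permutations

RACE = {"Dwarf": 0, "Human": 1, "Elf": 2}
CLS = {"Knight": 0, "Warrior": 1, "Ranger": 2, "Monk": 3, "Fencer": 4, "Ninja": 5}


def stats(level, race, cls, boots=0, gauntlets=0):
    """(speed, power) under the guessed additive model."""
    speed = level + boots + 3 * RACE[race] + 2 * CLS[cls]
    power = level + gauntlets + 3 * (2 - RACE[race]) + 2 * (5 - CLS[cls])
    return speed, power


def eff_diff(hero, foe):
    """Power difference plus 8 * sign(speed difference), from the hero's side."""
    (hs, hp), (fs, fp) = hero, foe
    return (hp - fp) + 8 * ((hs > fs) - (hs < fs))


# Unequipped party stats, and the champions in house order A, B, C, D.
PARTY = {
    "Uzben": stats(5, "Dwarf", "Ninja"),
    "Varina": stats(5, "Dwarf", "Warrior"),
    "Willow": stats(5, "Human", "Ranger"),
    "Xerxes": stats(5, "Human", "Monk"),
    "Yalathinel": stats(5, "Elf", "Fencer"),
    "Zelaya": stats(6, "Elf", "Knight"),
}
HOUSES = ["Adelon", "Bauchard", "Cadagal", "Deepwrack"]
FOES = [
    stats(6, "Human", "Warrior", 3, 1),
    stats(6, "Human", "Knight", 3, 2),
    stats(7, "Elf", "Ninja", 2, 3),
    stats(6, "Dwarf", "Monk", 3, 2),
]


def plan_diffs(heroes, boots, gauntlets):
    """Effective power difference per house for one complete plan."""
    return [eff_diff((PARTY[h][0] + b, PARTY[h][1] + g), f)
            for h, b, g, f in zip(heroes, boots, gauntlets, FOES)]


def search():
    """Try every ordered choice of 4 heroes and every item hand-out."""
    best = None
    for heroes in permutations(PARTY, 4):
        for boots in permutations(range(4)):          # 0 means no boots
            for gauntlets in permutations(range(4)):  # 0 means no gauntlets
                diffs = plan_diffs(heroes, boots, gauntlets)
                score = (min(diffs), sum(diffs))
                if best is None or score > best[0]:
                    best = (score, heroes, boots, gauntlets)
    return best


score, heroes, boots, gauntlets = search()
for house, h, b, g in zip(HOUSES, heroes, boots, gauntlets):
    print(f"{h} (+{b} boots, +{g} gauntlets) vs House {house}")
print("worst-case effective power difference:", score[0])
```

The search covers 360 hero orderings times 24 boot and 24 gauntlet hand-outs, so it runs in a few seconds at most. Calling `plan_diffs(("Willow", "Zelaya", "Varina", "Xerxes"), (3, 1, 0, 2), (0, 1, 3, 2))` evaluates the Rank 1 plan above and gives an effective power difference of 4 in every fight.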
As far as I can tell the full model doesn’t do significantly better and does worse in some ways (though I don’t know much about how to evaluate this, and Claude’s metrics, including a test set log loss of 0.2527 for the full model and 0.2511 for the simple model, are for a separately generated version which I am not all that confident is actually the same models, though they “should be” up to the restricted training set if Claude was doing it right). * see edit below
But the red/black variations seen below for the full model seem likely to me (given my prior that red and black are likely to be symmetrical) to be an indication that what the full model is finding that isn’t in the simple model is at least partially overfitting. Though actually, if it’s overfitting a lot, maybe it’s surprising that the test set log loss wouldn’t be a lot worse than found (though it is at least worse than the simple model)? Hmm, what if there are actual red/black differences? (Something to look into perhaps, as well as trying to duplicate abstractapplic’s report regarding sign(speed difference) not exhausting the benefits of speed info … but for now I’m more likely to leave the machine learning aside and switch to looking at distributions of gladiator characteristics, I think.)
Predictions for individual matchups for my and abstractapplic’s solutions:
My matchups:
Varina Dourstone (+0 boots, +3 gauntlets) vs House Cadagal Champion (+2 boots, +3 gauntlets)
Full Model: Red: 91.1%, Black: 96.7%
Simple Model: Red: 94.3%, Black: 94.6%
Willow Brown (+3 boots, +0 gauntlets) vs House Adelon Champion (+3 boots, +1 gauntlets)
Full Model: Red: 94.3%, Black: 95.1%
Simple Model: Red: 94.3%, Black: 94.6%
Xerxes III of Calantha (+2 boots, +2 gauntlets) vs House Deepwrack Champion (+3 boots, +2 gauntlets)
Full Model: Red: 95.2%, Black: 93.7%
Simple Model: Red: 94.3%, Black: 94.6%
Zelaya Sunwalker (+1 boots, +1 gauntlets) vs House Bauchard Champion (+3 boots, +2 gauntlets)
Full Model: Red: 95.3%, Black: 93.9%
Simple Model: Red: 94.3%, Black: 94.6%
(all my matchups have 4 effective power difference in my favour as noted in an above comment)
abstractapplic’s matchups:
Matchup 1:
Uzben Grimblade (+3 boots, +0 gauntlets) vs House Adelon Champion (+3 boots, +1 gauntlets)
Win Probabilities:
Full Model: Red: 72.1%, Black: 62.8%
Simple Model: Red: 65.4%, Black: 65.7%
Stats:
Speed: 18 vs 14 (diff: 4)
Power: 11 vs 18 (diff: −7)
Effective Power Difference: 1
--------------------------------------------------------------------------------
Matchup 2:
Xerxes III of Calantha (+2 boots, +1 gauntlets) vs House Bauchard Champion (+3 boots, +2 gauntlets)
Win Probabilities:
Full Model: Red: 46.6%, Black: 43.9%
Simple Model: Red: 49.4%, Black: 50.6%
Stats:
Speed: 16 vs 12 (diff: 4)
Power: 13 vs 21 (diff: −8)
Effective Power Difference: 0
--------------------------------------------------------------------------------
Matchup 3:
Varina Dourstone (+0 boots, +3 gauntlets) vs House Cadagal Champion (+2 boots, +3 gauntlets)
Win Probabilities:
Full Model: Red: 91.1%, Black: 96.7%
Simple Model: Red: 94.3%, Black: 94.6%
Stats:
Speed: 7 vs 25 (diff: −18)
Power: 22 vs 10 (diff: 12)
Effective Power Difference: 4
--------------------------------------------------------------------------------
Matchup 4:
Yalathinel Leafstrider (+1 boots, +2 gauntlets) vs House Deepwrack Champion (+3 boots, +2 gauntlets)
Win Probabilities:
Full Model: Red: 35.7%, Black: 39.4%
Simple Model: Red: 34.3%, Black: 34.6%
Stats:
Speed: 20 vs 15 (diff: 5)
Power: 9 vs 18 (diff: −9)
Effective Power Difference: −1
--------------------------------------------------------------------------------
Overall Statistics:
Full Model Average: Red: 61.4%, Black: 60.7%
Simple Model Average: Red: 60.9%, Black: 61.4%
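Since the objective is the number of champions defeated, the per-matchup probabilities can be turned into an expected win count and a distribution over 0 to 4 wins. The sketch below uses the Full Model "Red" figures listed above and assumes the four fights are independent, which the scenario does not guarantee; the function and variable names are made up for illustration.

```python
# Full Model "Red" win probabilities copied from the per-matchup lists above.
this_comment = {"Cadagal": 0.911, "Adelon": 0.943, "Deepwrack": 0.952, "Bauchard": 0.953}
abstractapplic = {"Adelon": 0.721, "Bauchard": 0.466, "Cadagal": 0.911, "Deepwrack": 0.357}


def expected_wins(probs):
    """Expected number of wins is just the sum of the win probabilities."""
    return sum(probs)


def wins_distribution(probs):
    """P(exactly k wins) for k = 0..n, ASSUMING the fights are independent."""
    dist = [1.0]
    for p in probs:
        nxt = [0.0] * (len(dist) + 1)
        for k, q in enumerate(dist):
            nxt[k] += q * (1 - p)      # this fight is lost
            nxt[k + 1] += q * p        # this fight is won
        dist = nxt
    return dist


for label, plan in [("this comment", this_comment), ("abstractapplic", abstractapplic)]:
    probs = list(plan.values())
    dist = wins_distribution(probs)
    print(label, "expected wins:", round(expected_wins(probs), 2),
          "P(at least 3 wins):", round(dist[3] + dist[4], 3))
```

The averages in the Overall Statistics lines are the same information divided by four: an average of about 61% per fight is roughly 2.5 expected wins.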
Edit: so I checked the actual code to see if Claude was using the same hyperparameters for both, and wtf wtf wtf wtf. The code has 6 functions that all train models (my fault for at one point renaming a function since Claude gave me a new version that didn’t have all the previous functionality (it only trained the full model instead of both; this was during the great bughunt for the misshaped matrix, when a problem was suspected in the full model), then Claude I guess picked up on this and started renaming updated versions spontaneously, and I was adding Claude’s new features in instead of replacing things and hadn’t cleaned up the code or asked Claude to do so). Each one has its own hardcoded hyperparameter set. Of these, there is one pair of functions that have matching hyperparameters. Everything else has a unique set. Of course, most of these weren’t being used anymore, but the functions for actually generating the models I used for my results, and the function for generating the models used for comparing results on a train/test split, weren’t among the matching pair. Plus another function that returns a (hardcoded, also unique) updated parameter set, but wasn’t actually used. Oh and all this is not counting the hyperparameter tuning function that I assumed was generating a set of tuned hyperparameters to be used by other functions, but in fact was just printing results for different tunings. I had been running this every time before training models! Obviously I need to be more vigilant (or maybe asking Claude to do so might help?).
edit:
Had Claude clean up the code and tune for more overfitting, still didn’t see anything not looking like overfitting for the full model. Could still be missing something, but not high enough in subjective probability to prioritize currently, so have now been looking at other aspects of the data.
further edit:
My (what I think is) highly overfitted version of my full model really likes Yonge’s proposed solution. In fact it predicts an equal winrate to the best possible configuration not using the +4 boots (I didn’t have Claude code the situation where +4 boots are a possibility). I still think that’s probably because they are picking up the same random fluctuations … but it will be amusing if Yonge’s “manual scan” solution turns out to be exactly right.
Now using Julia with Claude to look at further aspects of the data, particularly in view of other commenters’ observations:
First, thanks to SarahSrinivasan for the key observation that the data is organized into tournaments and non-tournament encounters. The tournaments skew the overall data to higher winrate gladiators, so restricting to the first round is essential for debiasing this (todo: check what is up with non-tournament fights).
Also, thanks to abstractapplic and Lorxus for pointing out that there are some persistent high level gladiators. It seems to me all the level 7 gladiators are persistent (up to the two item changes remarked on by abstractapplic and Lorxus). I’m assuming for now level 6 and below likely aren’t persistent (other than in the same tournament).
(btw there are a couple fights where the +4 gauntlets holder is on both sides. I’m assuming this is likely a bug in the dataset generation rather than an indication that there are two of them (e.g. didn’t check that both sides, drawn randomly from some pool, were not equal)).
For gladiators of levels 1 to 6, the boots and gauntlets in tournament first rounds seem to be independently and randomly assigned as follows:
+1 and +2 gauntlets are equally likely at 10⁄34 chance each;
+3 gauntlets have probability (4 + level)/34
+0 (no) gauntlets have probability (10 - level)/34
and same, independently, for boots.
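As a quick sanity check on the fit above, the four cases do sum to 34/34 at every level from 1 to 6. A minimal Python sketch follows; the function name `gear_distribution` is made up for illustration, and the formula is only the empirical fit described above, not a confirmed generation rule:

```python
from fractions import Fraction


def gear_distribution(level):
    """Fitted probability of each gauntlet (or boot) bonus for a level 1-6 gladiator.

    Assumption: this encodes only the empirical fit quoted above,
    not the scenario's confirmed generation rule.
    """
    if not 1 <= level <= 6:
        raise ValueError("the fit only covers levels 1 to 6")
    return {
        0: Fraction(10 - level, 34),  # no item
        1: Fraction(10, 34),
        2: Fraction(10, 34),
        3: Fraction(4 + level, 34),
    }


for level in range(1, 7):
    dist = gear_distribution(level)
    # Weights are 10 + 10 + (4 + level) + (10 - level) = 34 at every level.
    assert sum(dist.values()) == 1
    mean_bonus = sum(bonus * p for bonus, p in dist.items())
    print(level, float(mean_bonus))
```

Under this fit the expected bonus works out to (42 + 3·level)/34, so it rises only slowly with level.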
I didn’t notice obvious deviations for particular races and classes (only did a few checks).
I don’t have a simple formula for level distribution yet. It is clearly much more favouring lower levels in tournament first rounds as compared with non-tournament fights, and level 1 gladiators don’t show up at all in non-tournament fights. Will edit to add more as I find more.
edit: boots/gauntlets distribution seems to be about the same for each level in the non-tournament distribution as in the tournament first rounds. This suggests that the level distribution differences in non-tournament rounds are not due to win/winrate selection (which the complete absence of level 1s outside of tournaments already suggested).
edit2: race/class distribution for levels 1-6 seems equal in first round data (same probabilities of each, independent). Same in non-tournament data. I haven’t checked for particular levels within that range.
edit3: there seem to be more level 1 fencers than other level 1 classes by an amount that is technically statistically significant if Claude’s test is correct, though still probably random I assume.
I haven’t yet gotten into any stats or modeling, just some data exploration, but there’s some things I haven’t seen mentioned elsewhere yet:
Zeroth: the rows are definitely in order! First: the arena holds regular single-elimination tournaments with 64 participants (63 total rounds) and these form contiguous blocks in the dataset with a handful of (unrelated?) bonus rounds in between. Second: Maybe the level 7 Dwarf Monk stole (won?) those +4 boots by winning a tournament (the Elf Ninja’s last use was during a final round vs that monk!) and then we acquired the boots from that monk? They appear to have upgraded their boots once before from +1 to +3 when defeating a Dwarf Ninja, though that was during a bonus round, not a tournament.
Does the fact that we see the winners of tournaments 6x more often than those eliminated in round one matter for modeling? It might; if e.g. gladiators have a hidden “skill” stat but for some reason the house champions don’t have very high skill, we’ll be implicitly significantly overestimating their hidden skill stat.
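To make the over-representation concrete, here is a small Python sketch of how many fights each gladiator gets in one bracket. It assumes a plain single-elimination bracket with a power-of-two field; `tournament_appearances` is a name invented for this example:

```python
def tournament_appearances(participants=64):
    """Map 'fights fought' -> 'number of gladiators' for one single-elimination bracket."""
    if participants < 2 or participants & (participants - 1):
        raise ValueError("participants must be a power of two")
    counts = {}
    remaining = participants
    round_number = 0
    while remaining > 1:
        round_number += 1
        losers = remaining // 2  # half the field is eliminated each round
        counts[round_number] = losers
        remaining -= losers
    # The champion fought in every round, the same number of fights as the losing finalist.
    counts[round_number] += 1
    return counts


counts = tournament_appearances(64)
# Every fight involves two gladiators, so halve the total number of appearances.
total_fights = sum(fights * gladiators for fights, gladiators in counts.items()) // 2
print(counts, total_fights)
```

For 64 entrants this gives 32 gladiators who appear once and a champion who appears six times, across 63 fights in total, so any per-row statistic weights the eventual winner six times as heavily as a first-round loser.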
Given equal level, race, and class, regardless of gauntlets, better boots always wins, no exceptions.
A very good predictor of victory for many race/class vs race/class matchups is the difference in level+boots plus a static modifier based on your matchup. Probably when it’s not as good we should be taking into account gauntlets. But also ninjas seem to maybe just do something weird. I’m guessing a sneak attack of some sort.
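A rough Python sketch of that heuristic, for concreteness. Everything here is an assumption for illustration: the function name, the record layout, and especially the modifier value, which is a placeholder rather than anything fitted to the dataset. Gauntlets and any ninja-specific behaviour are deliberately left out:

```python
# PLACEHOLDER modifier table: the real per-matchup values would have to be
# estimated from the dataset. The single entry below is invented.
MATCHUP_MODIFIER = {
    (("Elf", "Knight"), ("Human", "Warrior")): 1,
}


def predict_winner(a, b, modifiers=MATCHUP_MODIFIER):
    """Compare level + boots, shifted by a static (race, class) matchup modifier.

    a and b are dicts with keys 'race', 'cls', 'level', 'boots'.
    Returns 'a', 'b', or 'tie' (a tie meaning the heuristic cannot call it).
    """
    key = ((a["race"], a["cls"]), (b["race"], b["cls"]))
    score = (a["level"] + a["boots"]) - (b["level"] + b["boots"])
    score += modifiers.get(key, 0)  # unknown matchups get no adjustment
    if score > 0:
        return "a"
    if score < 0:
        return "b"
    return "tie"


# Hypothetical fighters, not taken from the scenario.
challenger = {"race": "Elf", "cls": "Knight", "level": 5, "boots": 3}
champion = {"race": "Human", "cls": "Warrior", "level": 6, "boots": 2}
print(predict_winner(challenger, champion))
```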
Anyway just manually matching up our available gladiators yields this setup which seems extremely likely to simply win:
# Elf Knight to beat Human Warrior 9 with just 1 adv. Needs Boots 3+
# Elf Fencer to beat Human Knight 9 by a lot but gauntlets might matter. Boots +1 are fine. Send Gauntlets +2.
# Human Monk to beat Elf Ninja 9 with 3 adv but gauntlets might matter. Needs Boots 2+. Send Gauntlets +3.
# Human Ranger to beat Dwarf Monk 9 with just 1 adv. Needs Boots 4+
aka
Give Zelaya the +3 Boots of Speed and the +1 Gauntlets of Power and send them to fight House Adelon’s champion.
Give Yalathinel the +1 Boots of Speed and the +2 Gauntlets of Power and send them to fight House Bauchard’s champion.
Give Xerxes III the +2 Boots of Speed and the +3 Gauntlets of Power and send him to fight House Cadagal’s champion.
Give Willow the +4 Boots of Speed and send her to fight House Deepwrack’s champion.
Do not send Uzben or Varina to fight at all.
The problem is that the Elf Ninja might want their +4 Boots. Or might want us to definitely not use them. Or something. As-is, we win; if the Elf Ninja is gonna be irate afterwards maybe winning isn’t enough, but I dunno how to reliably win without using the +4 Boots. We can certainly try to schedule Willow’s fight first, then after the fight against House Cadagal we can gift the +4 Boots back. I think the only better alternative is if it turns out the Elf Ninja is actually willing to throw the match for the +4 Boots back and be friendly with us afterwards, in which case probably there are better ways to set this up.
Looking at how various combinations of race and class do against one another when their levels are the same, there are clearly some combinations that do a lot better than others. Increasing the level helps to an extent, but the race/class combination looks like it is easily the most important factor. Special items do help, but less than the level. In most cases boots seem a bit more useful than gauntlets.
Manually scanning through the data suggests that the following combinations hopefully won’t be too bad:
DWARF NINJA v HUMAN WARRIOR + 1 boots of speed + 2 gauntlets
ELF KNIGHT v HUMAN KNIGHT + 3 boots of speed +1 gauntlets
DWARF WARRIOR v ELF NINJA + 3 gauntlets + 2 boots of speed
HUMAN MONK v DWARF MONK + 4 boots of speed
As for the bonus objective: it looks like someone is threatening us if we do well in one or more of the fights, but I can’t establish the details with any reliability. And as we have no idea who sent it, or what their real intentions are, we would probably be wise to ignore it and do what we would have done anyway, at least for now.
I’m going to start by attacking this a little on my own before I even look much at what other people have done.
Some initial observations from the SQL+Python practice this gave me a good excuse to do:
Adelon looks to have rough matchups against Elf Monks. Which we don’t have. They are however soft to even level 3-4 challengers sometimes. Maybe Monks and/or Fencers have an edge on Warriors?
Bauchard seems to have particularly strong matchups against other Knights, so we don’t send Velaya there. They seem a little soft to Monks and to Dwarf Ninjas and especially to Knights, so maybe Zelaya? Boots should help here.
Cadagal has precious few defeats, but one of them might be to a level 2(!) Human Warrior with fancy +3 Gauntlets. Though it seems like there’s a lot of combats where some Cadagal-like fighter has +4 Boots instead? Not sure if that’s the same guy.
And on that note, the max level is 7, and the max bonus for Boots and Gauntlets both is +4.
Max Boots (+4) is always on a level 7 Elf Ninja with +3 Gauntlets (but disappears altogether most of the way through the dataset).
Max Gauntlets (+4) is on either a level 7 Dwarf Monk who upgraded from +1 Boots to +3 Boots halfway through, or else there’s two of them. Thankfully we’re not facing them.
Deepwrack poses problems. They have just as few defeats, and one of them even contradicts the ordering I derived below! Ninjas are meant to lose to Monks. Maybe the speed matters a lot in that case?
It looks like a strict advantage in level or gear—holding all else constant—means you win every time. If everything is totally identical, you win about half the time. (Which seems obvious but worth checking.)
Looking through upsets—bouts where the classes are different, the losing fighter had at least 2 levels on the winner, and the loser’s gear was no better than the winner’s—we generally see that:
Fencers beat Monks and Rangers and lose to Knights, Ninjas, and Warriors
Knights beat Fencers and Ninjas, tie(???) with Monks and Warriors, and lose (weakly) to Rangers
Monks beat Ninjas, Rangers, and maybe Warriors, tie (?) with Knights, and lose to Fencers
Ninjas beat Fencers and (weakly) Rangers, and lose to Knights, Monks, and Warriors
Rangers beat Knights (weakly), Ninjas, and Warriors, tie with Fencers, and lose to Monks
Warriors beat Fencers, Ninjas, tie(?) with Knights, and lose to Rangers and maybe Monks
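One way to cross-check a hand-built list like this is to encode it and confirm that each pairing reads the same from both sides. A minimal Python sketch; the `CLAIMS` table just transcribes the six lines above (treating "weakly" and "maybe" as plain wins), and the helper names are made up:

```python
# Transcription of the list above; hedges such as "weakly" / "maybe" are dropped.
CLAIMS = {
    "Fencer":  {"beats": ["Monk", "Ranger"], "ties": [], "loses": ["Knight", "Ninja", "Warrior"]},
    "Knight":  {"beats": ["Fencer", "Ninja"], "ties": ["Monk", "Warrior"], "loses": ["Ranger"]},
    "Monk":    {"beats": ["Ninja", "Ranger", "Warrior"], "ties": ["Knight"], "loses": ["Fencer"]},
    "Ninja":   {"beats": ["Fencer", "Ranger"], "ties": [], "loses": ["Knight", "Monk", "Warrior"]},
    "Ranger":  {"beats": ["Knight", "Ninja", "Warrior"], "ties": ["Fencer"], "loses": ["Monk"]},
    "Warrior": {"beats": ["Fencer", "Ninja"], "ties": ["Knight"], "loses": ["Ranger", "Monk"]},
}

# What the other side should say for the claim to be consistent.
FLIP = {"beats": "loses", "ties": "ties", "loses": "beats"}


def outcome(a, b):
    """Return how class a is claimed to fare against class b, or None if unlisted."""
    for result, others in CLAIMS[a].items():
        if b in others:
            return result
    return None


def inconsistent_pairs():
    """List pairings whose two descriptions disagree (or are missing)."""
    classes = sorted(CLAIMS)
    bad = []
    for i, a in enumerate(classes):
        for b in classes[i + 1:]:
            ab, ba = outcome(a, b), outcome(b, a)
            if ab is None or ba is None or FLIP[ab] != ba:
                bad.append((a, b, ab, ba))
    return bad


for pair in inconsistent_pairs():
    print(pair)
```

On the list as written this flags Fencer vs Ranger (a win from the Fencer side, a tie from the Ranger side) and Ninja vs Ranger (each is listed as beating the other), so those two entries are worth re-checking against the data.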
So my current best guess (pending understanding which gear is best for which class/race) is:
Willow v Adelon, Varina v Bauchard, Xerxes v Cadagal, Yalathinel v Deepwrack.
If I had to guess what gear to give to who: Warrior v Knight is a rough matchup, so Varina’s going to need the help; the rest of my assignments are based thus far on ~vibes~ for whether speed or power will matter more for the class. Thus:
Willow gets +2 Boots and +1 Gauntlets, Varina gets +4 Boots and +3 Gauntlets, Xerxes gets +1 Boots and +2 Gauntlets, and Yalathinel gets +3 Boots.
Some theories I need to test:
Race affects how good you are at a class. Elves might be best at rangering, say.
Race and/or class affect how much benefit you get out of boots and/or gauntlets. Being a warrior might mean you get full benefit from gauntlets but none from boots.
Color might affect how well classes do. Ninjas wearing red might win way less often.
The color does not actually seem to affect ninjas all that much if at all: 6963 vs 6762 wins. Could still be a tiebreaker?
Color doesn’t affect things much overall either: 40136 vs 39961 wins.
There’s some rank-ordering of class+race+level matchups, maybe an additive one.
Alternatively there could be some nontransitive thing going on with tiebreaks sometimes from levels, races, and gear?
On further reflection that totally seems to be what’s going on here.
Maybe there’s something about the matchup ordering being sorted over (race, class)? D’s loss (as a L6 Dwarf Monk) to a L4 Dwarf Ninja is… unexpected to say the least!
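For the color counts quoted in the list above (6963 vs 6762 for ninjas, 40136 vs 39961 overall), a quick normal-approximation check is enough to see whether the gaps are larger than coin-flip noise. This Python sketch assumes each counted fight is an independent 50/50 draw under the null hypothesis and that the two numbers are wins per color; `color_split_test` is a name invented here:

```python
import math


def color_split_test(wins_a, wins_b):
    """Two-sided normal-approximation test that wins are split 50/50.

    Returns (z, p). Assumes independent fights, which tournament structure
    may violate, so treat the result as a rough guide only.
    """
    n = wins_a + wins_b
    # Under the null, wins_a ~ Binomial(n, 0.5): mean n/2, sd sqrt(n)/2,
    # so z = (wins_a - n/2) / (sqrt(n)/2) = (wins_a - wins_b) / sqrt(n).
    z = (wins_a - wins_b) / math.sqrt(n)
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided tail probability
    return z, p


for label, a, b in [("ninjas", 6963, 6762), ("overall", 40136, 39961)]:
    z, p = color_split_test(a, b)
    print(f"{label}: z = {z:.2f}, p = {p:.3f}")
```

The ninja gap comes out at roughly 1.7 standard deviations and the overall gap at roughly 0.6, so neither clears the conventional 5% threshold, which fits the reading that color matters little if at all.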
Wild speculation:
If you [use the +4 Boots in combat and beat Cadagal then they’ll know you were] responsible [for] ????? ?????? [Boots from his/her/the] House. [You will gain its] lasting enmity, [and] [people? will?] ???????? ???? ??? ???? ?? ??? ???? ??????? ?? ?? ????? ?? ????????? ?? ??? [upon] your honor [if] ????????? ???? ?? ???? ??? ???? ??? ??? friendship ???? ?? ??? ???? ?? ??? ?????? ??????? ?? ?? ?????.
So maybe we’re OK to use the +4 Boots as long as it’s not against Cadagal?
No idea how to even guess at what’s going on in that second sentence apart from “bad things will happen and everyone will hate you, you dirty thief”.
Ah, late to the party, didn’t see this one coming up. Pity.
Anyway, before I check my result I will just try to preregister a few insights and see if they are carried out.
There are probably some effects in the dataset that are only relevant at lower levels, and so exist mainly as a red herring, since all our fights are between people of at least level 5. I therefore double-checked everything by looking only at the subset of data where all fighters are high level.
The classes seem to work in a sort of Rock/Paper/Scissors way, some being much stronger against others.
I plan to try to beat the: Human Warrior with my Ranger / Human Knight with my Monk / Elf Ninja with my Knight / Dwarf Monk with my Fencer.
Ah, sorry to hear that. You can still look for a solution even if you aren’t in time to make it on the leaderboard!
Also, if you are interested in these scenarios in general, you can subscribe to the D&D.Sci tag (click the ‘Subscribe’ button on that page) and you’ll get notifications whenever a new one is posted.
Ah, thanks a lot. I was actually working through the earlier scenarios, I just missed that a new one had popped up. Subscribed now, so hopefully I will notice the next one.
Also, my approach didn’t work this time; I ended up trying a way too complicated model. I really like how the actual answer to this one worked.
Dang, I missed seeing this before the solution was posted. And oh dear, it’s high-complexity. Oh well, I’ll give it a shot anyway!
Edit: Hah, I spent an hour checking one thing (which went nowhere), and then ran out of steam, and now I can no longer resist checking the answer. So much for that 😅 Next time I’ll try to check my notifications more often so I see the next one before the answer is up, maybe that’ll give me more drive to keep at it.