What are people using to load and analyze the data?
I used the Python package pandas.
(I also tried Excel, but the dataset was too large to load everything in. In retrospect, I realize I could have just loaded the first million rows and analyzed those, possibly keeping the remaining ~400k rows as a testing set. A million rows is 2⁄3 of the dataset, more than enough to get statistically significant results from.)
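For anyone in the same position, here's a minimal sketch (Python standard library only) of the split described above: read the header, keep the first N data rows for analysis, and hold out the rest as a test set. The function name `split_head_tail` and the toy CSV are made up for illustration; for the real file you'd pass N = 1,000,000. If you're using pandas, `read_csv`'s `nrows` parameter does the first half of this directly.

```python
import csv
import io
from itertools import islice


def split_head_tail(lines, n_head):
    """Split CSV input into (header, first n_head data rows, remaining rows).

    `lines` is anything csv.reader accepts: an open file or an in-memory buffer.
    The remaining rows can serve as a hold-out test set.
    """
    reader = csv.reader(lines)
    header = next(reader)
    head = list(islice(reader, n_head))  # stops after exactly n_head rows
    tail = list(reader)                  # everything left over
    return header, head, tail


if __name__ == "__main__":
    # Toy data; for the real file you would pass open(path, newline="") instead.
    toy = io.StringIO("player,win\na,1\nb,0\nc,1\n")
    header, head, tail = split_head_tail(toy, 2)
    print(header, head, tail)
    # ['player', 'win'] [['a', '1'], ['b', '0']] [['c', '1']]
```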
I started out with Excel, but, as abstractapplic noted, it could only load about 2⁄3 of the dataset. I considered using just that, or splitting the data, but since I had been thinking of trying data analysis in Haskell, I decided to abandon Excel and try Haskell instead.
After various hang-ups, most perniciously the Parsec library’s stubborn refusal to change how it works to conform to my mental model of it, I still haven’t loaded the data into my program in a usable form. But I’m hoping I’ll manage it soon. And then I have to figure out how to actually process it and get data out in a usable form...
Hm. I thought the large dataset would help analysis and would be easy to truncate if desired, but it seems that wasn’t as obvious as I’d hoped. If people are having trouble with it, I’ve added a smaller version containing the first 200k rows of the large dataset, see here.
This is a bit under 1⁄6 the size of the main dataset, which will make it harder to identify effects, but if you can’t use the main dataset, I assume it’s better than nothing.
If anyone wants extra time, let me know.
I am now requesting extra time. I’ve loaded the data using my Haskell program (I fixed the parsing late last night) and used it to check basic stats for the number of players, aspects, and classes, including individual player aspect/class combos. These all seem pretty evenly distributed, and none of them seem to affect winrate very much except the number of players, and even that effect isn’t very big.
But I still need to check for interactions, and especially look for symmetries in those interactions: relatively constant winrates overall suggest symmetric interactions unless the effects are weak.
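A rough sketch of the basic per-group stats described above, in Python rather than Haskell: count games and wins for each value of some key, where the key can be a single attribute or an aspect/class combo. The row layout (`"aspect"`, `"class"`, `"win"` fields) and the labels are hypothetical placeholders, not the dataset’s actual columns.

```python
from collections import defaultdict


def winrate_by(rows, key):
    """Group rows by key(row) and return {group: (games, wins, winrate)}.

    `key` is any function of a row, so single attributes and combos work alike.
    Assumes each row has a 0/1 (or bool) "win" field; field names are hypothetical.
    """
    games = defaultdict(int)
    wins = defaultdict(int)
    for row in rows:
        group = key(row)
        games[group] += 1
        wins[group] += int(row["win"])
    return {g: (games[g], wins[g], wins[g] / games[g]) for g in games}


if __name__ == "__main__":
    # Made-up rows with placeholder labels, just to show the shape of the output.
    rows = [
        {"aspect": "aspect1", "class": "class1", "win": 1},
        {"aspect": "aspect1", "class": "class2", "win": 0},
        {"aspect": "aspect2", "class": "class1", "win": 1},
        {"aspect": "aspect2", "class": "class1", "win": 0},
    ]
    print(winrate_by(rows, lambda r: r["class"]))                # per class
    print(winrate_by(rows, lambda r: (r["aspect"], r["class"])))  # per combo
```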
edited to add:
No wait, the variations in winrate for individual player aspect/class combos don’t look that small at all. I noticed this shortly after making the above comment, but didn’t want to make the edit until I had got the Haskell program to calculate the p-values, even though the variations were obviously too big to be random if the variations in the total counts for each combo were assumed to be random. The variations in the winrates of classes and aspects, while much smaller, are still strongly statistically significant in some cases (if I got the program to do the right math).
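One simple way to get p-values like these, assuming each game is an independent trial: compare a group’s wins against the overall winrate with a two-sided binomial test using the normal approximation. The same function also checks whether a combo’s total count is plausible under an even split. This is a generic sketch (the name `binomial_z_test` is mine), not necessarily the math the Haskell program does.

```python
import math


def binomial_z_test(k, n, p0):
    """Two-sided test of k successes in n trials against a hypothesised rate p0.

    Uses the normal approximation to the binomial, which is reasonable when
    n * p0 * (1 - p0) is large (comfortably true for groups of thousands of games).
    Returns (z, p_value).
    """
    se = math.sqrt(n * p0 * (1 - p0))
    z = (k - n * p0) / se
    # Two-sided tail probability: 2 * (1 - Phi(|z|)) == erfc(|z| / sqrt(2))
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


if __name__ == "__main__":
    # Winrate check: a combo that won 5,200 of 10,000 games vs an overall rate of 0.5.
    print(binomial_z_test(5200, 10_000, 0.5))  # z = 4.0, p ≈ 6.3e-5
    # Count check: a combo seen 1,050 times in 10,000 games when 1/10 is expected.
    print(binomial_z_test(1050, 10_000, 0.1))
```

With dozens of combos, a few will come out below p = 0.05 by chance alone, so a Bonferroni-style correction (comparing each p-value to 0.05 divided by the number of tests) is a reasonable sanity check.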
Since I was busy with that, I haven’t gotten around to looking at correlations between different players on the same team yet. There definitely do seem to be patterns in which classes go with which aspects in the individual player aspect/class combos, though.
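For the teammate-correlation step, a minimal sketch: tally the winrate for every unordered pair of classes that appear on the same team. It assumes, hypothetically, that each game can be reduced to a list of the team’s classes plus a 0/1 win flag. Sorting before pairing treats (a, b) and (b, a) as the same pair, which bakes in a symmetry assumption; dropping the `sorted` call keeps pairs in player-position order, so the two orderings can be compared instead.

```python
from collections import defaultdict
from itertools import combinations


def teammate_pair_winrates(games):
    """Return {(class_a, class_b): (occurrences, winrate)} over all teammate pairs.

    Each game is assumed to look like {"classes": [...], "win": 0 or 1} (hypothetical
    layout). A team [a, b, c] contributes the pairs (a, b), (a, c) and (b, c).
    """
    seen = defaultdict(int)
    wins = defaultdict(int)
    for game in games:
        # Sorting makes (a, b) and (b, a) the same key, i.e. it assumes symmetry.
        for pair in combinations(sorted(game["classes"]), 2):
            seen[pair] += 1
            wins[pair] += int(game["win"])
    return {pair: (seen[pair], wins[pair] / seen[pair]) for pair in seen}


if __name__ == "__main__":
    # Two made-up games with placeholder class labels.
    games = [
        {"classes": ["class1", "class2"], "win": 1},
        {"classes": ["class2", "class1", "class3"], "win": 0},
    ]
    print(teammate_pair_winrates(games))
```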
Understood, no worries! I’ll aim to post the solution on Friday unless I hear otherwise; if you want another weekend, I could do next Monday instead.
And it looks like I could use the weekend as well, if that’s OK. Though if other players object, I’d understand: I do feel like I’m abusing this a bit, since the ratio of time spent on “data analysis” vs “learning Haskell” has been low.
Fine with me
If it helps, I for one am completely okay with you taking the weekend.
Thanks!
Thanks aphyer. I might end up requesting extra time, though I don’t need the 200k-row dataset; if I wanted to, I could just accept what Excel (or LibreOffice/OpenOffice Calc) truncates it to.