I liked the bonus objective myself, but maybe I’m biased about that...
As someone who is also not a “data scientist” (but just plays one on lesswrong), I don’t know exactly what actual “data science” is either, but I’d guess it’s meant to imply using more advanced techniques?
(And if I can pull the same Truth from the void with less powerful tools, should that not mark me as more powerous in the Art? :P)
Perhaps, but don’t make a virtue of not using the more powerful tools; the objective is to find the truth, not to find it with handicaps...
Speaking of which, one thing that can make things easier is aggregating the data, discarding information you think is irrelevant. For example, in this case I assumed early on (without actually checking) that timing was probably irrelevant, so I aggregated the data by ingredient combination: each ingredient combination that was tried gets a single row, listing the number of each outcome. You can do this by assigning a unique identifier to each ingredient combination (here you can just concatenate the ingredient list), then counting the results for each identifier. COUNTIFS performs poorly on large data sets, but there is a faster route: sort by identifier, add a column that counts the rows (or the rows with a particular outcome) since the identifier last changed, then filter for the last row before each change in identifier (beware of off-by-one errors). Finally, copy the result (values only) to a new sheet.
This also reduces the number of rows, though not enormously in this case.
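For anyone who wants to try the same aggregation outside a spreadsheet, here is a minimal Python sketch of the sort-and-scan approach described above. It assumes each record is a (list of ingredients, outcome) pair; the names `make_id` and `aggregate`, the `+` separator, the choice to sort ingredients before joining (so order doesn't matter), and the sample ingredient names are all hypothetical, not taken from the actual data set.

```python
from typing import Dict, List, Tuple

# Hypothetical record shape: (ingredients used, outcome label).
Record = Tuple[List[str], str]


def make_id(ingredients: List[str]) -> str:
    """Build a unique identifier for an ingredient combination.

    Assumption: ingredient order is irrelevant, so we sort before joining.
    The '+' separator avoids collisions such as 'ab'+'c' vs 'a'+'bc'.
    """
    return "+".join(sorted(ingredients))


def aggregate(records: List[Record]) -> Dict[str, Dict[str, int]]:
    """Collapse records to one entry per combination with per-outcome counts.

    Mirrors the spreadsheet steps: sort by identifier, keep running counts
    since the identifier last changed, and keep only the last row before
    each change.
    """
    keyed = sorted((make_id(ings), outcome) for ings, outcome in records)
    result: Dict[str, Dict[str, int]] = {}
    running: Dict[str, int] = {}
    for i, (ident, outcome) in enumerate(keyed):
        # Start fresh counts whenever the identifier differs from the previous row.
        if i == 0 or ident != keyed[i - 1][0]:
            running = {}
        running[outcome] = running.get(outcome, 0) + 1
        # Keep this row only if it is the last one before the identifier changes
        # (checking the *next* row is where off-by-one errors tend to creep in).
        if i == len(keyed) - 1 or keyed[i + 1][0] != ident:
            result[ident] = dict(running)
    return result


if __name__ == "__main__":
    # Hypothetical sample data, for illustration only.
    data = [
        (["troll sweat", "newt eye"], "success"),
        (["newt eye", "troll sweat"], "failure"),
        (["newt eye", "troll sweat"], "success"),
        (["dragon scale"], "failure"),
    ]
    print(aggregate(data))
```

Outside a spreadsheet, a dictionary keyed directly by identifier (for example `collections.Counter`) would skip the sort entirely; the sort-and-scan version is shown only because it follows the spreadsheet steps one for one.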
Of course, in this case it turned out that timing was relevant, not to the outcomes but to which ingredients were selected (so I would have had to revisit this assumption to work out how the ingredients were selected).
Perhaps, but don’t make a virtue of not using the more powerful tools; the objective is to find the truth, not to find it with handicaps...
I’m obviously seeking out more powerful tools, too—I just haven’t got them yet. I don’t think it’s intrinsically good to stick to less powerful tools, but I do think that it’s intrinsically good to be able to fall back to those tools if you can still win.
And when I need to go out and find truth for real, I don’t deny myself tools, and I rarely go it alone. But this is not that.
You don’t need to justify yourself. Hail, fellow D&Dsci player! I appreciate your competition and your detailed writeup of your results, and I hope to see you in the next D&Dsci!