As Jay Bailey mentioned, you can look at how other players approached challenges, and copy the approaches that worked. Pablo Repetto’s playthroughs of three early .scis seem particularly worthwhile given your situation, both because of how comprehensive & well-written they are, and because they were made by someone in the process of learning to use code on data science problems (the first playthrough was done in pure Excel, the other two were handled in Python).
By following a sensible strategy
Below is my standard plan for investigating a dataset, synthetic or otherwise (cribbed from an otherwise-mediocre Udacity course I took most of a decade ago, and still worth following).
-
Univariate Analysis: How is each feature distributed when considered in isolation? You should probably make a histogram for each column.
Bivariate Analysis: Construct and check the correlation matrix between all features. Are there clusters? Create scatterplots (or equivalent) for any pair of features which correlate unusually strongly, any pair of features where at least one is a response variable, and any pair of features you find yourself curious about.
Feature Derivation: Based on what you’ve seen so far – and/or common sense – are there meaningful features you can create from what you’ve been provided? (i.e., if you’re given “Number of Wizards”, “Number of Sorcerors” and “Number of Druids” for each row, it might be worth creating a “Total Number of Magic Users” column.) Investigate how these features interact with others.
ML Modelling: If you can, and it seems like a good idea, build an ML model predicting the important/unknown features from those you have. If constructed successfully, this becomes an oracle you can ask about the outcome of any possible choice you could make. (XGBoost and similar tools are extremely versatile, and have pretty good performance on most problems.)
-
(The above is just a rough guide for what to do when you don’t know what to do. If you follow it, you should pretty quickly find yourself with a list of rabbitholes to fall down; you should probably err on the side of dropping everything and deviating from the path as soon as you find something interesting.)
By playing easier D&D.Scis
Difficulty of D&D.Sci games tends to be both high and high-variance; it’s usually assumed that players will have both data-manipulation and model-building skills. For what it’s worth, I can confirm that two relatively-approachable scenarios where not-using-ML won’t put you at a disadvantage are (spoilered because this technically leaks information about them):
By imitating other players
As Jay Bailey mentioned, you can look at how other players approached challenges, and copy the approaches that worked. Pablo Repetto’s playthroughs of three early .scis seem particularly worthwhile given your situation, both because of how comprehensive & well-written they are, and because they were made by someone in the process of learning to use code on data science problems (the first playthrough was done in pure Excel, the other two were handled in Python).
By following a sensible strategy
Below is my standard plan for investigating a dataset, synthetic or otherwise (cribbed from an otherwise-mediocre Udacity course I took most of a decade ago, and still worth following).
-
Univariate Analysis: How is each feature distributed when considered in isolation? You should probably make a histogram for each column.
Bivariate Analysis: Construct and check the correlation matrix between all features. Are there clusters? Create scatterplots (or equivalent) for any pair of features which correlate unusually strongly, any pair of features where at least one is a response variable, and any pair of features you find yourself curious about.
Feature Derivation: Based on what you’ve seen so far – and/or common sense – are there meaningful features you can create from what you’ve been provided? (i.e., if you’re given “Number of Wizards”, “Number of Sorcerors” and “Number of Druids” for each row, it might be worth creating a “Total Number of Magic Users” column.) Investigate how these features interact with others.
ML Modelling: If you can, and it seems like a good idea, build an ML model predicting the important/unknown features from those you have. If constructed successfully, this becomes an oracle you can ask about the outcome of any possible choice you could make. (XGBoost and similar tools are extremely versatile, and have pretty good performance on most problems.)
-
(The above is just a rough guide for what to do when you don’t know what to do. If you follow it, you should pretty quickly find yourself with a list of rabbitholes to fall down; you should probably err on the side of dropping everything and deviating from the path as soon as you find something interesting.)
By playing easier D&D.Scis
Difficulty of D&D.Sci games tends to be both high and high-variance; it’s usually assumed that players will have both data-manipulation and model-building skills. For what it’s worth, I can confirm that two relatively-approachable scenarios where not-using-ML won’t put you at a disadvantage are (spoilered because this technically leaks information about them):
The Sorceror’s Personal Shopper and The Oracle and the Monk