I think you’re overthinking it. In all the n-back research I’ve read, hardly anyone seems to’ve cared about how exactly the adaptiveness was implemented...
The game is broken down into waves, each of which presents an N-back-like task with certain parameters, such as the number of attributes, the number of variants in each attribute, the tempo, and so on. I would like to find a way to collapse these parameters into a single difficulty parameter that I can compare against a player’s skill level to predict their performance on a given wave.
Play a bunch of levels with random settings for each parameter, then regress the scores on all the parameters. There, now you have a predictor.
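A minimal sketch of that regression, assuming each wave is logged as a row of numeric parameters plus a score between 0 and 1. The column names (n_back, n_attributes, tempo), the simulated data, and the helper names are made up for illustration; they are not from the game.

```python
import numpy as np

def fit_difficulty_model(params, scores):
    """Ordinary least squares: score ~ intercept + params @ weights."""
    X = np.column_stack([np.ones(len(params)), params])
    coef, *_ = np.linalg.lstsq(X, scores, rcond=None)
    return coef  # coef[0] is the intercept, coef[1:] the per-parameter weights

def predict_score(coef, wave_params):
    """Collapse a wave's parameters into one predicted score."""
    return coef[0] + np.dot(wave_params, coef[1:])

# Simulated play log (hypothetical columns: n_back, n_attributes, tempo).
rng = np.random.default_rng(0)
params = rng.uniform([1, 1, 0.5], [5, 4, 2.0], size=(200, 3))
true_coef = np.array([1.0, -0.12, -0.08, -0.15])  # assumed, for the simulation only
scores = true_coef[0] + params @ true_coef[1:] + rng.normal(0, 0.02, 200)

coef = fit_difficulty_model(params, scores)
easy = predict_score(coef, [1, 1, 0.5])
hard = predict_score(coef, [5, 4, 2.0])
```

The predicted score is itself a single number per wave, so it can stand in for the single difficulty parameter: lower predicted score, harder wave.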
But I realize that some players will be better at some challenges than others (e.g. memory, matching multiple attributes, handling fast tempos, dealing with visual distractions like rotation, or recognizing letters). Skill and difficulty are multidimensional quantities, and this makes performance hard to predict. The question is, is there a single-parameter approximation that delivers an adequate experience?
OK fine, start with a pre-specified linear model or whatever, and then update it periodically with the user’s data. As the user’s data builds up, the regression picks up their particular strengths/weaknesses.
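One way to sketch "start with a pre-specified model, then update it" is least squares with a penalty that pulls the coefficients toward the prior ones. This is my choice of technique for illustration, not the only option; `prior_strength` and the simulated numbers are assumptions.

```python
import numpy as np

def update_model(prior_coef, X, y, prior_strength=1.0):
    """Minimise ||y - X w||^2 + prior_strength * ||w - prior_coef||^2.

    With no user data this returns prior_coef; as rows accumulate,
    the user's own data dominates the prior.
    """
    prior_coef = np.asarray(prior_coef, dtype=float)
    X = np.asarray(X, dtype=float).reshape(-1, len(prior_coef))
    y = np.asarray(y, dtype=float)
    A = X.T @ X + prior_strength * np.eye(len(prior_coef))
    b = X.T @ y + prior_strength * prior_coef
    return np.linalg.solve(A, b)

prior = np.array([1.0, -0.1, -0.1])  # hypothetical baked-in model

# Brand-new player: no rows yet, so the prior model is used unchanged.
no_data = update_model(prior, np.empty((0, 3)), np.empty(0))

# Simulated veteran whose true weights differ from the prior.
rng = np.random.default_rng(1)
X = np.column_stack([np.ones(2000),
                     rng.uniform(1, 5, 2000),     # hypothetical n_back
                     rng.uniform(0.5, 2, 2000)])  # hypothetical tempo
user_coef = np.array([1.2, -0.2, -0.05])
y = X @ user_coef + rng.normal(0, 0.02, 2000)
personalised = update_model(prior, X, y)
```

A larger `prior_strength` means more personal data is needed before the model moves away from the baked-in defaults.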
A player with higher skill should have smaller timing errors, so a well-timed match is evidence for higher skill. I am still unsure exactly how I can use this information optimally.
Take the absolute value of the ‘true’ time minus the time the user actually responded. There, you have a ‘timing error’ value which can be fed into the regression along with the others.
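As a sketch, assuming both times are in seconds on the same clock (the function names are made up):

```python
def timing_error(target_time, response_time):
    """Absolute gap between the ideal moment and the actual response."""
    return abs(target_time - response_time)

def mean_timing_error(targets, responses):
    """Average timing error over one wave; usable as a regression feature."""
    errors = [timing_error(t, r) for t, r in zip(targets, responses)]
    return sum(errors) / len(errors)
```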
How do I model player skill over time (just a time-weighted average? as a series of slopes and plateaus?
My impression from the DNB score series I’ve seen and discussions is that n-back exhibits the usual logarithmic-looking graph: fast initial improvements then a plateau. So, a log model.
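A log model is just a straight-line fit against log(session number). A rough sketch, assuming sessions are numbered from 1 and using made-up data:

```python
import numpy as np

def fit_log_curve(session_index, scores):
    """Fit score ~ a + b * log(session_index); returns (a, b)."""
    slope, intercept = np.polyfit(np.log(session_index), scores, 1)
    return intercept, slope

# Made-up, noise-free series: fast early gains, then flattening.
sessions = np.arange(1, 51)
scores = 2.0 + 0.5 * np.log(sessions)
a, b = fit_log_curve(sessions, scores)
```

Strictly speaking a logarithm never goes flat, it only slows down, so this is "plateau-like" rather than a true plateau.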
how should I expect skill to change over a period of time without any play?)?
Very slowly degrade. Some of the n-back studies have done 6-month followups and the like, and the degradation doesn’t seem very large.
I do have some concerns over data privacy, so I may allow users to opt out of sending their data up to the server.
You should definitely provide an opt-out.
Determining difficulty is another hard problem. I currently have a complicated ad-hoc formula that I cobbled together with logarithms, exponentials, and magic numbers, and lots of trial and error. It seems to work pretty well for the limited set of levels I’ve tested with a small group of playtesters, but I’m worried that it won’t predict difficulty well outside of that domain.
In all the n-back research I’ve read, hardly anyone seems to’ve cared about how exactly the adaptiveness was implemented...
The adaptiveness (in Brain Workshop, which is the only implementation I’ve spent much time with) feels pretty horrible. At least compared to a typical modern well-balanced game.
Play a bunch of levels with random settings for each parameter, then regress the scores on all the parameters. There, now you have a predictor.
You’re talking about least squares, or some modification of least squares that deals with outliers, right? Doesn’t this assume linearity? Or I suppose I’d have to model, say, the effect of increasing the N-back as aN^b (or some other suitable formula) and plug in values for a and b as parameters? But that means I have to take some time choosing a good model.
Also, it’s important for users to have a good first experience, so I’d like to require as little personalized training data as possible (I will have some data available, since the adaptive mode won’t be unlocked until clearing some number of earlier baked-in stages). Having some notion of a standard difficulty-with-respect-to-an-idealized-player should help with this.
To be fair, I haven’t actually tried running any kind of regression like this yet, and this approach pretty clearly seems worth trying.
So, a log model.
Log in the number of plays, I assume. This sounds pretty reasonable, but I expect it won’t take into account some of the more interesting features of each player’s skill trajectory, especially early on. Of course this could be based on my misinterpretations of personal experiences with learning, so I could easily be wrong. It would certainly make things simpler!
Why didn’t any simpler approaches work?
If you mean simple like least squares regression, mostly because I didn’t try it. If you mean a simple formula, I wouldn’t be surprised if none exists (the formula I’m using is actually not quite as complicated as I made it sound). Anyway, my understanding is that even if I use regression on individual user data I’d need to use a pretty complex model with lots of parameters to make it work. Is this not true?
Also, is the Bernoulli trials stuff reasonable, or am I making things too complicated with that too?
The adaptiveness (in Brain Workshop, which is the only implementation I’ve spent much time with) feels pretty horrible. At least compared to a typical modern well-balanced game.
N-back simply isn’t a fun game in the first place, so I don’t know how much the adaptiveness is to blame.
You’re talking about least squares, or some modification of least squares that deals with outliers, right? Doesn’t this assume linearity? Or I suppose I’d have to model, say, the effect of increasing the N-back as aN^b (or some other suitable formula) and plug in values for a and b as parameters? But that means I have to take some time choosing a good model.
Yes, least squares requires a lot of assumptions to be provably optimal. On the other hand, it works all the time. Stupid simple approaches do that pretty frequently.
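On the linearity worry: a model like aN^b is still a least-squares problem once you take logs, since log(difficulty) = log(a) + b*log(N). A sketch with made-up numbers, assuming difficulty is measured as something positive:

```python
import numpy as np

def fit_power_law(n_levels, difficulty):
    """Fit difficulty ~ a * N**b by a straight-line fit in log-log space."""
    b, log_a = np.polyfit(np.log(n_levels), np.log(difficulty), 1)
    return np.exp(log_a), b

n = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
d = 0.3 * n ** 1.7  # made-up, noise-free data
a, b = fit_power_law(n, d)
```

So a and b are fitted from data rather than plugged in by hand; the only modelling choice is the functional form.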
Anyway, my understanding is that even if I use regression on individual user data I’d need to use a pretty complex model with lots of parameters to make it work. Is this not true?
I don’t see why it would be, necessarily. What sort of complex model did you have in mind?
Also, is the Bernoulli trials stuff reasonable, or am I making things too complicated with that too?
I’m not sure what the binomial stuff is gaining you over a simple %-correct number. If the user gets 2 out of 10 matches right, then the max-likelihood estimate of the underlying probability under a binomial model is going to be… 0.2. You bring in the binomial/Bernoulli stuff when you want to do something more complex.
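To make that concrete: maximising the binomial likelihood numerically lands on the plain fraction correct. The Beta-prior estimate at the end is one example of "something more complex", with a uniform prior assumed purely for illustration.

```python
import math

def binomial_log_likelihood(p, hits, trials):
    # The binomial coefficient is dropped: it does not depend on p.
    return hits * math.log(p) + (trials - hits) * math.log(1 - p)

def mle_by_grid(hits, trials, steps=999):
    """Brute-force maximum-likelihood estimate over a grid in (0, 1)."""
    grid = [(i + 1) / (steps + 1) for i in range(steps)]
    return max(grid, key=lambda p: binomial_log_likelihood(p, hits, trials))

def beta_posterior_mean(hits, trials, prior_a=1.0, prior_b=1.0):
    """Posterior mean of the success rate under a Beta(prior_a, prior_b) prior."""
    return (hits + prior_a) / (trials + prior_a + prior_b)

mle = mle_by_grid(2, 10)             # same as 2 / 10
smoothed = beta_posterior_mean(2, 10)
```

The prior only matters much when there are few trials; with lots of matches the two estimates converge.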
Thanks, gwern, this is extremely helpful.
This is quite likely. :)
Make sure to do this with an experienced player, to get on that plateau.
You might need a different set of parameters for newbies. Of course, measuring for them is complicated by the rapid change in their abilities.