Andrea Muzii memorized a 630-digit number within five minutes. Since the most efficient encoding we can have for random digits is about 3.32 bits per digit (log₂ 10), this suggests that Andrea Muzii has at least roughly 2,090 bits of working memory. If Muzii was storing 7 chunks in working memory, this would suggest chunks of about 300 bits each.
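For reference, here is that arithmetic spelled out (a sketch only; 7 is just the midpoint of Miller's 7±2):

```python
import math

digits = 630
bits_per_digit = math.log2(10)        # ≈ 3.32 bits for a uniformly random decimal digit
total_bits = digits * bits_per_digit  # ≈ 2093 bits written to memory in five minutes

chunks = 7                            # midpoint of Miller's 7 ± 2
bits_per_chunk = total_bits / chunks  # ≈ 299 bits per chunk

print(f"{total_bits:.0f} bits total, about {bits_per_chunk:.0f} bits per chunk")
```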
Per your footnote 6, I wouldn’t expect that the whole 630-digit number was ever simultaneously in working memory.
IIUC, at least some “memory athletes” use navigational memory (via “memory palace”)…
My neuroscience knowledge says that navigational memory heavily involves the hippocampus, and (IIRC) the hippocampus can do one-shot long-term (or at least long-ish-term) storage of information, with a massive capacity.
My common sense / experience says that: if I’m playing a video game and there are 20 successive screens, each with memorable features, and someone shows me a route through all those screens (“at the screen with the turtle, go left by the statue. And then you get to the screen with the robot, and you go up by the snake, …”), even if I’ve only seen it once, and certainly if I’ve seen it 2 or 3 times, I can trace the route afterwards. But I never had the whole route simultaneously in working memory. Rather, I see one screen, and it jogs my memory for what to do next, and then I see the next screen, and it jogs my memory, etc.
Maybe a better example: if I’m memorizing a tune, I’m relying on each part of the song to jog my memory for the next part of the song. The “jog my memory” action presumably involves pulling information out of a kind of storage, i.e. information that is not already in working memory.
Per your footnote 6, I wouldn’t expect that the whole 630-digit number was ever simultaneously in working memory.
How would you like to define “simultaneously in working memory”?
The benefit of an operationalization like the sequential recall task is concreteness and easily tested predictions. I think if we try to talk about the actual information content of the actual memory, we can start to get lost in alternative assumptions. What, exactly, counts as actual working memory?
One way to think about the five-minute memorization task which I used for my calculation is that it measures how much can be written to memory within five minutes, but it does little to test memory volatility (it doesn’t tell us how much of the 630-digit number would have been forgotten after an hour with no rehearsal). If by “short-term memory” we mean memory which only lasts a short while without rehearsal, the task doesn’t differentiate that.
So, “for all we know” from this test, the information gets spread across many different types of memory, some longer-lasting and some shorter-lasting. This is one way of interpreting your point about the 630 digits not all being in working memory.
According to this way of thinking, we can think of the 5 minute memorization period as an extended “write” task. The “about one minute” factoid gets re-stated as: what you can write to memory in five minutes, you could explain in natural language in about one minute, if performing about optimally, and assuming you don’t need to fill in any background context for the explanation.
“5 minutes of study lets you capture, at best, 1 minute of spoken material” sounds much less impressive than my one-minute-per-moment headline.
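As a rough sanity check on the one-minute figure (the ~39 bits/second estimate for the information rate of ordinary speech is an outside number I am assuming here, not something from the post):

```python
total_bits = 630 * 3.32   # ≈ 2093 bits written to memory in five minutes (from the calculation above)
speech_rate = 39.0        # assumed bits/second conveyed by ordinary speech (a rough published estimate)

print(total_bits / speech_rate)      # ≈ 54 seconds, i.e. roughly one minute of spoken material
print(total_bits / 7 / speech_rate)  # ≈ 7.7 seconds of speech per ~300-bit chunk
```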
However, this way of thinking about it makes it tempting to think that the memory athlete is able to store a set number of bits into memory per second studying; a linear relationship between study time and the length of sequences which can be recalled. I doubt the relationship is that simple.
The spaced repetition literature suggests a model based on forgetting curves, where the number and pattern of times we’ve reviewed a specific piece of information determines how long we’ll recall it. In this model, we don’t so much think of “short term memory” and “long term memory” capacity, instead focusing on the expected durability of specific memories. This expected durability increases in an understood way with practice.
In contrast to the simple “write to memory” model, this provides a more detailed (and, I think, plausible) account of what goes on during the 5 minutes one has to rehearse: a memory athlete would, presumably, rehearse the sequence, making memories more robust to the passage of time via repetition.
Keeping a set of information “in working memory”, in this paradigm, means rehearsing it on a spaced-repetition schedule such that you recall each fact before you forget it. The details of the forgetting curve would enable a prediction of how many such facts can be memorized given an amount of study time.
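A minimal sketch of what such a prediction could look like, assuming an exponential forgetting curve whose stability is multiplied by a fixed factor on each successful review (all parameter values here are made up for illustration):

```python
import math

def reviews_to_maintain(maintain_seconds, initial_stability=30.0, growth=2.0, threshold=0.9):
    """How many rehearsals keep one fact above a recall-probability threshold.

    Assumes recall probability decays as exp(-elapsed / stability), and that each
    review, timed just before recall drops below `threshold`, multiplies stability
    by `growth`. These assumptions and numbers are illustrative, not fitted.
    """
    t, stability, reviews = 0.0, initial_stability, 0
    while t < maintain_seconds:
        t += -stability * math.log(threshold)  # time until recall decays to the threshold
        stability *= growth
        reviews += 1
    return reviews

# With these toy numbers, keeping a single fact alive for five minutes takes
# about seven rehearsals; dividing total study time by the per-fact rehearsal
# cost would then bound how many facts can be juggled at once.
print(reviews_to_maintain(300))
```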
The natural place to bring up “chunks” here is the amount of information that can fit in an individual memory (a single “fact”). It no longer makes sense to talk about the “total information capacity of short-term memory”, since memory is being modeled on a continuum from short to long, and a restricted capacity like 7±2 is not really part of this type of model. Without running any detailed math on this better sort of model, I suppose the information capacity of a memory would come out close to the “five to ten seconds of spoken language per chunk” which we get when we apply information theory to the Miller model.

This model also has many problems, of course.
However, this way of thinking about it makes it tempting to think that the memory athlete is able to store a set number of bits into memory per second studying; a linear relationship between study time and the length of sequences which can be recalled. I doubt the relationship is that simple.
Yeah this website implies that it’s sublinear—something like 50% more content when they get twice as long to study? Just from quickly eyeballing it.
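If that eyeballed ratio held across scales, it would correspond to a power law with an exponent well below 1 (just a quick extrapolation of the eyeball estimate, not something the site states):

```python
import math
# doubling study time -> roughly 1.5x the recalled content  =>  content ∝ time**k
k = math.log2(1.5)
print(k)  # ≈ 0.58, clearly sublinear
```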
Keeping a set of information “in working memory”, in this paradigm, means rehearsing it on a spaced-repetition schedule such that you recall each fact before you forget it.
I still feel like you’re using the term “working memory” in a different way from how I would use it. Suppose you have 30 minutes to study a list of numbers. You first see Item X and try to memorize it in minute 3. Then you revisit it in minute 9, and it turns out that you’ve already “forgotten it” (in the sense that you would have failed a quiz) but it “rings a bell” when you see it, and you try again to memorize it. I think you’re still benefitting from the longer forgetting curve associated with the second revisit of Item X. But Item X wasn’t “in working memory” in minute 8, by my definitions.
(Note that I don’t know the details of how memory athletes spend their 30 minutes and didn’t check. For all I know they do a single pass.)
You first see Item X and try to memorize it in minute 3. Then you revisit it in minute 9, and it turns out that you’ve already “forgotten it” (in the sense that you would have failed a quiz) but it “rings a bell” when you see it, and you try again to memorize it. I think you’re still benefitting from the longer forgetting curve associated with the second revisit of Item X. But Item X wasn’t “in working memory” in minute 8, by my definitions.
One way to parameterize recall tasks is x, y, z = the time you get to study the sequence, the delay during which you must maintain the memory, and the time you get to recall the sequence.
During “x”, you get the case you described. I presume it makes sense to do the standard spaced-rep study schedule, where you re-study information at a time when you have some probability of having already forgotten it. (I also have not looked into what memory champions do.)
During “y”, you have to maintain. You still want to rehearse things, but you don’t want to wait until you have some probability of having forgotten, at this point, because the study material is no longer in front of you; if you forget something, it is lost. This is what I was referring to when I described “keeping something in working memory”.
During “z”, you need to try and recall all of the stored information and report it in the correct sequence. I suppose having longer z helps, but the amount it helps probably drops off pretty sharply as z increases. So x and y are in some sense the more important variables.
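Just to pin the parameterization down (the field names and example values are mine, purely illustrative):

```python
from dataclasses import dataclass

@dataclass
class RecallTask:
    study_seconds: float     # x: time spent studying the sequence
    maintain_seconds: float  # y: delay during which the memory must be maintained
    recall_seconds: float    # z: time allowed for reporting the sequence

# e.g. a five-minute memorization followed shortly by recall might look like:
task = RecallTask(study_seconds=300, maintain_seconds=0, recall_seconds=600)
```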
I still feel like you’re using the term “working memory” in a different way from how I would use it.
So how do you want to use it?
I think my usage is mainly weird because I’m going hard on the operationalization angle, using performance on memory experiments as a definition. I think this way of defining things is particularly practical, but does warp things a lot if we try to derive causal models from it.
I think it’s cool what you’re trying to do, I just wish you had made up your own original term instead of using the existing term “working memory”. To be honest I’m not an expert on exactly how “working memory” is defined, but I’m pretty sure it has some definition, and that this definition is widely accepted (at least in broad outline; probably people argue around the edges), and that this accepted definition is pretty distant from the thing you’re talking about. I’m open to being corrected; like I said, I’m not an expert on memory terminology. :)
The term “working memory” was coined by Miller, and I’m here using his definition. In this sense, I think what I’m doing is about as terminologically legit as one can get. But Miller’s work is old; possibly I should be using newer concepts instead.
That task measures what can be written to memory within 5 minutes, given unlimited time to write relevant compression codes into long-term semantic memory. It’s complex. See my top-level comment.
I’m sure you know this, but the “jog my memory” action is plausibly explained by memory being like a Hopfield network; a partial or noisy cue settles into the nearest stored pattern.
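A toy illustration of that idea (not a claim about how human memory actually works): a Hopfield network stores patterns in its weights, and a corrupted cue relaxes to the nearest stored pattern, which is the “jogging” behaviour.

```python
import numpy as np

def train_hopfield(patterns):
    """Hebbian weights for ±1 patterns, with a zeroed diagonal."""
    n = patterns.shape[1]
    W = np.zeros((n, n))
    for p in patterns:
        W += np.outer(p, p)
    np.fill_diagonal(W, 0)
    return W / len(patterns)

def settle(W, cue, steps=10):
    """Iterate the cue toward the nearest stored pattern."""
    state = cue.copy()
    for _ in range(steps):
        state = np.sign(W @ state)
        state[state == 0] = 1
    return state

rng = np.random.default_rng(0)
pattern = rng.choice([-1, 1], size=64)
W = train_hopfield(pattern[None, :])

cue = pattern.copy()
cue[:20] = rng.choice([-1, 1], size=20)          # corrupt part of the "memory"
print(np.array_equal(settle(W, cue), pattern))   # the partial cue recovers the full pattern
```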