The Mnemosyne data has been lying around for years without anyone analysing it to find more effective algorithms for human learning.
Personally, I don’t expect much from the data. From reading through scores of papers comparing minute differences in spacing and getting contradictory results and small improvements, I get the impression that once you’ve moved from massed to spacing (almost any kind of spacing), you’ve gotten the overwhelming majority of the benefits, and the rest is basically frippery which needs a lot of domain expertise to improve upon. I understand Peter hasn’t looked at the Mnemosyne data much either because it didn’t indicate to him that the fancier SuperMemo algorithms were much help.
But I could be wrong. I haven’t worked with the Mnemosyne data very much beyond looking at correlating scores with hour of day and day of week; I’ve been waiting for the data to import to SQL to work with the whole dataset.
(So far I’m up to 81%… I’m hopeful that the 1TB SSD I just ordered will help speed things up a lot, and then I can host the SQL on Amazon S3 or something for anyone who is interested; a quick estimate is that it’ll cost me ~$5-10 a month or $60-120 a year to host, but I figure that I can solicit some donations to help cover it. If nothing else, it’ll save Peter a lot of time and effort in uploading the raw logs for each person who asks him. EDIT: the SSD sped things up even more than I expected: processing time goes from months to ~25 hours. So I deleted it and am fetching a fresher dataset to process & distribute.)
What do you think are the prospects of an SRS that uses a forgetting curve specific to the individual, by relying on past performance? Has this been tried or considered?
You can already modify the forgetting curve yourself in most SRS, based on your needs, via a constant. Unless an automatic algorithm anchors to your best past performance, I would expect such an algorithm to produce a continuous decay of performance for most individuals. I think Anki already modifies the intervals of individual cards automatically based on your past performance, i.e. the experienced difficulty and instances of forgetting. New cards are not affected by past performance, as far as I know.
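As a rough sketch of what a forgetting curve fitted to the individual could look like (assuming an exponential model and a simple grid search; none of this is what Anki or Mnemosyne actually does), you could estimate a single per-user stability parameter from that user’s review history:

```python
import math

def fit_personal_stability(reviews, candidates=None):
    """Fit a per-user stability S under an assumed exponential model
    P(recall | interval of t days) = exp(-t / S), by maximizing the
    log-likelihood over past reviews.
    `reviews` is a list of (interval_days, recalled) pairs."""
    if candidates is None:
        candidates = [s / 2 for s in range(2, 2001)]  # try S = 1.0 .. 1000.0 days
    def log_likelihood(s):
        ll = 0.0
        for t, recalled in reviews:
            p = min(max(math.exp(-t / s), 1e-9), 1 - 1e-9)  # keep log() finite
            ll += math.log(p) if recalled else math.log(1 - p)
        return ll
    return max(candidates, key=log_likelihood)

# Someone who recalls most cards even after long gaps gets a larger S,
# and a scheduler could then stretch that person's intervals accordingly.
```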
You need to specify which parts are being modified by an SRS system: each card has an easiness parameter which will be continuously modified based on your performance, but I don’t think existing SRS systems like Anki or Mnemosyne will modify other parts of the curve, like the exponent. For example, SM-2’s algorithm runs in part by updating the easiness as EF+(0.1-(5-q)*(0.08+(5-q)*0.02)); the EF is progressively updated, but the formula itself never changes, even if 0.1 is not ideal and 0.15 would be better or something.
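For concreteness, here is the SM-2 easiness update restated as code; the floor of 1.3 is part of the published SM-2 description, the rest is just the formula above:

```python
def sm2_update_easiness(ef: float, q: int) -> float:
    """SM-2 easiness update: EF' = EF + (0.1 - (5 - q) * (0.08 + (5 - q) * 0.02)),
    where q is the recall grade (0-5). SM-2 never lets EF drop below 1.3."""
    new_ef = ef + (0.1 - (5 - q) * (0.08 + (5 - q) * 0.02))
    return max(new_ef, 1.3)

# The constants 0.1, 0.08 and 0.02 are hard-coded: the scheduler updates EF
# per card, but it never re-fits the formula itself to the user's data.
```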
Both Peter and Damien think that the later SuperMemo algorithms provide no benefit.
As far as I know they don’t make that judgement based on data, but because they have a feeling that the algorithm isn’t better.
Piotr Wozniak, who actually did run the data, claims:
Below you will find a general outline of the seventh major formulation of the repetition spacing algorithm used in SuperMemo. It is referred to as Algorithm SM-11 since it was first implemented in SuperMemo 11.0 (SuperMemo 2002). Although the increase in complexity of Algorithm SM-11 as compared with its predecessors (e.g. Algorithm SM-6) is incomparably greater than the expected benefit for the user, there is a substantial theoretical and practical evidence that the increase in the speed of learning resulting from the upgrade may fall into the range from 30 to 50%.
I don’t think it’s certain that Piotr is right. On the other hand, if he’s right, that’s an improvement on a scale that matters a great deal.
If you are better at estimating when a card will be forgotten, you are also closer to the point where reviewing becomes deliberate practice that might make you better at learning.
The second issue is daily variation in memory performance. I’m not sure, but I think there might be days when the brain doesn’t store memories well. If you answer 200 cards on such a day and they get scheduled into the future, and you then get 20 of the first 30 cards wrong when they come up for testing again, it would make sense to reschedule the remaining 170 cards to a time closer to the present.
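A sketch of what that rescheduling could look like (entirely hypothetical, not a feature of any existing SRS; the threshold and shrink factor are made-up parameters):

```python
def reschedule_bad_day(remaining_cards, failures, tested,
                       failure_threshold=0.5, shrink_factor=0.5):
    """If cards answered on a suspected bad day start failing at an unusually
    high rate once retested, pull the rest of that day's cards closer to the
    present. `remaining_cards` are the not-yet-retested cards from that day,
    each assumed to have an `interval_days` attribute."""
    if tested and failures / tested > failure_threshold:
        for card in remaining_cards:
            card.interval_days = max(1, round(card.interval_days * shrink_factor))
```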
We do have practical issues that the present algorithm doesn’t handle well. You can’t tell the present algorithm that you want to really know all the facts in a deck by a particular date, such as when you take an exam.
Having a stable mathematical theory that can predict when a card would be forgotten can help towards that end.
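For example, even a crude exponential model, P(recall after a gap of t days) = exp(-t / S), lets you work backwards from the exam date; this is only a sketch under that assumption, not how any current scheduler works:

```python
import math

def earliest_final_review(exam_in_days: float, stability: float,
                          target_recall: float = 0.9) -> float:
    """Earliest day (counting from today) on which the last review can happen
    so that exp(-gap / stability) is still >= target_recall on exam day.
    The exponential model, `stability`, and the 0.9 target are assumptions."""
    max_gap = -stability * math.log(target_recall)  # longest tolerable gap
    return max(0.0, exam_in_days - max_gap)

# e.g. a stability of 20 days and a 0.9 target allow a gap of about 2.1 days,
# so a card with the exam 30 days out needs its last review around day 28.
```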
You might also think about the kinds of tools that psychologists use to measure a trait like unconscious racism in the present: words or images get flashed for short durations. You might measure unconscious racism the same way by testing people’s long-term memory for the ability to remember related information.
If you have both the tool of flashing images and the tool that measures the effect of unconscious racism on long-term memory, you can start asking questions such as: “Which unconscious racism metric changes first and which lags behind?”
The Mnemosyne data doesn’t allow us to answer that question, but it can provide a foundation on which to build the mathematical theories of long-term memory that would help you run that experiment.
It can be the basis for learning stuff about the way the human mind works that you can’t get by gathering 50 participants and putting them into an fMRI while you ask questions.
Scientific progress often comes from progress in underlying tools and frameworks.
Piotr Wozniak, who actually did run the data, claims:
I’m not sure what data he has run; skimming that page doesn’t help much. I know he has no dataset comparable to the Mnemosyne dataset because I sent him my initial results a few months ago and he told me so, so it can’t be based on that.
At the present time he has SuperMemo Online, and that should provide an interesting dataset, but I don’t think he had it at the time he wrote those lines.
I think Piotr worked a lot with his own data. But he also writes:
The increase in the speed of the convergence was achieved by employing actual approximation data obtained from students who used SuperMemo 6 and/or SuperMemo 7
Algorithm SM-8 is constantly being perfected in successive releases of SuperMemo, esp. to account for newly collected repetition data, convergence data, input parameters, etc.
He also described it in his thesis in a bit of detail.
32 test subjects does not compare to the Mnemosyne dataset, but it does provide plenty of data for testing algorithms, and there might be enough to decide that SM-8 is significantly better than SM-2.
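As a made-up illustration of why 32 subjects might suffice (this is not Wozniak’s analysis): score both schedulers per subject by how well each predicts that subject’s recall, then apply something as simple as a sign test:

```python
from math import comb

def sign_test_p_value(wins: int, n: int) -> float:
    """Two-sided sign test: how surprising it would be for one scheduler to
    beat the other on `wins` out of `n` subjects if they were really equal."""
    k = max(wins, n - wins)
    tail = sum(comb(n, i) for i in range(k, n + 1)) / 2 ** n
    return min(1.0, 2 * tail)

# If SM-8 predicted recall better for, say, 24 of 32 subjects:
print(sign_test_p_value(24, 32))  # ~0.007, well below the usual 0.05 cutoff
```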