All of my calculations are in an Excel spreadsheet, so I’ll email you the text of the post, as well as the Excel file, if you’re interested in looking over my work.
One of the trends I’ve seen that I’m a fan of is writing posts/papers/etc. in R, so that the analysis can be trivially reproduced or altered. In general, spreadsheets are notoriously prone to calculation errors because the underlying code is hidden and decentralized; it’s much easier to look at a Python or R script and check it for consistency than it is to audit an Excel table.
(It’s better to finish this project as is than to delay it until you know enough Python or R to reproduce the analysis, but it’s something to think about for future projects, or something to do now if you already know one of those languages.)
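For concreteness, here is a minimal sketch in Python of what the analysis-as-script version looks like (the file name and column names are made up, not your actual data), so that anyone can rerun or tweak it:

```python
# Hypothetical sketch of an analysis kept as a script alongside the post.
# The file name and column names here are invented for illustration.
import csv
import statistics

# Load the raw data from a plain-text CSV kept next to the post.
with open("data.csv", newline="") as f:
    rows = list(csv.DictReader(f))

# Every intermediate quantity gets a name, so each step can be inspected.
costs = [float(r["cost"]) for r in rows]
benefits = [float(r["benefit"]) for r in rows]

mean_cost = statistics.mean(costs)
mean_benefit = statistics.mean(benefits)
ratio = mean_benefit / mean_cost

# The numbers quoted in the post are printed here, not copied in by hand.
print(f"mean cost:    {mean_cost:.2f}")
print(f"mean benefit: {mean_benefit:.2f}")
print(f"benefit/cost: {ratio:.2f}")
```

Every formula lives in one visible place, and rerunning the file regenerates every number quoted in the post.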
Spreadsheets can be reproduced and altered just like any other code. I think the purpose of writing a post in code is mainly about keeping the code in sync with the exposition. But that was the purpose of MS Office before R even existed.
I am skeptical of spreadsheets, but is there any evidence that they are worse than any other kind of code? Indeed:

“These error rates, although troubling, are in line with those in programming and other human cognitive domains.”
(I am not sure what that means. If the per-cell error rate is the same as the per-line error rate of conventional programming, that definitely counts as spreadsheets being terrible. But I think the claim is a 0.5% per-cell error rate versus a 5% per-line error rate.)
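To put rough numbers on it, here is a back-of-the-envelope sketch, with made-up artifact sizes and the strong simplifying assumption that errors are independent:

```python
# Chance that an artifact contains at least one error, assuming each
# cell/line fails independently at the stated rate (a big simplification).
def p_at_least_one_error(per_unit_rate, n_units):
    return 1 - (1 - per_unit_rate) ** n_units

# Hypothetical sizes: a 300-cell spreadsheet vs. a 100-line script.
print(p_at_least_one_error(0.005, 300))  # ~0.78 with a 0.5% per-cell rate
print(p_at_least_one_error(0.05, 100))   # ~0.99 with a 5% per-line rate
```

On those made-up numbers, the comparison hinges almost entirely on which rate applies to which unit, which is why the ambiguity in the quoted claim matters.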
Even if there were evidence that spreadsheets are worse than other codebases, I would be hesitant to blame the spreadsheets rather than the operators. It is true that they make many classes of errors possible, but they also have the positive effect of encouraging the user to look at intermediate steps in the calculation. I suspect that the biggest problem with spreadsheets is that they are used by amateurs: people see them as safe and easy, while they see conventional code as difficult and dangerous.
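To illustrate the point about intermediate steps (with made-up numbers): the glance a spreadsheet user gives each cell has to be reproduced deliberately in a script by naming, printing, or asserting on the intermediate values:

```python
# Made-up numbers: in a script, the intermediate values that a spreadsheet
# would display in its cells have to be surfaced on purpose.
subtotal_a = 1200.0  # hypothetical intermediate result
subtotal_b = 845.0   # hypothetical intermediate result
total = subtotal_a + subtotal_b

# These checks stand in for the glance a spreadsheet user gives each cell.
assert subtotal_a > 0 and subtotal_b > 0, "subtotals should be positive"
assert total == 2045.0, "total disagrees with the hand calculation"
print(subtotal_a, subtotal_b, total)
```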
I can take a look; you know my email.
The key word missing from “reproduced and altered” is inspected, which seems like the core difference to me.
I agree with this.