Spreadsheets make it really easy to set up a simple “mathematical model”. It really doesn’t take more than about 5 minutes to learn enough about spreadsheets to get something useful going, and actually starting a spreadsheet has very low overhead, both in terms of what you have to do and of how much thought you have to put into it.
The problem with that is that it’s easy to start using them for everything, including things that are really too complicated to be safely done in a spreadsheet. It’s also possible to have something that started out as a reasonable spreadsheet application grow from that into an unreliable, unmaintainable monstosity.
If you view a spreadsheet as a program, it’s written in a “write only language”. It’s really hard to come into a big spreadsheet and understand how everything works, or know how to safely make a change, or meaningfully review it for correctness, or even apply revision control to it. There’s no global view; you have to interact with the whole thing one cell at a time. And you’re not exactly encouraged to give things meaningful names, either.
… but it’s SO EASY to start with a spreadsheet that people are often lulled into making one, and then adding more and more to it. When your spreadsheet reaches a certain complexity level, you may then find yourself investing time into learning more and more arcane features so you can extend what it does… which means that, for you, a spreadsheet will become even more the default tool the next time you want to do something. But it’ll still be a bad programming language.
You can end up with “spreadsheet experts” who use them for everything. If I had a nickel for every spreadsheet I’ve seen that should have been a database, for example...
It’s sort of like writing shell scripts; it’s trivial to write a script to automate a few commands you do all the time, but if you keep adding features, then a year later you have a monstrosity that you wish you’d written in a regular, maintainable language.
And by a “cloud version”, I mean Google Sheets (or Office 365, for that matter). The data are controlled entirely by the host; it may or may not be feasible to extract all of the information you put in, and it’s definitely not going to be trivial if your spreadsheet has any complexity. The program that does the calculations is controlled entirely by the host, and may be changed at any time, including in ways that alter the results. The feature set is controlled entirely by the host; features you rely on may be changed or completely removed at any moment. Not really attractive as a long-term investment.
On edit: what I use instead is usually a real programming language. I won’t say which ones I favor, because it would be impolite to start even more of a language war. :-)
I would love a web-based tool that allowed me to enter data in a spreadsheet-like way, present it in a spreadsheet-like way, but use code to bridge the two.
Subtracting out the “web-based” part as a first class requirement, while focusing on the bridge made of code as a “middle” from which to work “outwards” towards raw inputs and final results...
...I tend to do the first ~20 data entry actions as variable constants in my code that I tweak by hand, then switch to the CSV format for the next 10^2 to 10^5 data entry tasks that my data labelers work on, based on how I think it might work best (while giving them space for positive creativity).
A semi-common transitional pattern during the CSV stage involves using cloud spreadsheets (with multiple people logged in who can edit together and watch each other edit (which makes it sorta web-based, and also lets you use data labelers anywhere on the planet)) and ends with a copypasta out of the cloud and into a CSV that can be checked into git. Data entry… leads to crashes… which leads to validation code… which leads to automated tooling to correct common human errors <3
If the label team does more than ~10^4 data entry actions, and the team is still using CSV, then I feel guilty about having failed to upgrade a step in the full pipeline (including the human parts) whose path of desire calls out for an infrastructure upgrade if it is being used that much. If they get to 10^5 labeling actions with that system and those resources then upper management is confused somehow (maybe headcount maxxing instead of result maxxing?) and fixing that confusion is… complicated.
This CSV growth stage is not perfect, but it is highly re-usable during exploratory sketch work on blue water projects because most of the components can be accomplished with a variety of non-trivial tools.
If you know of something better for these growth stages, I’d love to hear about your workflows, my own standard methods are mostly self constructed.
On edit: what I use instead is usually a real programming language. I won’t say which ones I favor, because it would be impolite to start even more of a language war. :-)
Different programming languages are for different things.
‘I use this instead of spreadsheet’ - that’s a use case I haven’t heard a war over. (‘I use this note taking app’ - that I have read a lot of different sides on.)
Spreadsheets make it really easy to set up a simple “mathematical model”. It really doesn’t take more than about 5 minutes to learn enough about spreadsheets to get something useful going, and actually starting a spreadsheet has very low overhead, both in terms of what you have to do and of how much thought you have to put into it.
The problem with that is that it’s easy to start using them for everything, including things that are really too complicated to be safely done in a spreadsheet. It’s also possible to have something that started out as a reasonable spreadsheet application grow from that into an unreliable, unmaintainable monstosity.
If you view a spreadsheet as a program, it’s written in a “write only language”. It’s really hard to come into a big spreadsheet and understand how everything works, or know how to safely make a change, or meaningfully review it for correctness, or even apply revision control to it. There’s no global view; you have to interact with the whole thing one cell at a time. And you’re not exactly encouraged to give things meaningful names, either.
… but it’s SO EASY to start with a spreadsheet that people are often lulled into making one, and then adding more and more to it. When your spreadsheet reaches a certain complexity level, you may then find yourself investing time into learning more and more arcane features so you can extend what it does… which means that, for you, a spreadsheet will become even more the default tool the next time you want to do something. But it’ll still be a bad programming language.
You can end up with “spreadsheet experts” who use them for everything. If I had a nickel for every spreadsheet I’ve seen that should have been a database, for example...
It’s sort of like writing shell scripts; it’s trivial to write a script to automate a few commands you do all the time, but if you keep adding features, then a year later you have a monstrosity that you wish you’d written in a regular, maintainable language.
And by a “cloud version”, I mean Google Sheets (or Office 365, for that matter). The data are controlled entirely by the host; it may or may not be feasible to extract all of the information you put in, and it’s definitely not going to be trivial if your spreadsheet has any complexity. The program that does the calculations is controlled entirely by the host, and may be changed at any time, including in ways that alter the results. The feature set is controlled entirely by the host; features you rely on may be changed or completely removed at any moment. Not really attractive as a long-term investment.
On edit: what I use instead is usually a real programming language. I won’t say which ones I favor, because it would be impolite to start even more of a language war. :-)
I would love a web-based tool that allowed me to enter data in a spreadsheet-like way, present it in a spreadsheet-like way, but use code to bridge the two.
Subtracting out the “web-based” part as a first class requirement, while focusing on the bridge made of code as a “middle” from which to work “outwards” towards raw inputs and final results...
...I tend to do the first ~20 data entry actions as variable constants in my code that I tweak by hand, then switch to the CSV format for the next 10^2 to 10^5 data entry tasks that my data labelers work on, based on how I think it might work best (while giving them space for positive creativity).
A semi-common transitional pattern during the CSV stage involves using cloud spreadsheets (with multiple people logged in who can edit together and watch each other edit (which makes it sorta web-based, and also lets you use data labelers anywhere on the planet)) and ends with a copypasta out of the cloud and into a CSV that can be checked into git. Data entry… leads to crashes… which leads to validation code… which leads to automated tooling to correct common human errors <3
If the label team does more than ~10^4 data entry actions, and the team is still using CSV, then I feel guilty about having failed to upgrade a step in the full pipeline (including the human parts) whose path of desire calls out for an infrastructure upgrade if it is being used that much. If they get to 10^5 labeling actions with that system and those resources then upper management is confused somehow (maybe headcount maxxing instead of result maxxing?) and fixing that confusion is… complicated.
This CSV growth stage is not perfect, but it is highly re-usable during exploratory sketch work on blue water projects because most of the components can be accomplished with a variety of non-trivial tools.
If you know of something better for these growth stages, I’d love to hear about your workflows, my own standard methods are mostly self constructed.
There are tools that let you do that. There is a whole unit testing paradigm called fixtures for it. A prominent example is Fitnesse: http://fitnesse.org/FitNesse.UserGuide.WritingAcceptanceTests
I’m not sure I see how this resembles what I described?
Maybe I misunderstand what you have in mind? The idea is to
enter data in a spreadsheet,
that is interpreted as row-wise input to function in a program (typically a unit test), and
the result of the function is added back into additional columns in the spreadsheet.
The idea is that I can do all this from my browser, including writing the code.
That would be cool. I think it should be relatively easy to set up with replit (online IDE).
Sounds a bit like AlphaSheets (RIP).
Different programming languages are for different things.
‘I use this instead of spreadsheet’ - that’s a use case I haven’t heard a war over. (‘I use this note taking app’ - that I have read a lot of different sides on.)