I would love a web-based tool that allowed me to enter data in a spreadsheet-like way, present it in a spreadsheet-like way, but use code to bridge the two.
Subtracting out the “web-based” part as a first class requirement, while focusing on the bridge made of code as a “middle” from which to work “outwards” towards raw inputs and final results...
...I tend to do the first ~20 data entry actions as variable constants in my code that I tweak by hand, then switch to the CSV format for the next 10^2 to 10^5 data entry tasks that my data labelers work on, based on how I think it might work best (while giving them space for positive creativity).
A semi-common transitional pattern during the CSV stage involves using cloud spreadsheets (with multiple people logged in who can edit together and watch each other edit (which makes it sorta web-based, and also lets you use data labelers anywhere on the planet)) and ends with a copypasta out of the cloud and into a CSV that can be checked into git. Data entry… leads to crashes… which leads to validation code… which leads to automated tooling to correct common human errors <3
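To make the "crashes, then validation code, then automated correction" progression concrete, here is a minimal sketch of what that tooling might look like once the CSV is out of the cloud spreadsheet. Everything specific in it is an assumption for illustration: the `item_id` and `label` columns, the allowed label set, and the typo table are all hypothetical, not taken from any real labeling project.

```python
import csv
import io

# Hypothetical label vocabulary and typo table, invented for this sketch.
ALLOWED_LABELS = {"cat", "dog", "other"}
COMMON_FIXES = {"cta": "cat", "dgo": "dog"}


def clean_row(row):
    """Return (cleaned_row, error); exactly one of the two is None."""
    # Normalize whitespace and case, then apply known typo corrections.
    label = row["label"].strip().lower()
    label = COMMON_FIXES.get(label, label)
    if label not in ALLOWED_LABELS:
        return None, f"unknown label {row['label']!r}"
    return {"item_id": row["item_id"].strip(), "label": label}, None


def load_labels(text):
    """Split CSV text into cleaned rows and (line_number, error) pairs."""
    good, bad = [], []
    # Data rows start on line 2 because line 1 is the header.
    for line_no, row in enumerate(csv.DictReader(io.StringIO(text)), start=2):
        cleaned, error = clean_row(row)
        if error is None:
            good.append(cleaned)
        else:
            bad.append((line_no, error))
    return good, bad


SAMPLE = "item_id,label\n1, Cat \n2,dgo\n3,giraffe\n"
good, bad = load_labels(SAMPLE)
print(good)  # rows that passed, with whitespace, case, and typos fixed
print(bad)   # rows to send back to the labelers instead of crashing
```

Collecting errors with line numbers, rather than raising on the first bad row, is the design choice that matters here: the labelers get one list of things to fix per batch.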
If the label team does more than ~10^4 data entry actions, and the team is still using CSV, then I feel guilty about having failed to upgrade a step in the full pipeline (including the human parts) whose path of desire calls out for an infrastructure upgrade if it is being used that much. If they get to 10^5 labeling actions with that system and those resources then upper management is confused somehow (maybe headcount maxxing instead of result maxxing?) and fixing that confusion is… complicated.
This CSV growth stage is not perfect, but it is highly re-usable during exploratory sketch work on blue water projects because most of the components can be accomplished with a variety of non-trivial tools.
If you know of something better for these growth stages, I’d love to hear about your workflows; my own standard methods are mostly self-constructed.
There are tools that let you do that. There is a whole unit testing paradigm called fixtures for it. A prominent example is FitNesse: http://fitnesse.org/FitNesse.UserGuide.WritingAcceptanceTests
I’m not sure I see how this resembles what I described?
Maybe I misunderstand what you have in mind? The idea is to
enter data in a spreadsheet,
that is interpreted as row-wise input to a function in a program (typically a unit test), and
the result of the function is added back into additional columns in the spreadsheet.
The idea is that I can do all this from my browser, including writing the code.
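Leaving the browser part aside, the three steps above can be sketched in a few lines of Python. This is only an illustration under assumptions: the helper `add_result_columns`, the example function `rectangle`, and the `width`/`height`/`area`/`perimeter` column names are all made up here, and a CSV string stands in for the spreadsheet.

```python
import csv
import io


def add_result_columns(csv_text, func, result_names):
    """Call func once per row and append its results as new columns."""
    reader = csv.DictReader(io.StringIO(csv_text))
    out = io.StringIO()
    writer = csv.DictWriter(
        out,
        fieldnames=list(reader.fieldnames) + list(result_names),
        lineterminator="\n",
    )
    writer.writeheader()
    for row in reader:
        # Column headers become keyword arguments of the function.
        results = func(**row)
        writer.writerow({**row, **dict(zip(result_names, results))})
    return out.getvalue()


# Hypothetical function under test; cells arrive as strings from the CSV.
def rectangle(width, height):
    w, h = float(width), float(height)
    return (w * h, 2 * (w + h))


SHEET = "width,height\n2,3\n4,0.5\n"
print(add_result_columns(SHEET, rectangle, ["area", "perimeter"]))
# width,height,area,perimeter
# 2,3,6.0,10.0
# 4,0.5,2.0,9.0
```

Mapping column headers to keyword arguments keeps the sheet and the function signature in sync: renaming a column without renaming the parameter fails loudly with a `TypeError` rather than silently feeding the wrong cell in.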
That would be cool. I think it should be relatively easy to set up with replit (online IDE).
Sounds a bit like AlphaSheets (RIP).