I’m not seeing why the ProgID and TaskID variables need to be booleans—or maybe R implicitly converts them to that. I’ve left them in symbolic form.
Here is a subset of the PatMain data massaged (by hand!) into the format I thought would let me get a regression, and the regression results as a comment. I got this into a data frame variable named z2 and ran the commands:
fit = lm(Time ~ .,data=z2)
summary(fit)
I suck at statistics, so I may be talking nonsense here, and you’re welcome to check my results. The bottom line seems to be that the task coefficients do a much better job of predicting the completion time than do the programmer coefficients, with t-values that suggest you could easily ignore who performs the task, with the exception of programmer A6, who was the slowest of the lot.
(For instance, the coefficients say that the best prediction for the time taken is “40 minutes”, and then you subtract 25 minutes if the task is ST2. This isn’t a bad approximation, except for programmer A4, who takes 40 minutes on ST2. It’s not that A4 is slow in general, just slow on that task.)
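On the question of booleans: a minimal sketch of the implicit conversion, assuming a data frame shaped like z2 with factor columns ProgID and TaskID. The numbers are made up for illustration and are not the PatMain data. lm() expands each factor into 0/1 indicator columns, leaving out the first level as the baseline, which is why the intercept reads as the prediction for the baseline programmer and task, and every other coefficient reads as an offset from it.

```r
# Hypothetical toy data: values invented for illustration only,
# not taken from the PatMain data set.
z <- data.frame(
  ProgID = c("A1", "A1", "A2", "A2", "A3", "A3"),
  TaskID = c("ST1", "ST2", "ST1", "ST2", "ST1", "ST2"),
  Time   = c(42, 15, 38, 17, 45, 12),
  stringsAsFactors = TRUE
)

# model.matrix() exposes the design matrix that lm() builds internally.
# Each factor level except the first (the baseline) becomes a 0/1 column.
X <- model.matrix(Time ~ ., data = z)
print(colnames(X))

# The fitted coefficients carry the same names as the design-matrix columns.
fit <- lm(Time ~ ., data = z)
print(names(coef(fit)))
```

So leaving the variables in symbolic form is fine; hand-built boolean columns would only reproduce what the design matrix already contains.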
You had asked for assistance and expertise on using R/RStudio. Unfortunately, I have never used them.
“maybe R implicitly converts them”
Judging from your results, I’m sure you are right.
“The bottom line seems to be that the task coefficients do a much better job of predicting the completion time than do the programmer coefficients.”
Yes, and if you added some more tasks into the mix (tasks which took hours or days to complete), then programmer ID would seem to make even less difference. This points to a defect in my suggested data-analysis strategy. A better approach might have been to divide each time by the average time for the task (over all programmers), optionally also taking the log of that, and then exclude the task ID as an independent variable. After all, the hypothesis is that Achilles is 10x as fast as the Tortoise, not that he takes ~30 minutes less time regardless of task size.
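A rough sketch of that normalization in R, for whoever wants to try it. The data frame below is hypothetical (made-up times, not the PatMain data), and the column names RelTime and LogRel are invented here for illustration. Each time is divided by the mean time for its task, the log is taken, and the regression then uses the programmer as the only predictor.

```r
# Hypothetical toy data: values invented for illustration only,
# not taken from the PatMain data set.
z <- data.frame(
  ProgID = factor(rep(c("A1", "A2", "A3"), each = 2)),
  TaskID = factor(rep(c("ST1", "ST2"), times = 3)),
  Time   = c(40, 12, 80, 18, 20, 6)
)

# Mean time per task over all programmers, aligned with the rows of z.
task_mean <- ave(z$Time, z$TaskID, FUN = mean)

# Relative time: 1 means "average for this task".
z$RelTime <- z$Time / task_mean

# The log makes "twice as slow" and "twice as fast" symmetric around 0.
z$LogRel <- log(z$RelTime)

# TaskID is left out of the formula; only the programmer remains.
fit_rel <- lm(LogRel ~ ProgID, data = z)
print(summary(fit_rel))
```

On the log scale a coefficient is a ratio rather than a fixed number of minutes, so exp() of a programmer coefficient estimates how many times slower or faster that programmer is than the baseline programmer, which is closer to the "10x" form of the hypothesis.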