This describes a high-severity operational event (network outage, crash bug, etc.) pretty well. The call leader is organizing teams and devs to focus on different parts of diagnosis and mitigation, and there are tons of unknowns and both relevant and irrelevant information coming in all the time.
Many companies/teams do dry-runs of such things, but on-the-job training is most effective. The senior people guide the junior people through it as it happens, and after a few repetitions, the junior people (or the ones who display maturity and aptitude) become the leads.
For “regular” software engineering, I’d rather not encourage the idea that short, distinct scenarios are representative of the important parts. Making something that’s maintainable and extensible over many years isn’t something that can be trained in small bites.
On a separate note, I think that incidents are the opposite of this: the response is urgent, so people have to go through and find what is wrong immediately. If anything, a Root Cause Analysis after the fact would be more similar, or possibly the outage investigation itself. You might be interested in Julia Evans's debugging puzzles, which are small-scale but good introductions. I could imagine similar scenarios with real servers (and a senior dev moderating) being good training for debugging server issues and learning new tools.
I should have been more specific that the post-mortem is a critical part of the incident handling. I see a lot of similarity in tactical decision-making, both in the incident handling (the decisions made) and in the post-mortem (the analysis and rationale).
Strategic decision-making, i.e., the tradeoff between solving a narrow problem simply and leaving room for a whole class of problems at the cost of more complexity (and the structures to handle that complexity), is a related but different set of skills.
To some extent you do need long-term software project experience to learn how to deal with maintenance and extensibility over multiple years. However, there is still some benefit for devs who are fresh out of college and haven't done software maintenance: a lot of junior devs realize their designs are bad when asked, "What if you need to add X later?" These decision-making training games would help with that.
How would you feel about someone being given a pile of code, and having to add a feature that requires modifications throughout the codebase? That could be a decent simulation of working in a large legacy codebase, and could be used in a similar game context, where you do a debrief on what aspects of the existing code made it easy/hard to modify, and review the code that the dev playing the game wrote.
How would you feel about someone being given a pile of code, and having to add a feature that requires modifications throughout the codebase?
I think this describes many internships or onboarding projects for developers. My general opinion is that, when nobody’s shooting at you, it’s best to do this on real software, rather than training simulations. The best simulation is reality itself.