User research as a barometer of software design
(Cross-posted on my personal blog.)
In UX design, they do something called user research.
Consider an example. You have two designs: A and B. You think A is better but John thinks B is better. Who’s right?
Well, you can debate a bunch of stuff. Design A breaks this guideline, Design B breaks that guideline. Or… you can actually sit down next to users and watch them use each design. See how much difficulty they have and stuff. If users are all confused by Design A and content with Design B, then Design B is the winner, despite whatever the theoretical arguments might say.
I think the same thing is true in software design. For example, consider two designs: C and D. Design C has a lot of duplicate code, whereas Design D is, well, DRY.
Design D is better, right? Isn’t DRYing up duplicate code a good thing?
Maybe. Maybe not. They might have taught you about DRY code in school, but that doesn’t matter. The barometer here is user research, where the users are the programmers. Imagine that you had some programmers try to build the app using “wet” code (Design C). And then imagine that you had other programmers try to build the app using DRY code (Design D). Who would have an easier time?
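To make the two designs concrete, here is a minimal sketch of what they might look like. The report functions and fields below are invented purely for illustration; nothing here comes from a real codebase.

```python
# Design C ("wet"): each report builds its own lines, so the formatting
# logic is duplicated.
def format_invoice_report(invoices):
    lines = []
    for inv in invoices:
        lines.append(f"{inv['id']}: ${inv['amount']:.2f} ({inv['status']})")
    return "\n".join(lines)

def format_refund_report(refunds):
    lines = []
    for ref in refunds:
        lines.append(f"{ref['id']}: ${ref['amount']:.2f} ({ref['status']})")
    return "\n".join(lines)

# Design D (DRY): the shared formatting lives in one place.
def format_report(records):
    return "\n".join(
        f"{rec['id']}: ${rec['amount']:.2f} ({rec['status']})" for rec in records
    )

print(format_report([{"id": "inv-1", "amount": 12.5, "status": "paid"}]))
```

The user-research question is not which version looks more correct on paper, but which one the programmers building the app actually find easier to read, change, and extend.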
Why am I making this point? Isn’t it obvious? Well, I think that it can be easy to get lost in the weeds a bit with software design and lose sight of the bigger picture. The bigger picture is that you want code that is designed in such a way that it is easy to work with. Imagine having to implement a new feature and being able to easily read through the existing code, understand it, and figure out where you need to make the change for your feature. That is what we are aiming for.
What do I mean when I say I think people get lost in the weeds? Here’s what I mean. I propose that they fall victim to Lost Purposes.
Imagine a high school student who starts off trying to get good grades in order to get into a good college, in order to eventually get into a career that will make them happy, but who finds themselves stressed about getting a B+ in a class during the last semester of their senior year when they have already been accepted into the college they are going to attend. This student now cares about the grade in and of itself and has lost sight of the fact that it originally was just a means to the end of leading them to a happy and fulfilling life.
Similarly, the purpose of different design patterns like DRY is to make the codebase easy to work with. But, like the high school student, we lose sight of this. We start seeing them as ends in and of themselves, and we judge code according to how DRY, SOLID and ACIDic it is. Unfortunately the acronyms are just a proxy for what we are truly interested in. They’re not the real thing.
That doesn’t mean the acronyms aren’t useful though. I think that they are useful. I just think that they need to be combined with… something else. It is easy to lose sight of the forest for the trees when you think about the acronyms. User research, at least as a thought experiment, helps you keep sight of that forest, I propose.
Consider the DRY vs “wet” example. At first it might seem obvious that you’d want to dry up that duplicate code. But what happens when you do a little thought experiment? Imagine, in your mind’s eye, yourself working with the DRY code. And now, again, in your mind’s eye, imagine yourself working with that “wet” code.
What does it feel like? Maybe that “wet” code isn’t actually that bad? Maybe it’s kinda nice. Maybe you find that, despite what the professors said, the duplication isn’t actually causing you difficulty. Maybe you enjoy the fact that you get to deal with things at a more object/ground level rather than at the more abstract level that the DRY code forced you into. Maybe it leads you to some new acronyms like YAGNI, and some new rules like the Rule of Three.
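For what it’s worth, here is one invented way that thought experiment can play out, continuing the hypothetical report sketch from earlier: as the two callers diverge, the shared DRY helper starts absorbing their differences as flags, and each call site becomes harder to read than the duplication it replaced.

```python
# Hypothetical evolution of the DRY helper once the two reports diverge:
# every difference between callers becomes another parameter.
def format_report(records, currency="$", show_status=True, uppercase_ids=False):
    lines = []
    for rec in records:
        rec_id = rec["id"].upper() if uppercase_ids else rec["id"]
        line = f"{rec_id}: {currency}{rec['amount']:.2f}"
        if show_status:
            line += f" ({rec['status']})"
        lines.append(line)
    return "\n".join(lines)

# The "wet" alternative keeps each report concrete and independent; rules of
# thumb like YAGNI and the Rule of Three suggest tolerating the duplication
# until a third report actually shows up.
print(format_report([{"id": "ref-7", "amount": 3.0}],
                    show_status=False, uppercase_ids=True))
```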
Again, I’m not saying that this sort of thought experiment is The Answer. It’s not. It’s just a tool to help us avoid losing sight of the forest. But if it were our only tool, I don’t think we’d be able to get through the forest. It wouldn’t be enough. Imagine if you had never been taught any design patterns. Your instincts wouldn’t be as good. What I’m proposing here is that this thought experiment is quite effective when used in conjunction with the things we are already accustomed to using.
As both a programmer and a UX designer (and having studied and done UX/usability research), I have quite a few things to say about this. (The following points aren’t in any particular order, and don’t necessarily come together to form any larger point.)
Examples?
You give one example (‘DRY’) of a pattern that’s supposedly Correct™ but might not make code “easier to work with”. That makes the post read a bit like one of those innumerable “Against <Some Commonly Done Or Used Thing>” essays, where the author, annoyed by how everyone’s always telling him to do or use that thing, even when he kind of doesn’t want to do or use it, decides to go on a rant about the thing. (Which is not necessarily a bad thing; such rants are often entertaining, and sometimes quite necessary.)
So, are there other examples you are thinking of (or can easily think of), of software design patterns that are supposedly Correct™, but that actually (you think) might make code harder, and not easier, to work with?
How close “contact with reality”?
You talk about user research at the start of the post, but then you transition into talking about thought-experimentation. I note that these are not only not the same thing, but not even the same sort of thing!
Attempting to do user testing as a thought experiment, instead of doing it for real, would be useless. I can tell you, from personal experience, that the value of user testing comes from the fact that users can, and will, surprise you. You can sit back and imagine users interacting with your system as much as you like, and you will never be able to imagine the sorts of things that actual users will actually do when they try to use it. Never.
I see no reason why the same shouldn’t be true in the case of software design patterns and the coding experience…
Note that this is especially so given that what we imagine it is like to work with one or another sort of code, is affected quite strongly by our existing views on the design patterns (or other properties of a codebase) in question. If I think that DRY is a sine qua non of good code, then, when I imagine working with code that conforms to that principle, I’ll imagine a sense of ease and elegance; while the very thought of working with a duplication-heavy codebase will fill me with revulsion.
In your case, I suspect, what happens is the reverse: you don’t think much of DRY as a principle, so, when you do this thought experiment, you get more or less the opposite result. But in neither case does the thought experiment actually tell us anything (or rather, anything more than we already know) about how it’ll actually go, once we sit down and start coding…
The third user
You write:
“The bigger picture is that you want code that is designed in such a way that it is easy to work with.”
But the question you neglect to ask (and you are in exalted company in this, I assure you!) is: easy for whom, exactly—and just what constitutes “working with” the code?
“The Third User” is a famous essay by Bruce Tognazzini (creator of the Apple Human Interface Guidelines, and a legend in the field of software usability). It explains the pattern behind many of Apple’s design choices over the last two decades or so (the essay was written in 2013, but it has become, if anything, even more true since then). That pattern is simple:
Design the product so that it appeals to (1) the person looking at it in the store and deciding whether to buy it, and (2) the person who’s just bought it, opened it, and is now using it for the first time (and not deciding to return it; and writing good reviews on Amazon and making enthusiastic posts about their new purchase on social media; etc.). Don’t bother designing the product so that it fits the needs of (3) the experienced user, who has long since mastered the basics, and now has more complex needs, and wants to do advanced things. (Why bother? You already have their money, after all.)
As the market for computers, smartphones, etc. has grown, and as startups (and, therefore, new products) have come to play an increasingly large role as sources of user-facing technological products, there has been a tendency in user testing to focus on new users. There are many factors that have contributed to this, too many to enumerate here, but the result has been that “how quickly can a person start doing something with the product, knowing nothing about it (and having absolutely no interest whatsoever in reading any instructions or taking any time to learn anything)” has come to dominate most other possible questions for user testing to answer.
The result is that the “third user effect” has become pervasive in consumer technology. The potential user is appealed to, and the new user—but not the experienced user; not the advanced user. (Remember the term “power user”? One hardly hears it, these days… there’s a reason for that—and it’s not that modern software is just so capable that it lets anyone do all the clever things that once called on advanced skills to accomplish!)
What you are proposing, in essence, is to enshrine the “third user effect” in how we think about software design.
Is it easy to work with a codebase? How does it feel? Both the answers to those questions, and their relevance, depend on the details! Are we talking about how easy it is to figure out just enough to start adding new code, how quickly one can “get up to speed” and start contributing? Or are we, instead, asking about long-term effectiveness—over a period of a year, two years, five years, of working with this codebase, how productive can you be with it (how long does it take you on average to add a feature, for example)?
What about long-term maintainability? DRY is a good example here: if you have much code duplication, and you want to change how the code does something, do you have to change it in one place, or in twelve? Is the duplicated code exactly the same in each of those twelve places, or is it slightly different in two of them? Do some of the instances of that code have side effects? Are you likelier to introduce bugs with your twelve changes than with the one?
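To make that concrete, here is a tiny invented sketch of the failure mode (the functions and fields are made up): the same calculation copied into several places, with one copy that has quietly drifted.

```python
# Invented example: the same totaling logic copied into three functions.
def invoice_total(items):
    return round(sum(i["price"] * i["qty"] for i in items), 2)

def quote_total(items):
    return round(sum(i["price"] * i["qty"] for i in items), 2)

def refund_total(items):
    # A drifted copy: someone changed the rounding here but nowhere else.
    return round(sum(i["price"] * i["qty"] for i in items))

# Changing how totals are computed now means finding every copy, and
# noticing that refund_total is already subtly different.
items = [{"price": 19.99, "qty": 3}]
print(invoice_total(items), refund_total(items))  # 59.97 vs 60
```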
And speaking of bugs—how easy is it to fix them, or find them in the first place? It’s a truism that fixing bugs in code is harder than writing the code in the first place, so if you make it easier to write code but harder to fix bugs in it, haven’t you made things worse?
More broadly: are you optimizing for the one-time contributor, or the long-term maintainer? Is your goal to get some code written now, or for that code to still be working in a year or a decade? Is the point merely to write code, or to write good code? To do something, or to do the right thing?
Yeah, I see what you mean. I wanted to include other examples in the post but just struggled to come up with them and/or articulate them. Plus I didn’t want to spend too much time on this post. If you or anyone else have any, though, I’d love to hear them!
Agreed. Like with the examples, this really is just a failure on my part as the writer for not clarifying and talking about this. I did in a previous version of the post, but had some trouble fitting it into this version.
What I was going for is something like this. In theory it would be great if you were able to do proper user research. But in practice that would involve an investment of time that might not be reasonable. And so as an alternative, the thought experiment thing is another option that is useful.
This strikes me as thinking in absolutes. You won’t be able to imagine all of the things that users will actually do, but you’ll still be able to imagine some of them. And that portion of things you are able to imagine serves some use. Right?
I agree. This is something to be aware of, and I think it would have been good for me to talk about in the post. But to be clear, I think that these biases are something that we can manage. “I recognize that I am biased against DRY code and therefore shift my beliefs accordingly.”
Yeah, I agree with all of this. And I did think a bit about it when I was writing the post. It didn’t feel like a path worth going down though:
It seems a little tangential and distracting to the core point of my post. I think my point applies to many of the interpretations of “easy to work with”, and talking about what makes code easy to work with is a large conversation.
I was sorta hoping/envisioning the word “easy” in “easy to work with” would do enough of the heavy lifting. If you are working in a codebase for five years and the code makes it easy to onboard but difficult to work with in the long run, then in aggregate, it’s not actually easy to work with since a good proportion of the time is spent as the Third User.
The words weren’t coming to me. I don’t want to shy away from acknowledging this. A writer with more skill and/or time would probably be able to incorporate the third user stuff into the post in such a way where it is a net positive.
It’s not impossible that imagining what users will do can serve some use. Of course, to the extent that such an exercise is useful at all, it’s best done systematically. This is the point of such usability evaluation techniques as the cognitive walkthrough, heuristic analysis, etc.
The danger—and it is a grave danger, which has been the ruin of many a project—is that (a) what you imagine users will do is not actually something they will do, and (b) what users will actually do is not something you imagine they will do. But you will think that you have learned something; and so your thought experiment will mislead you, and leave you worse off than before.
It is very easy to “imagine” yourself into creating something that nobody (not even you!) will want to use.
There is really no substitute for actual user testing—not even a partial one.
I have a feeling that we mostly agree with each other and are thinking about two different questions. Consider three options:
1. Think about whether a design matches various heuristics.
2. Think concretely about whether a design will actually be understandable to users (user research thought experiment).
3. Do user research to find out whether a design will actually be understandable to users (actual user research).
I think we are in agreement that 3 > 2 > 1.[1]
One question is whether 3 > 2. A separate question is 2 > 1. I am making the/a point in this post that 2 > 1.[2]
But since 3 > 2, shouldn’t we always be doing 3 instead of 2, making the point that 2 > 1 moot? I don’t think so.
3 takes more time than 2 and is not always practical. A given design is composed of lots of little decisions that are made[3], and there isn’t time to do proper user research on each of these component decisions. And so in practice, the status quo is that people currently do not do user research on all of these component decisions.
In which case, I think the question is whether 2 > 1. Or rather, as I mention in my second footnote to this comment, whether 2 can be used in addition to 1, adding value. I think it can and often does. Furthermore, I think that this point is underutilized/underappreciated/under-understood.
And there’s probably some sort of more general point to be made here that I’m struggling to think about and articulate that extends past usability and software design.
Although you seem to feel more strongly about how much 3 > 2 than I do. To make up numbers: how strongly I feel about its importance, I’d say, is like a 7/10, whereas you seem to be more like a 9.5/10 or 10/10.
Well, they all work in conjunction with each other, but the idea of 3 > 2 > 1 still gestures at what I am trying to get at, I hope.
I’m having trouble articulating what I mean by this. Maybe you get it?
First, note that 1 is properly not just heuristic analysis, but also all the other formal and semi-formal methods of evaluation. That said:
1 > 2. (Because 1 is just 2 but systematized, with checklists, corrections for common biases, non-obvious considerations, etc.)
3 <> 1. (That is: 3 is incomparable with 1.) (As you say, a design is composed of many decisions, too many to effectively user-test, etc.)
3+1 > any of { 1, 2, 3 }. (Note that this constitutes the standard prescription for UX design practice.)
It’s a trap. It’s a trap in UX, and an analogous trap in software design.
The problem with UX research, as normally practiced, is that it prioritizes first impressions and neglects what happens when the user has been using the system for a while and understands it well. So you wind up doing things like adjusting all the buttons to control their prominence, to guide new users to the main interactions and away from the long-tail interactions, at the expense of those buttons having any sort of coherent organization.
In the case of software, UX-style user testing is going to lead straight into a well-known, known-bad attractor: brittle magic. There is a common pattern in libraries and frameworks where they do a great job making a few common development tasks look really easy, shielding the developer from a lot of details that they probably should not have been shielded from. And what happens with these systems, almost invariably, is that as soon as you start using them for real instead of in toy examples, they reveal themselves to be overcomplicated garbage which impedes attempts to understand what’s really happening.
Indeed. This is, for example, possibly the biggest problem with Ruby on Rails.
I think I didn’t do a very good job of being clear about my point and thoughts.
The big thing I was trying to say is that user research should be thought of as the ultimate barometer, not whether the code you wrote happens to adhere to various acronyms like DRY.
Perhaps most UX-style user testing is done on new UIs, but you can also do such user testing on older UIs. For example, maybe you’ve had the same navbar for a long time, people have been using it, but you want to see if mobile users understand how to use it to navigate.
You make a great point about how one has to be careful to not only optimize for new users, and that the Third User (as Said describes) is also important. But I see this point as supplementary to the points I was trying to make, not in contrast to them. For example, suppose you have a bunch of DRY code in your codebase with weird abstractions that are confusing to the team. In that case, the user testing results are negative, and, I argue, those user testing results are what matter at the end of the day, not heuristics like whether the code adheres to principles like DRY.
Although, to caveat that point, there is probably some wisdom in giving those principles some weight as well, since our ability to introspect and answer the question of how easy it is to work with code is imperfect. Damn, that wasn’t very clear. It’s hard to explain what I mean here. Maybe this is a good analogy. Psychologists who study happiness find that long commutes make people unhappy. Maybe you have a long commute and think that it’s fine. So the user testing shows that it’s fine. But maybe you’re failing to introspect and the commute is actually causing you a good deal of frustration. In that case, the user testing has misled us. Because of this risk, it’s probably a good idea to give some weight to what things like happiness research have shown. Hopefully that makes some sense.
I agree with what you’re saying here.
I think you’ve noticed that the consequentialism vs deontology debate happens in software design, just like everywhere else. My preferred solution is to recognize that you don’t always have the time (or the willing customers, or the knowledge of what dimensions to test) to be empirical about an approach, and having a set of heuristics (aka rules) to guide you is incredibly helpful.
Predict the results, and test your intuitions whenever you can, and especially when it’s a pivotal decision you can’t change later. Taking on dependencies and locking in to a language or platform fall into these categories—be careful, prototype, measure. The vast majority of things (factoring of your codebase, location and sharing mechanism for libraries, etc.) you can make lighter-weight choices, and test as you go—if it becomes annoying, then change it.
UI design contains both. By the time you’re getting to user testing, you’ve ALREADY committed to a small number of options, and you’ve done so by your own use of the product, and intuitions about what different customer personae will want. And even when doing user testing, there is usually NO WAY to resolve the “beginner vs expert” conundrum. You can only test on beginners, so you only get evidence of how easy it is to discover how to do a new thing. You lose out on understanding how cumbersome your design is for someone who spends hours each day in the product and just wants to get their work done efficiently.
The way to avoid that is to override the user research with your intuitions, AND put enough telemetry into the product that you can learn over time how the product is actually used, to figure out the right balance (or make the investment to serve both sets of users).
Y’know… funnily enough I actually didn’t make that consequentialism vs deontology connection even though my second to last post was about exactly that! Thanks for bringing it up though! I think you’re right!
Yeah I agree here. And I think it’s a good point to bring up.
I think that in this situation, some form of what I was saying in this post regarding user research still applies. Say you make some lightweight decision. You’re then faced with the subsequent decision of whether you should change it. In making that subsequent decision, I think people probably err too strongly towards what the heuristics and acronyms say, and not strongly enough towards the user research of how easy they actually find the code to work with after that lightweight decision.
For example, maybe I see something duplicated once and DRY it up. I find that the abstraction I created is awkward to work with. But I keep it because I figure that DRY is good and is what I should do. Instead, it’d be better if I thought about user research moreso as the barometer, if that makes sense.
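As a hypothetical sketch of what that looks like (the notification helper and names below are made up): one duplication gets DRY’d into a dispatcher right away, and the barometer question becomes whether this is actually nicer to work with than two small, direct functions, not whether it is DRY.

```python
from dataclasses import dataclass

@dataclass
class User:
    name: str
    email: str

def send_email(address, subject, body):
    # Stand-in for a real mailer; just prints here.
    print(f"To: {address} | {subject} | {body}")

# The abstraction created after a single duplication: every new notification
# kind now has to be threaded through this one dispatcher.
def send_notification(user, kind):
    if kind == "welcome":
        subject, body = "Welcome!", f"Hi {user.name}, thanks for signing up."
    elif kind == "reset":
        subject, body = "Password reset", f"Hi {user.name}, use the link below to reset."
    else:
        raise ValueError(f"unknown notification kind: {kind!r}")
    send_email(user.email, subject, body)

# If working with this feels more awkward than two plain functions
# (e.g. send_welcome_email, send_reset_email), that felt experience is the
# barometer, not the fact that the dispatcher is DRY.
send_notification(User("Ada", "ada@example.com"), "welcome")
```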
I agree that it is harder. Perhaps impractical in the majority of cases. But I’m thinking about a situation where you have some design that has existed for a while, the users who have been using it have become experts, and you can do user testing on them at that point. Granted, you can’t compare two alternative designs this way (unless you do some sort of longitudinal A/B test).
Some people seem to do this automatically. They notice which things make code harder to work with and avoid them. Occasionally, they notice things that make working with code easier and make sure to include those bits in. I guess that’s how you get beautiful code like redis or Django.
But I’ve never seen any formal approach to this. I’ve gone down the software craftsmanship rabbit hole for a few years and learned a lot thanks to it, but none of it was based on any research—just people like Beck, Uncle Bob, Fowler, etc. distilling their experience into blog posts or books. The downside of that is that it would ignite furious debate that would go nowhere because there was no data to back it up, just anecdotes. These debates, I think, turned a lot of people off, even though there were gems of wisdom there.
Yeah, I agree, there are definitely people who do this automatically. Well, it’s probably more like a spectrum. Some people do it a lot by default, some a medium amount, some a little. One of the claims I’m making is that the spectrum leans too much towards “too little”.
Haha right, those debates can definitely tend to devolve and be unproductive. It’s a shame.
Attempting to set up a Django install was one of my worst and most frustrating experiences with any software, ever.
I haven’t looked at Django’s code. It might be beautiful—I wouldn’t know. But this goes to the point I made in my other comments on this post: “easy for whom?” is a critical question in such cases.