A number of these projects were already on our docket, but less visible are the projects that were delayed, and the fact that the ones we selected might not have been done now otherwise. For example, if we hadn’t been doing metric quarter, I’d likely have spent more of my time continuing work on the Open Questions platform and much less of my time doing interviews and talking to authors. Admittedly, subscriptions and the new editor are projects we were already committed to and had been working on, but if we hadn’t thought they’d help with the metric, we’d have delayed them to the next quarter the way we did with many other project ideas.
We did brainstorm, but as Oli said, it wasn’t easy to come up with any ideas that were obviously much better.
Responding to both of you with one comment again: I sort of alluded to it in the A/B testing comment, but it’s less about any particular feature that’s missing and more about the general mindset. If you want to drive up metrics fast, then the magic formula is a tight iteration loop: testing large numbers of small changes to figure out which little things have disproportionate impact. Any not-yet-optimized UI is going to have lots of little trivial inconveniences and micro-confusions; identifying and fixing those can move the needle a lot with relatively little effort. Think about how Facebook or Amazon A/B tests every single button, every item in every sidebar, on their main pages. That sort of thing is very easy, once a testing framework is in place, and it has high yields.
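To make that concrete, here’s a minimal sketch of the core of such a loop, assuming a hypothetical logEvent helper and a Node-style environment (none of this reflects code we actually have): bucket each user deterministically into a variant, then log exposures and conversions against that variant.

```typescript
import { createHash } from "crypto";

// Stand-ins for things the real app would provide (hypothetical names).
declare const currentUser: { _id: string };
declare function logEvent(name: string, data: Record<string, unknown>): void;

// Deterministically assign a user to a variant, so the same user
// always sees the same version of the experiment.
function assignVariant(userId: string, experiment: string, variants: string[]): string {
  const digest = createHash("md5").update(`${experiment}:${userId}`).digest();
  return variants[digest.readUInt32BE(0) % variants.length];
}

// Example: test two labels for a subscribe button.
const variant = assignVariant(currentUser._id, "subscribe-button-copy", ["Subscribe", "Follow"]);
logEvent("experiment-exposure", { experiment: "subscribe-button-copy", variant });
// ...and later, when the user actually subscribes:
logEvent("experiment-conversion", { experiment: "subscribe-button-copy", variant });
```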
As far as bigger projects go… until we know what the key factors are which drive engagement on LW, we really don’t have the tools to prioritize big projects. For purposes of driving up metrics, the biggest project right now is “figure out which things matter that we didn’t realize matter”. A/B tests are one of the main tools for that—looking at which little tweaks have big impact will give hints toward the bigger issues. Recorded user sessions (a la FullStory) are another really helpful tool. Interviews and talking to authors can be a substitute for that, although users usually don’t understand their own wants/needs very well. Analytics in general is obviously useful, although it’s tough to know which questions to ask without watching user sessions directly.
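And for reading off whether a little tweak actually moved the metric (versus noise), the arithmetic is just a standard two-proportion z-test; a quick sketch with made-up numbers:

```typescript
// Two-proportion z-test: did variant B convert better than variant A?
function zScore(conversionsA: number, usersA: number, conversionsB: number, usersB: number): number {
  const pA = conversionsA / usersA;
  const pB = conversionsB / usersB;
  const pooled = (conversionsA + conversionsB) / (usersA + usersB);
  const stderr = Math.sqrt(pooled * (1 - pooled) * (1 / usersA + 1 / usersB));
  return (pB - pA) / stderr;
}

// |z| > 1.96 corresponds to p < 0.05 (two-sided), i.e. the observed
// difference is unlikely to be noise at this sample size.
console.log(zScore(120, 2000, 155, 2000).toFixed(2)); // ~2.19, so B's lift looks real
```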
I see the spirit of what you’re saying and think there’s something to it, though it doesn’t feel completely correct. That said, I don’t think anyone on the team has experience with that kind of A/B testing loop, and given that lack of experience, we should try it out for at least a while on some projects.
To date, I’ve been working just to get us to have more of an analytics mindset, plus basic but thorough analytics throughout the app, e.g. tracking on each of the features/buttons we build. (This wasn’t trivial to do with e.g. Google Tag Manager, so we’ve ended up building things in-house.) I think trying out A/B testing would likely make sense soon, but as above, I think there’s a lot of value to be had even before that from dumber/more naive analytics.
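For concreteness, the in-house tracking is roughly this shape (hypothetical names and endpoint, not our actual code): a small helper that every new feature calls, which batches events and posts them to our own analytics endpoint instead of going through GTM.

```typescript
// Hypothetical in-house click tracker; illustrative only.
type AnalyticsEvent = { name: string; properties: Record<string, unknown>; at: number };

const queue: AnalyticsEvent[] = [];

export function track(name: string, properties: Record<string, unknown> = {}): void {
  queue.push({ name, properties, at: Date.now() });
}

// Flush every few seconds rather than firing one network request per click.
setInterval(() => {
  if (queue.length === 0) return;
  const batch = queue.splice(0, queue.length);
  void fetch("/analyticsEvent", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(batch),
  });
}, 5000);

// Usage when building a new feature:
//   track("subscribe-button-clicked", { placement: "post-header" });
```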
We trialled FullStory for a few weeks, and I agree it’s good, but we just weren’t using it enough to justify continuing with it. LogRocket offers a monthly subscription, though, and we’ll likely sign up for that soon. (Once we’re actually using it fully, not just trialling, we’ll need to post about it properly, build an opt-out, etc., and be careful around privacy; already in the trial we hid e.g. voting and usernames.)
To come back to the opening points in the OP, we probably shouldn’t get too bogged down trying to optimize specific simple metrics by getting all the buttons perfect, etc., given the uncertainty over which metrics are even the right ones to focus on. For example, there isn’t any clear metric (that I can think of) that definitively answers how much to focus on bringing in new users and getting them up to speed vs. building tools for existing users who are already producing good intellectual progress. I think it’s correct that we have to use high-level models and fuzzier techniques to think about big-project prioritization. A/B tests won’t resolve the most crucial uncertainties we have, though I do think they’re likely to be hugely helpful in refining our design sense.
I actually agree with the overall judgement there—optimizing simple metrics really hard is mainly useful for things like landing pages, where the goals really are pretty simple and there’s not too much danger of Goodharting. LessWrong mostly isn’t like that, and most of the value in micro-optimizing would be in the knowledge gained, rather than in the concrete result of increasing a metric. I do think there’s a lot of knowledge there to gain, and I think our design-level decisions are currently far from the Pareto frontier in ways that won’t be obvious until the micro-optimization loop starts up.
I will also say that most people I’ve worked with dramatically underestimated the magnitude of impact this sort of thing has until they saw it happen first-hand, for whatever that’s worth. (I first saw it in action at a company which achieved supercritical virality for a short time, and A/B-test-driven micro-optimization was the main tool responsible for that.) If this were a startup and we needed strong new-user and engagement metrics to get our next round of funding, then I’d say it should be the highest priority. But this isn’t a startup, and I totally agree that A/B tests won’t solve the most crucial uncertainties.