I’m trying to think about building content organisation and filtering systems for LessWrong. I’m new to a lot of what I discuss below, so I’m interested in other people’s models of these subjects, or links to sites that solve these problems in different ways.
Two Problems
So, one problem you might try to solve is that people want to see all of a thing on a site. You might want to see all the posts on reductionism on LessWrong, all the practical how-to guides (e.g. how to beat procrastination, the Alignment Research Field Guide, etc.), or all the literature reviews on LessWrong. So you want people to help build those pages. You might also want to see all the posts corresponding to a certain concept, so that you can find out what that concept refers to (e.g. what the terms “Goodhart’s law”, “slack” or “mesa-optimisers” mean).
Another problem you might try to solve is that, while many users are interested in lots of the content on the site, they have varying levels of interest in the different topics. Some people are mostly interested in posts on big-picture historical narratives, and less so in models of one’s own mind that help with dealing with emotions and trauma. Some people are very interested in AI alignment, some are interested in only the best such posts, and some are interested in none.
I think the first problem is supposed to be solved by Wikis, and the second problem is supposed to be solved by Tagging.
Speaking generally, Wikis allow dedicated users to curate pages around certain types of content, highlighting the best examples and some secondary ones, and writing some context so that people arriving on the page understand what it is about. A Wiki page is a canonical, updatable, highly editable page built around one idea.
Tagging is much more about filtering than about curating.
Tagging
Let me describe some different styles of tagging.
On the site lobste.rs there are about 100 tags in total. Most tags give a very broad description of an area of interest, such as “haskell”, “databases” and “compilers”. These are shown next to posts on the frontpage. Most posts have 1–3 tags. This allows easy filtering by interest.
A site I’ve just been introduced to, and whose tagging I’ve been fairly impressed by, is Gelbooru, an anime/porn image website where many images have over 100 tags accurately describing everything contained in the image (e.g. “blue sky”, “leaf”, “person standing”). It is a site whose purpose is search-by-tags. A key element that allows Gelbooru to function is that, while I think it probably has limited mechanisms for resolving disputes about whether a tag is appropriate, that’s fine, because all tags are literal descriptions of objects in the image. There are no tags describing, e.g., the emotions of people in the images, which would be much harder to build common knowledge around. I don’t really know how the site gets people to tag hundreds of thousands of images, each with such scintillating tags as “arm rest”, “monochrome” and “chair”, but it seems to work quite well.
The first site uses tags as filters on a single feed. As long as the number of tags is manageable, it’s easy for an author to tag things appropriately, or for readers to help by tagging things correctly. The second site uses tagging as the primary method of finding content: the homepage of the site is a search bar for tags.
In the former style, tags are about filtering for fairly different kinds of content. You might wonder why one should have tags rather than just subreddits, which also filter posts by interest quite well. A key distinction is that subreddits are typically non-overlapping, whereas tags overlap often. In general, a single post can have multiple tags, but a post belongs to a single subreddit. I currently think of tags as different lenses with which to view a single subreddit, and only when your interests are sufficiently non-overlapping with the current subreddit should you go through the effort to build a new subreddit. (With its own tags.)
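A minimal Python sketch of that difference, using a hypothetical `Post` record (not any site’s actual schema): each post names exactly one subreddit but can carry any number of tags, so a single feed can be filtered through overlapping tag lenses.

```python
from dataclasses import dataclass, field


@dataclass
class Post:
    title: str
    subreddit: str  # subreddit model: exactly one community per post
    tags: set[str] = field(default_factory=set)  # tag model: any number of overlapping tags


def filter_feed(feed: list[Post], wanted: set[str]) -> list[Post]:
    """Keep the posts in a single feed that carry at least one of the wanted tags."""
    return [post for post in feed if post.tags & wanted]


# Hypothetical example posts, all living in the same community.
feed = [
    Post("Post A", "lesswrong", {"AI alignment", "decision theory"}),
    Post("Post B", "lesswrong", {"practical"}),
    Post("Post C", "lesswrong", {"forecasting", "practical"}),
]
print([p.title for p in filter_feed(feed, {"practical"})])  # ['Post B', 'Post C']
```

Post C shows up under both “forecasting” and “practical”, which a one-subreddit-per-post model cannot express.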
There are some other (key) questions: how to build incentives for users to tag things correctly, and how to resolve disputes over whether a tag is correct for a post. If, like lobste.rs above, LW has a tagging system with only ~100 tags, and is not attempting to resolve disputes on the much larger scale that Wikipedia does, then I think a fairly straightforward voting system might suffice. This would look like:
When a post is tagged with “AI alignment”, users can vote on the tag (with the same weight that they vote on a post), to indicate whether it’s a fit for that tag. (This means tag-post objects have their own karma.)
Whoever added the tag to that post gets the karma that the tag-post object gets. (Or, if that seems too powerful, perhaps a smaller reward proportional to this karma score, but definitely still positive.)
Most users cannot create new tags. New tags are added by the moderation team, though users can propose new tags to the mod team.
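The voting scheme above can be sketched in Python. This is an illustration under assumptions, not a spec: the names `TagSystem`, `TagPost` and `NotAllowed`, the rule that the first user to apply a tag gets the credit, and the `reward_fraction` knob for scaling the tagger’s reward are all hypothetical.

```python
from dataclasses import dataclass, field


class NotAllowed(Exception):
    """Raised when a non-moderator tries to create a tag directly."""


@dataclass
class TagPost:
    """A tag applied to one post. It has its own karma, separate from the post's karma."""
    tag: str
    post_id: int
    added_by: str
    votes: dict[str, int] = field(default_factory=dict)  # voter -> signed vote weight

    @property
    def karma(self) -> int:
        return sum(self.votes.values())


class TagSystem:
    def __init__(self, moderators: set[str], reward_fraction: float = 1.0):
        # The tagger's reward may be scaled down, but must stay positive.
        if not 0 < reward_fraction <= 1:
            raise ValueError("reward_fraction must be in (0, 1]")
        self.moderators = moderators
        self.reward_fraction = reward_fraction
        self.tags: set[str] = set()
        self.submissions: list[tuple[str, str]] = []  # (user, proposed tag) for mods to review
        self.tag_posts: dict[tuple[str, int], TagPost] = {}

    def create_tag(self, user: str, name: str) -> None:
        if user not in self.moderators:
            raise NotAllowed(f"{user} cannot create tags; use submit_tag instead")
        self.tags.add(name)

    def submit_tag(self, user: str, name: str) -> None:
        self.submissions.append((user, name))

    def apply_tag(self, user: str, tag: str, post_id: int) -> TagPost:
        if tag not in self.tags:
            raise KeyError(f"unknown tag: {tag}")
        # The first user to apply a tag to a post is recorded as its tagger.
        key = (tag, post_id)
        if key not in self.tag_posts:
            self.tag_posts[key] = TagPost(tag, post_id, added_by=user)
        return self.tag_posts[key]

    def vote(self, voter: str, tag: str, post_id: int, weight: int) -> None:
        # `weight` is the voter's normal post-vote weight; positive means "fits this tag".
        # Voting again replaces the voter's previous vote.
        self.tag_posts[(tag, post_id)].votes[voter] = weight

    def tagger_karma(self, user: str) -> float:
        """Karma a user has earned from the tag-post objects they created."""
        return sum(
            tp.karma * self.reward_fraction
            for tp in self.tag_posts.values()
            if tp.added_by == user
        )


ts = TagSystem(moderators={"mod"}, reward_fraction=0.5)
ts.create_tag("mod", "AI alignment")
ts.apply_tag("alice", "AI alignment", 1)
ts.vote("bob", "AI alignment", 1, 2)
ts.vote("carol", "AI alignment", 1, -1)
print(ts.tag_posts[("AI alignment", 1)].karma)  # 1
print(ts.tagger_karma("alice"))  # 0.5
```

Keeping the tag-post object separate from the post means a tag can be voted down as a poor fit without touching the post’s own karma.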
If so, when we end up building a tagging system on LessWrong, the goal should be to distinguish the main types of post people are interested in viewing, and create a limited number of tags that determine this. I think that building that would mainly help users who are viewing new content on the frontpage, and that for much more granular sorting of historical content, a wiki would be better placed.
Afterthought on conceptual boundary
The conceptual boundary is something like the following: a tag is literally just a list of posts, and the only question you can ask of it is whether something is in that list or not. A Wiki is an editable text field, curatable in much more depth than a simple list. A Tag is a communal list object; a Wiki Page is a communal text-body object.
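As a rough Python sketch of that boundary (hypothetical classes; keeping a revision history on the wiki page is an added assumption): a `Tag` only answers membership questions, while a `WikiPage` is free text that anyone can rewrite.

```python
from dataclasses import dataclass, field


@dataclass
class Tag:
    """Communal list object: the only question it answers is membership."""
    name: str
    post_ids: set[int] = field(default_factory=set)

    def add(self, post_id: int) -> None:
        self.post_ids.add(post_id)

    def __contains__(self, post_id: int) -> bool:
        return post_id in self.post_ids


@dataclass
class WikiPage:
    """Communal text-body object: free text that anyone can rewrite and curate in depth."""
    title: str
    revisions: list[tuple[str, str]] = field(default_factory=list)  # (editor, full text)

    def edit(self, editor: str, text: str) -> None:
        self.revisions.append((editor, text))

    @property
    def text(self) -> str:
        return self.revisions[-1][1] if self.revisions else ""


slack = Tag("Slack")
slack.add(101)
print(101 in slack, 202 in slack)  # True False

page = WikiPage("Slack")
page.edit("alice", "First draft of the Slack page.")
page.edit("bob", "Revised draft with links to key posts.")
print(page.text)  # Revised draft with links to key posts.
```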
I spent an hour or two talking about these problems with Ruby. Here are two further thoughts. I will reiterate that I have little experience with wikis and tagging, so I am likely making some simple errors.
Connecting Tagging and Wikis
One problem to solve is that, when a topic is being discussed, users want to go from a page discussing it to a page that explains it and lists all the posts that discuss it. This page should be easy to update with new content on the topic.
Some more specific stories:
A user reads a post on a topic, and wants to better understand what’s already known about that topic and the basic ideas
A user is primarily interested in a topic, and wants to make sure to see all content about that topic
The solution for the first is to link to a wiki page on that topic. The solution for the second is to link to a page that lists all posts on that topic. And one possible solution is to make both of those the same button.
This page is a combination of a Wiki and a Tag. It is a communally editable explanation of the concept, with links to key posts explaining it and to other related pages. Below that, it also has a post list of every relevant post, sortable by things like recency, karma, and relevance. Maybe below that it even has its own Recent Discussion section, for comments on posts that have the tag. It’s a page you can subscribe to (e.g. via RSS) and come back to in order to see discussion of a particular topic.
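A hypothetical Python sketch of such a page’s data and its two derived views, the sortable post list and Recent Discussion. Treating a post’s `relevance` as the karma of its tag-post object is an assumption; RSS is not modelled.

```python
from dataclasses import dataclass, field
from datetime import datetime


@dataclass
class Comment:
    author: str
    posted_at: datetime
    text: str


@dataclass
class Post:
    title: str
    posted_at: datetime
    karma: int
    relevance: int  # assumption: the karma of this post's tag-post object
    comments: list[Comment] = field(default_factory=list)


@dataclass
class WikiTag:
    name: str
    description: str  # communally edited explanation of the concept
    posts: list[Post] = field(default_factory=list)

    def post_list(self, sort_by: str = "karma") -> list[Post]:
        """Every tagged post, highest first, sorted by recency, karma, or relevance."""
        keys = {
            "recency": lambda p: p.posted_at,
            "karma": lambda p: p.karma,
            "relevance": lambda p: p.relevance,
        }
        return sorted(self.posts, key=keys[sort_by], reverse=True)

    def recent_discussion(self, limit: int = 10) -> list[Comment]:
        """Newest comments across all posts that carry this tag."""
        comments = [c for p in self.posts for c in p.comments]
        return sorted(comments, key=lambda c: c.posted_at, reverse=True)[:limit]
```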
Now, to make this work, every post in the category needs to be listed in the tag. One problem you will run into is that there are a lot of concepts in the space, so the number of such pages will quickly become unmanageable. “Inner Alignment”, “Slack”, “Game Theory”, “Akrasia”, “Introspection”, “Corrigibility”, etc. form a very large list, too large to expect someone to scroll through it and reliably check whether their post fits any of them. You’ll end up with a lot of Wiki pages with very incomplete lists.
This is especially bad, because the other use of the tag system you might be hoping for is the one described in the parent to this comment, where you can see the most relevant tags directly from the frontpage, to help with figuring out what you want to read. If you want to make sure to read all the AI alignment posts, it’s not helpful to give you a tag that sometimes works, because then you still have to check all the other posts anyway.
However, there are three ways to patch this over. Firstly, the thing that will help the Wiki system most here is the ability to add a post to a Wiki page from the post page itself, instead of having to separately visit the Wiki page and add it there. This makes the job of the people who care about maintaining Wiki pages much easier.
Secondly, you can order the tag options by likely relevance. For example, if your post links to a lot of posts that have the tag “AI alignment”, then your post is probably about AI alignment, so that tag should appear higher.
Thirdly, you can sort tags into two types. The first type is given priority: a very controlled set of concepts that are also used for filtering on the frontpage. This is a small, stable set of tags that people learn, and against which they can easily check whether their post belongs. The second type is the much larger, user-generated set of tags that correspond to user-generated wiki pages, and there can be hundreds of these.
In this world, wiki pages are split into two types: those that are tags and those that aren’t. Those that are tags have a large, searchable post list, maybe even a Recent Discussion section, and can be used to tag posts. Those that are not tags lack these features.
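A minimal Python sketch of the two-tier split, also folding in the relevance ordering from the second patch. All names are hypothetical, and scoring a tag by how many of the draft’s linked posts already carry it is only one way to implement “likely relevance”.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Optional


class Tier(Enum):
    CORE = "core"  # small, stable, moderator-controlled; also used as frontpage filters
    USER = "user"  # larger, user-generated set, one per user-created wiki page


@dataclass
class WikiPage:
    title: str
    text: str
    tier: Optional[Tier] = None  # None: a plain wiki page, not usable as a tag
    post_ids: list[int] = field(default_factory=list)

    @property
    def is_tag(self) -> bool:
        return self.tier is not None

    def tag_post(self, post_id: int) -> None:
        if not self.is_tag:
            raise ValueError(f"{self.title!r} is a plain wiki page and cannot tag posts")
        if post_id not in self.post_ids:
            self.post_ids.append(post_id)


def frontpage_filters(pages: list[WikiPage]) -> list[str]:
    return sorted(p.title for p in pages if p.tier is Tier.CORE)


def tag_picker(pages: list[WikiPage], linked_post_ids: list[int]) -> list[str]:
    """Tags offered when tagging a draft post.

    Core tags come first. Within each tier, tags carried by more of the draft's
    linked posts rank higher; remaining ties fall back to alphabetical order.
    """
    tags = [p for p in pages if p.is_tag]
    hits = {p.title: sum(pid in p.post_ids for pid in linked_post_ids) for p in tags}
    ranked = sorted(tags, key=lambda p: (p.tier is not Tier.CORE, -hits[p.title], p.title))
    return [p.title for p in ranked]
```

Because the sort key puts tier first, a core tag with no linked evidence still outranks any user tag; swapping the first two key components would instead let strong link evidence override the tier.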
This idea seems fairly promising to me, and I don’t see any problems with it yet. Below, I’ll call such a page a ‘WikiTag’.
Conceptual updating
Speaking more generally, my main worry about systems like Wikis and Tagging concerns something especially prevalent in science and in the sort of work we do on LessWrong, where we try to figure out better conceptual boundaries to draw in reality, and old concepts get deprecated. I expect that on sites like lobste.rs and Gelbooru, tags rarely turn out to have been the wrong way to frame things. There are rarely arguments about whether something is really a blue sky, or just the absence of clouds. But a lot of progress in science is this sort of subtle conceptual progress, where you maybe shouldn’t have said that the object fell to the ground, but instead that the object and the Earth fell into each other at rates proportional to some function of their masses.
On LessWrong I think we’ve done a lot of this sort of thing.
We used to talk about optimisation daemons, now we talk about the inner alignment problem.
We used to talk about people being stupid and the world being mad, and now we talk about coordination problems.
We used to talk about agent foundations and now we maybe think embedded agency is a better conceptualisation of the problem.
In places like the in-person CFAR space, I’ve heard talk of akrasia often deprecated in favour of ideas like ‘internal alignment’.
We made progress from TDT to UDT.
So I’m generally worried about setting up infrastructure that locks concepts in place, e.g. under whichever name was picked first.
One problem I was worried about was that all posts would have to be categorised according to the old names. In particular, posts that have already been tagged ‘optimisation daemons’ would have a hard time being re-tagged ‘inner alignment problem’.
However, after fleshing it out, I’m not so sure it’s going to be a problem.
Firstly, it’s not clear that old posts should have their tags updated. If there is a sequence of posts talking about akrasia and how to deal with it, it would be very confusing for those posts to carry a tag for ‘internal alignment’, a term not mentioned anywhere in the posts nor obviously related to their framing. The same goes for tagging discussion of ‘optimisation daemons’ as ‘the inner alignment problem’.
Secondly, there’s a fairly natural thing to do when such conceptual shifts in the conversation occur. You build a new WikiTag. Then you tag all the new posts, and write the wiki entry explaining the concept, and link back to the old concept. It just needs to say something like “Old work was done under the idea that objects fell down to the ground. We now think that the object and the Earth fall into each other, but you can see the old work and its experimental results on this page <link>. Plus here are some links to the key posts back then that you’ll still want to know about today.” And indeed if such a thing happens with agent foundations and embedded agency, or something, then it’ll be necessary to have posts explaining how the old work fits into the current paradigm. That translational work is not done by renaming a tag, but by a person who understands that domain writing some posts explaining how to think about and use the old work, in the new conceptual framework. And those should be prominently linked to on the wiki/tag page.
So I think that this system does not have the problems I thought that it had.
I guess I’m still fairly worried about subtler errors: if instead of a tag for ‘Forecasting’ we have a tag called ‘Calibration’ or ‘Predictions’, each would shift the discourse in a different way. But I think it’s likely that a small community like ours will overall be able to resist such small shifts, and that argument will prevail even if the names are sometimes a little off. It sounds like a problem that makes progress a little slower but doesn’t push it off the rails. And if a tag is sufficiently wrong, then I expect we can follow the process above: start a new tag and link back to the old one. Or, if the conceptual shift is sufficiently small (e.g. ‘Forecasting’ → ‘Predictions’), I can imagine renaming the tag directly.
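One way to sketch both moves in Python, with hypothetical names: `supersede` starts a new WikiTag for a reframed concept while old posts keep their old tag, and `rename` covers the small-shift case. Linking the two pages in both directions is an assumption of this sketch.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class WikiTag:
    name: str
    description: str
    post_ids: set[int] = field(default_factory=set)
    # Links between old and new framings; excluded from repr/eq so the cycle is never followed.
    superseded_by: Optional["WikiTag"] = field(default=None, repr=False, compare=False)
    supersedes: list["WikiTag"] = field(default_factory=list, repr=False, compare=False)


def supersede(old: WikiTag, new_name: str, description: str) -> WikiTag:
    """Start a fresh WikiTag for a reframed concept.

    Old posts keep the old tag (they never used the new term). The new page links
    back to the old one, and the old one links forward to the new framing.
    """
    new = WikiTag(new_name, description)
    new.supersedes.append(old)
    old.superseded_by = new
    return new


def rename(tag: WikiTag, new_name: str) -> None:
    """For a small conceptual shift, rename in place; every tagged post follows along."""
    tag.name = new_name
```

The design choice is that reframing never rewrites history: the old tag’s post set is untouched, and the translational work lives in the new page’s description and links.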
So I’m no longer so worried about conceptual stickiness as a fundamental blocker to Wikis and Tagging as ways of organising the conceptual space.
As a general comment, StackExchange’s tagging system seems pretty perfect (and battle-tested) to me, and I suspect we should just copy their design as closely as we can.
So, on StackExchange any user can edit any of the tags, and then there is a whole complicated hierarchy that exists for how to revert changes, how to approve changes, how to lock posts from being edited, etc.
Which is a solution, but it sure doesn’t seem like an easy or elegant solution to the tagging problem.
I think the peer review queue is pretty sensible in any world where there’s “one ground truth” that you expect trusted users to have access to (such that they can approve / deny edits that cross their desk).
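A minimal Python sketch of a peer review queue for tag edits, loosely in the spirit of StackExchange’s suggested-edit review. The trusted set, the `approvals_needed` threshold, and the rule that one rejection discards an edit are assumptions, not StackExchange’s actual parameters.

```python
import itertools
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class TagEdit:
    post_id: int
    author: str
    add: frozenset = frozenset()
    remove: frozenset = frozenset()
    approvals: set = field(default_factory=set)


class ReviewQueue:
    def __init__(self, trusted: set[str], approvals_needed: int = 2):
        self.trusted = trusted  # users assumed to have access to the "one ground truth"
        self.approvals_needed = approvals_needed
        self.tags: dict[int, set[str]] = {}  # post_id -> current tags
        self.pending: dict[int, TagEdit] = {}
        self._ids = itertools.count(1)

    def propose(self, edit: TagEdit) -> Optional[int]:
        """Apply a trusted user's edit at once; queue anyone else's. Returns the queue id, or None."""
        if edit.author in self.trusted:
            self._apply(edit)
            return None
        edit_id = next(self._ids)
        self.pending[edit_id] = edit
        return edit_id

    def review(self, reviewer: str, edit_id: int, approve: bool) -> None:
        if reviewer not in self.trusted:
            raise PermissionError(f"{reviewer} cannot review edits")
        edit = self.pending[edit_id]
        if not approve:
            del self.pending[edit_id]  # assumption: a single rejection discards the edit
            return
        edit.approvals.add(reviewer)
        if len(edit.approvals) >= self.approvals_needed:
            del self.pending[edit_id]
            self._apply(edit)

    def _apply(self, edit: TagEdit) -> None:
        current = self.tags.setdefault(edit.post_id, set())
        current |= edit.add
        current -= edit.remove
```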
I’m currently working through my own thoughts and vision for tagging.
“If and when we end up building a tagging system on LessWrong, the goal will be to distinguish the main types of post people are interested in viewing, and create a limited number of tags that determine this. I think building this will mainly help users who are viewing new content on the frontpage, and that for much more granular sorting of historical content, a wiki is better placed.”
I’m pretty sure I disagree with this, and object to you making an assertion that makes it sound like the team has definitely decided what the goal of the tagging system will be.
Hm, I think writing this and posting it at 11:35 led to me phrasing a few things quite unclearly (and several of those sentences don’t even make sense grammatically). Let me patch it with some edits right now, and maybe more tomorrow.
On the particular thing you mention, never mind the whole team, I myself am pretty unsure that the above is right. The thing I meant to write there was something like “If the above is right, then when we end up building a tagging system on LessWrong, the goal should be” etc. I’m not clear on whether the above is right. I just wanted to write the idea down clearly so it could be discussed and have counterarguments/counterevidence brought up.
It’s also important to have the old concept link to the new concept.
I’ll write a proper response tomorrow.
That clarifies it and makes a lot of sense. Seems my objection rested upon a misunderstanding of your true intention. In short, no worries.
I look forwards to figuring this out together.