I work on similar designs for personalized/subjective curation of annotations and comment sections. Mainly orienting around topic-specific Webs of Trust (networks of endorsements about specific personal qualities).
WoTs are transparent, collaborative, scalable (or they will be when I’m done with them) and fully human-controlled, which I think is potentially really important for buy-in. If an algorithm disappoints someone, they may just give up on it, while if the network of humans who they respect disappoints them, they’re more likely to be patient with it, and more importantly, they’re more likely to feel like there’s something they can do about it and work proactively to improve it.
So that sense of complete transparency and controllability might actually be totally crucial, in which case an algorithmic approach might not get adoption. But if you can be transparent about what the algorithm is optimizing for, and give users enough control over that or over the algorithm, then in theory this distrust or impatience shouldn’t be so much of an issue; it could go either way. Of course it would be auspicious if human self-assembling structures could outperform aggregate algorithmic predictions, but even I wouldn’t actually bet on it; I think the algorithmic side of things is important. Maybe a choice between both should be offered.
I’d urge you to design a UX where, rather than soliciting an absolute metric like ratings or likes, users submit relative comparisons between comments they have seen. Otherwise I’m fairly sure the system will optimize for the behavior of liking every comment that comes into view, which is a sort of unnatural state of stasis where the system is no longer receiving much information from ratings. It’s probably not where we want to end up.
The UX that springs to mind for me is showing the titles of comments on the left (or if on mobile, circles representing the comments) and allowing the user to reorder them by dragging as they go, to communicate their ranking. Holding a circle would show a preview of it as a reminder. I’m not sure what you’d do about all the noise you’d get from users frequently being too ambivalent or forgetting to do this, but hey, the same problem exists with likes.
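To gesture at how those reorderings could become training signal, here’s a minimal sketch of one standard approach, a Bradley–Terry pairwise model (my own illustration, not anything from your post): each drag event decomposes into pairwise preferences, and each preference nudges latent quality scores.

```python
import math

def ranking_to_pairs(ordered_ids):
    """Decompose a user-submitted ranking (best first) into pairwise
    preferences: each comment is preferred over every comment ranked below it."""
    return [(w, l) for i, w in enumerate(ordered_ids) for l in ordered_ids[i + 1:]]

def bradley_terry_update(scores, pairs, lr=0.1):
    """One SGD pass of a Bradley-Terry model, P(w beats l) = sigmoid(s_w - s_l).
    `scores` maps comment id -> latent quality score."""
    for w, l in pairs:
        p = 1.0 / (1.0 + math.exp(scores[l] - scores[w]))  # predicted P(w wins)
        scores[w] += lr * (1.0 - p)  # winner rises in proportion to the surprise
        scores[l] -= lr * (1.0 - p)  # loser falls symmetrically
    return scores

# Example: the user dragged comment "c" above "a" and "b".
scores = {"a": 0.0, "b": 0.0, "c": 0.0}
bradley_terry_update(scores, ranking_to_pairs(["c", "a", "b"]))
print(sorted(scores, key=scores.get, reverse=True))  # ['c', 'a', 'b']
```

A nice side effect of this framing is that a user who ranks everything equally contributes no pairs at all, so “liking everything” stops being informative by construction.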
Regarding echo chambers, I’d suggest a feedback signal with a meaning like “this comment most Advanced my Perspective” (sometimes called “Changed my View”). As long as people sometimes earnestly seek out interesting information (which is a common and natural behavior), they will cross paths with their outgroup often and sometimes learn something from them.
Hope to keep in touch. Right now I’m investigating the prospect of just building a new kind of browser that centers cross-language ocap wasm APIs, treats the DOM as a second-class representation format, and presents a new UI API that tries to be a lot friendlier towards third-party app extensions, code signing, and annotation stability, but if someone else is going to do it on the traditional web, I’ll still try to help.
I’d urge you to design a UX where, rather than soliciting an absolute metric like ratings or likes, users submit relative comparisons between comments they have seen. Otherwise I’m fairly sure the system will optimize for the behavior of liking every comment that comes into view, which is a sort of unnatural state of stasis where the system is no longer receiving much information from ratings. It’s probably not where we want to end up.
I think comment comparison is too demanding and, most often, just doesn’t make sense. I don’t feel I ever want to compare comments on LessWrong, for instance.
The problem of junk voting is addressed here in the post:
Credit assignment to the users is not trivial here: to prevent abuse of the system with junk voting, it should be based on free energy reduction (FER), i.e., formally, just the difference in the free energy of the Story Node before and after receiving a reaction from the user[4].
The user who always upvotes everything will contribute zero information signal to the Story Node, thus the user will receive zero FER for its contribution.
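To illustrate the mechanism with a deliberately simplified toy (this collapses the free energy of the Story Node down to the entropy of a single Bernoulli belief, which the actual model does not do): FER is the drop in the node’s uncertainty when a reaction arrives, and a vote that is equally likely regardless of comment quality moves nothing.

```python
import math

def entropy(p):
    """Shannon entropy (bits) of a Bernoulli belief P(comment is good) = p."""
    if p in (0.0, 1.0):
        return 0.0
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))

def toy_fer(prior_good, p_up_given_good, p_up_given_bad):
    """Toy free energy reduction for one upvote: the node's belief entropy
    before the vote minus its entropy after Bayes-updating on the vote."""
    evidence = prior_good * p_up_given_good + (1 - prior_good) * p_up_given_bad
    posterior = prior_good * p_up_given_good / evidence
    return entropy(prior_good) - entropy(posterior)

# A discerning voter: upvotes 90% of good comments but only 20% of bad ones.
print(toy_fer(0.5, 0.9, 0.2))  # ~0.32 bits of credit
# An indiscriminate voter upvotes everything, so the posterior equals the
# prior and the credit is exactly zero.
print(toy_fer(0.5, 1.0, 1.0))  # 0.0
```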
Right now I’m investigating the prospect of just building a new kind of browser that centers cross-language ocap wasm APIs, treats the DOM as a second-class representation format, and presents a new UI API that tries to be a lot friendlier towards third-party app extensions, code signing, and annotation stability
I would first focus on finding a good niche and building a real, big WoT. Only then, once the network effect kicks in, will you have real leverage and a real chance to pull people to a new browser, which is extremely hard: a new browser has to provide a lot of value from the very beginning.
Consider starting with Telegram, by creating a bot that doesn’t permit posting in a chat unless the user has a minimum level of trust among the existing members of the chat. The problem of spam in Telegram is huge right now. It also provides organic starting points: selected communities could quickly solve the spam problem among themselves, so you don’t need the global network effect before the local network effect kicks in.
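A minimal sketch of the gating rule I have in mind (the trust function here is a bare endorsement count standing in for the real WoT computation, and none of this is actual Telegram Bot API code):

```python
def trust_from_members(sender, members, endorsements):
    """Count how many existing chat members endorse the sender.
    `endorsements` is a set of (endorser, endorsee) pairs - the WoT edges."""
    return sum(1 for m in members if (m, sender) in endorsements)

def may_post(sender, members, endorsements, min_trust=2):
    """The bot would delete messages from senders below the trust threshold."""
    return trust_from_members(sender, members, endorsements) >= min_trust

members = {"alice", "bob", "carol"}
endorsements = {("alice", "dave"), ("bob", "dave"), ("carol", "eve")}
print(may_post("dave", members, endorsements))     # True  (2 endorsements)
print(may_post("eve", members, endorsements))      # False (1 endorsement)
print(may_post("spammer", members, endorsements))  # False (0 endorsements)
```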
I don’t feel I ever want to compare comments on LessWrong, for instance.
It is the way I vote (looking through as many comments as I can bear to and deciding how I think the ordering could be improved), and I think it’s a better way to vote! The usual way has a pretty serious pathology where people tend to vote on comments that are already the most upvoted, which actually decreases the usefulness of the vote scores (but I suppose that wouldn’t apply to a predictor system).
Likes could be reframed as a comparison over the comments that the user has looked at, sorting those comments into two buckets, with a dense network of comparison edges going from each unliked comment to each liked comment. If we consider strong and weak upvotes and downvotes as the feedback instead of a binary like/dislike, that could be treated as a sorting of the comments the user has seen into five buckets, though it’s arguable that the unvoted bucket should be treated as an N/A, or “I didn’t read or have feelings about this” answer, and not counted.
And I guess, now that I think about it, that’s a pretty good UX for this. I think having two buckets is too crude, while four might actually be the maximum detail we can expect. It’s kind of funny that LessWrong could implement this system without presenting any visible indication of it. If they did, I would probably continue complaining about its absence for at least a year.
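Concretely, that bucket reading of votes could be extracted like this (a sketch; the bucket names and ranks are mine, not LessWrong’s):

```python
from itertools import product

# Bucket ranks for LessWrong-style votes; None marks "didn't read / N/A".
BUCKETS = {"strong_down": 0, "weak_down": 1, "unvoted": None,
           "weak_up": 2, "strong_up": 3}

def comparison_edges(votes):
    """Turn one user's votes ({comment_id: vote_type}) into (worse, better)
    comparison edges between every pair of seen comments in different buckets."""
    ranked = {c: BUCKETS[v] for c, v in votes.items() if BUCKETS[v] is not None}
    return [(lo, hi) for (lo, rl), (hi, rh) in product(ranked.items(), repeat=2)
            if rl < rh]

votes = {"a": "strong_up", "b": "weak_up", "c": "unvoted", "d": "weak_down"}
print(comparison_edges(votes))
# [('b', 'a'), ('d', 'a'), ('d', 'b')] - the unvoted comment contributes nothing
```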
thus the user will receive zero FER for its contribution.
I haven’t been thinking in terms of paid review yet. It seems important! I guess I feel like a platform has to work for users who aren’t financially invested in it before they’ll be interested in paying for anything.
The problem of spam in Telegram is huge right now
That’s true, but is Telegram important? If you wanted a more open system for group chats, why not just use Discord? I’d be a bit more interested in solving this for Element, which presumably doesn’t have Discord’s algorithmic moderation system and will be overrun with spam as soon as anyone depends on it (though federation also offers a solution, at least outside the default instances, since it’s essentially a two-layer web of trust, or a web of trust between instances). But given the project I’m currently considering, I don’t feel like any of these platforms are going to be used in the future. They’re all woefully inflexible and high-friction relative to what could be built on a better web.
So, when we have that better web, my current comfiest adoption path would be… I get a small community of creative hackers interested, they have a huge amount of fun with it, they develop loads of features to the point where it becomes seriously useful for organizing and managing an org’s data, some organizations start to adopt it, and after it refines and streamlines in response to their insights, it becomes a necessity for operating in the modern world. I should probably try to think of something better than this, but this is the trajectory I’m on.
Telegram is the dominant social, communication, and media platform in the Russian-speaking part of the internet. I think it is more dominant than Facebook was in the US in its heyday (and you surely heard that for many people, “Facebook meant the internet”). So currently, for many Russian speakers, the internet is basically YouTube for videos + Telegram for everything else.
My understanding (though I’m not sure) is that Telegram is also dominant in Iran and Ethiopia (combined population > 200 million), but I have no idea what the situation with spam is in those sectors of Telegram.
I think Telegram is also huge in Brazil, but not dominant.
If you wanted a more open system for group chats, why not just use Discord?
This is a rhetorical question. I’m just telling you where a lot of people are right now, and where LLM-enabled spam is a huge problem right now. I think these are the conditions you should be looking for if you want to test Web of Trust mechanisms at scale. But, of course, you might make a normative decision not to try to help Telegram grow even bigger because you are not satisfied with its level of openness and decentralisation. Though, I want to note that Telegram is more open than any other major messaging platform: its content API is open, and anyone can create alternative clients.
But, of course, you might make a normative decision not to try to help Telegram grow even bigger because you are not satisfied with its level of openness and decentralisation
It is likely. I don’t want to extend the reign of systems that aren’t deeply upgradeable/accountable/extensible.
And it’s not even as simple as proprietary vs. open source: an open-source project can be hostile to contributions, or lack processes for facilitating mass transitions in standards of use.
The usual way has a pretty serious pathology where people tend to vote on comments that are already the most upvoted, which actually decreases the usefulness of the vote scores (but I suppose that wouldn’t apply to a predictor system).
This is specifically one of the problems [BetterDiscourse] is conceived to address. Like, there are many “basically reasonable” positions/comments that I am happy to promote through an upvote (and most people vote this way, too), but they carry low information content for me because they’re already my position, or close to it. With separate upvote/downvote and insightful/not reactions, I can switch between looking at the most popular positions among the crowd (Pol.is, Viewpoints.xyz, and Community Notes further remove political bias from this signal, thus prioritising the “greatest common denominator” position) and the comments that are most likely to have the greatest informational value for me personally.
And to make it clear: the claim that such an “informational value first” comment-ordering model is realistically trainable on users’ reactions to comments on different topics, and quickly, i.e., on only a few or a few dozen reactions from the user, is currently a hypothesis. I’m not sure there are good ways to test this hypothesis short of just trying to train such a model and seeing whether a large portion of people find it useful.
At the beginning of the “Solution” section, I wrote that in principle, the information value of a comment should be in part predictable from the “user’s levels of knowledge in this or that field, beliefs, current interests, ethics, and aesthetics”, but there is a big question mark over whether this information could be easily inferred from the user’s reactions to other comments, or assessed for a comment in isolation when the prediction model is applied to it.
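One cheap way to probe the hypothesis (a strawman baseline of my own, not the model from the post): fit a tiny per-user head over frozen comment embeddings on a handful of insightful/not-insightful reactions, and check whether it orders unseen comments sensibly. If even this shows signal after a dozen reactions, the richer model has a chance.

```python
import math

def train_user_head(reactions, dim, epochs=200, lr=0.5):
    """Fit per-user logistic-regression weights over comment embeddings from a
    handful of (embedding, was_insightful 0/1) reactions - the few-shot regime
    in question. Pure-Python SGD keeps the sketch dependency-free."""
    w = [0.0] * dim
    for _ in range(epochs):
        for x, y in reactions:
            p = 1 / (1 + math.exp(-sum(wi * xi for wi, xi in zip(w, x))))
            w = [wi + lr * (y - p) * xi for wi, xi in zip(w, x)]
    return w

def predicted_value(w, x):
    """Predicted probability that this user finds the comment insightful."""
    return 1 / (1 + math.exp(-sum(wi * xi for wi, xi in zip(w, x))))

# Toy 3-d "embeddings" where dimension 0 loosely tracks novelty-to-this-user.
reactions = [([0.9, 0.1, 0.0], 1), ([0.8, 0.3, 0.1], 1),
             ([0.1, 0.9, 0.2], 0), ([0.2, 0.7, 0.4], 0)]
w = train_user_head(reactions, dim=3)
unseen = {"c1": [0.7, 0.2, 0.1], "c2": [0.1, 0.8, 0.3]}
print(sorted(unseen, key=lambda c: predicted_value(w, unseen[c]), reverse=True))
# ['c1', 'c2'] - an "informational value first" ordering for this user
```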
there is a big question mark over whether this information could be easily inferred from the user’s reactions to other comments
Right… I think it can’t; recognizing that would be equivalent to being able to recognize surprising truth, which is kind of AGI-complete. There are not so many top experts in any particular niche, and as soon as any are identified, a huge bulk of users comes to imitate them, so actual experts won’t be an obviously important category to the recommender engine, and it might not be able to tell them apart from their crowd.
For that, we may depend on more explicit systems, like webs of trust, for expert recommendations. Users have to apply their own intelligence to identify the real (probable) experts and explicitly communicate that recognition, and they have to see that the experts have endorsed the comment being shown to them. We follow experts because their taste differs from ours, because their recommendations are not intuitive to us.
I should ask: is free energy reduction something we actually know how to train? I can see a way of measuring it, but it doesn’t seem economically feasible.
Thanks for sharing your work. Some comments on that:
Why Tasteweb hasn’t been made already: I think it’s mostly the fact that querying very large trust graphs is slow.
Oh, I highly doubt that the reason was the technical problem :) If anything, we should interpret it as there not yet having been strong enough demand for solving that technical problem. Perhaps now there will be, due to LLM-enabled spam: Telegram has been drowning in spam for the last half a year, and Twitter and YouTube struggle, too.
Are you in contact with https://subconscious.network/ developers? They may benefit from the algorithms that you develop.
Of course it would be auspicious if human self-assembling structures could outperform aggregate algorithmic predictions, but even I wouldn’t actually bet on it; I think the algorithmic side of things is important. Maybe a choice between both should be offered.
I see these systems as more complementary: Webs of Trust for moderation, filtering, and user gating (perhaps as the key piece of decentralised content delivery platforms/networks), and algorithms for ordering content that has already passed the filter. In fact, WoT is one of the ways to do proof of personhood, and I recognise that it might be a critical foundation for [BetterDiscourse], while centralised proof-of-humanness systems such as Worldcoin may be too slow to gain adoption.
No, but I’m aware of them. What they’re doing sounds pretty cool, and yeah, it is the kind of moderation system that you need for bootstrapping big collaborative wikis.
[checks in on what they’re doing] … yeah, this sounds like a good protocol, maybe the best. I should take a closer look at this. My project might be convergent with theirs. Maybe I should try to connect with them. Darn, I think what happened was that I got them confused with Fission (who build a hosting system for wasm on IPFS and develop UCAN; Subconscious uses these things and has the exact same colors in its logo), so I’ve been hanging out with Fission instead xD.
I see these systems as more complementary: Webs of Trust for moderation, filtering, and user gating (perhaps as the key piece of decentralised content delivery platforms/networks), and algorithms for ordering content that has already passed the filter.
I was thinking the same thing. I ended up not mentioning it because it’s not immediately clear to me how users would police the introduction of non-human participants in an algorithmic context, since users are interacting less directly; if someone starts misbehaving (i.e., upvoting scam ads), it’s hard for their endorsers to debug that. Do you know how you’d approach this?
Additionally, the tasteweb work is about making WoTs usable for subjective moderation; it seems to me that you actually need WoTs just to answer the objective question of who’s human or not (which you’d use to figure out which users to focus your training resources on), and then your algorithmic system does the subjective parts of moderation. Is that correct? In that case, it might make sense for you to use the existing old-fashioned O(n^2) energy-propagation algorithms; you could talk to the alignment ecosystem’s “EigenKarma network” people about that. Algorithm discussed here. Or, I note, you could instead use multi-origin Dijkstra (O(n)), i.e., the minimum Dijkstra distance from any of the known humans, to update metrics of who’s close to the network of a few confirmed human-controlled accounts.

For some reason I seem to be the only one who’s noticed that distance is an adequate metric of trust that’s also much easier to compute than the prior approaches. I think maybe everyone else is looking for guidance from the prior art, even though there is very little of it and it obviously doesn’t scale (I’m pretty sure you could get that stuff to run on a minute-long cycle for 1M users, but 10M might be too much, and it’s never getting to a billion).
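Here’s the shape of the multi-origin Dijkstra I mean (my own sketch; the graph and edge weights are made up): seed the priority queue with every confirmed-human account at distance 0, and one traversal yields each account’s distance to the nearest confirmed human, readable as an inverse trust / personhood score.

```python
import heapq

def trust_distances(graph, confirmed_humans):
    """Multi-origin Dijkstra over a weighted endorsement graph.
    graph: {node: [(neighbor, cost), ...]}, where cost is the inverse trust of
    the endorsement edge. Returns each reachable node's distance to its
    nearest confirmed human; lower distance = more trusted."""
    dist = {h: 0.0 for h in confirmed_humans}
    heap = [(0.0, h) for h in confirmed_humans]  # all origins seeded at once
    heapq.heapify(heap)
    while heap:
        d, node = heapq.heappop(heap)
        if d > dist.get(node, float("inf")):
            continue  # stale queue entry, already found a shorter path
        for neighbor, cost in graph.get(node, []):
            nd = d + cost
            if nd < dist.get(neighbor, float("inf")):
                dist[neighbor] = nd
                heapq.heappush(heap, (nd, neighbor))
    return dist

graph = {"root": [("alice", 1.0)], "alice": [("bob", 1.0), ("spam1", 5.0)],
         "bob": [("spam2", 5.0)], "spam1": [("spam2", 0.1)]}
print(trust_distances(graph, {"root"}))
# {'root': 0.0, 'alice': 1.0, 'bob': 2.0, 'spam1': 6.0, 'spam2': 6.1}
```

Note that mutually-endorsing spam clusters can’t lower their distance by endorsing each other; only an edge from the trusted region helps, which is the property that makes plain distance work as a trust metric.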
Update: checked out the Subconscious protocol. It’s just okay. It doesn’t have finality. I’m fairly sure something better will come along.
I’m kind of planning on not committing to a distributed state protocol at first; maybe centralizing it initially while keeping all the code abstracted so it’ll be easy to switch later.
Edit: Might use it anyway though. It is okay, and it makes it especially easy to guarantee that it will be possible for users to switch to something more robust later. It has finality as long as you trust one of the relays (us).