Very broadly speaking, alignment researchers seem to fall into five different clusters when it comes to thinking about AI risk:
MIRI cluster. Think that P(doom) is very high, based on intuitions about instrumental convergence, deceptive alignment, etc. Does work that’s very different from mainstream ML. Central members: Eliezer Yudkowsky, Nate Soares.
Structural risk cluster. Think that doom is more likely than not, but not for the same reasons as the MIRI cluster. Instead, this cluster focuses on systemic risks, multi-agent alignment, selective forces outside gradient descent, etc. Often does work that's fairly continuous with mainstream ML, but is willing to be unusually speculative by the standards of the field. Central members: Dan Hendrycks, David Krueger, Andrew Critch.
Constellation cluster. More optimistic than either of the previous two clusters. Focuses more on risk from power-seeking AI than the structural risk cluster, but does work that is more speculative or conceptually-oriented than mainstream ML. Central members: Paul Christiano, Buck Shlegeris, Holden Karnofsky. (Named after Constellation coworking space.)
Prosaic cluster. Focuses on empirical ML work and the scaling hypothesis, is typically skeptical of theoretical or conceptual arguments. Short timelines in general. Central members: Dario Amodei, Jan Leike, Ilya Sutskever.
Mainstream cluster. Alignment researchers who are closest to mainstream ML. Focuses much less on backchaining from specific threat models and more on promoting robustly valuable research. Typically more concerned about misuse than misalignment, although worried about both. Central members: Scott Aaronson, David Bau.
Remember that any such division will be inherently very lossy, and please try not to overemphasize the differences between the groups, compared with the many things they agree on.
Depending on how you count alignment researchers, the relative sizes of these clusters will vary, but on a gut level I treat all of them as roughly the same size.
Ok, slightly off topic, but I just had a wacky notion for how to break-up groupthink as a social phenomenon. You know the cool thing from Audrey Tang’s ideas, Polis? What if we did that, but we found ‘thought groups’ of LessWrong users based on the agreement voting. And then posts/comments which were popular across thought-groups instead of just intensely within a thought group got more weight?
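To make that concrete, here's a rough sketch of one way the scoring could work. This is purely illustrative: it assumes we already have a user-by-comment matrix of agree/disagree votes, and the choice of k-means and of four groups is an arbitrary placeholder, not a proposal for how LessWrong should actually do it.

```python
# Rough sketch of the "thought groups" idea: cluster users by their
# agreement-vote history, then score comments by how well they do in
# their *least* enthusiastic group, so cross-group approval beats
# intense approval within a single group. All names are placeholders.
import numpy as np
from sklearn.cluster import KMeans

def cross_group_score(votes: np.ndarray, n_groups: int = 4) -> np.ndarray:
    """votes: (n_users, n_comments) matrix with +1 agree, -1 disagree, 0 no vote."""
    # Group users by the similarity of their voting patterns.
    groups = KMeans(n_clusters=n_groups, n_init=10, random_state=0).fit_predict(votes)

    # Mean approval of each comment within each thought group (ignoring non-voters).
    group_means = np.zeros((n_groups, votes.shape[1]))
    for g in range(n_groups):
        member_votes = votes[groups == g]
        n_voted = np.maximum((member_votes != 0).sum(axis=0), 1)  # avoid divide-by-zero
        group_means[g] = member_votes.sum(axis=0) / n_voted

    # A comment only scores highly if even its least enthusiastic group likes it.
    return group_means.min(axis=0)
```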
Niclas Kupper tried a LessWrong Polis to gather our opinions a while back. https://www.lesswrong.com/posts/fXxa35TgNpqruikwg/lesswrong-poll-on-agi
So, something like the community notes algorithm?
https://vitalik.eth.limo/general/2023/08/16/communitynotes.html
Ah, as a non-Twitter user I hadn’t known about this. Neat.
Quote:

For any given note, most users have not rated that note, so most entries in the matrix will be zero, but that’s fine. The goal of the algorithm is to create a four-column model of users and notes, assigning each user two stats that we can call “friendliness” and “polarity”, and each note two stats that we can call “helpfulness” and “polarity”. The model is trying to predict the matrix as a function of these values, using the following formula:

rating(u, n) ≈ μ + i_u + i_n + f_u * f_n

Note that here I am introducing both the terminology used in the Birdwatch paper, and my own terms to provide a less mathematical intuition for what the variables mean:

μ is a “general public mood” parameter that accounts for how high the ratings are that users give in general.

i_u is a user’s “friendliness”: how likely that particular user is to give high ratings.

i_n is a note’s “helpfulness”: how likely that particular note is to get rated highly. Ultimately, this is the variable we care about.

f_u or f_n is a user’s or note’s “polarity”: its position along the dominant axis of political polarization. In practice, negative polarity roughly means “left-leaning” and positive polarity means “right-leaning”, but note that the axis of polarization is discovered emergently from analyzing users and notes; the concepts of leftism and rightism are in no way hard-coded.

The algorithm uses a pretty basic machine learning model (standard gradient descent) to find values for these variables that do the best possible job of predicting the matrix values. The helpfulness that a particular note is assigned is the note’s final score. If a note’s helpfulness is at least +0.4, the note gets shown.

The core clever idea here is that the “polarity” terms absorb the properties of a note that cause it to be liked by some users and not others, and the “helpfulness” term only measures the properties that a note has that cause it to be liked by all. Thus, selecting for helpfulness identifies notes that get cross-tribal approval, and selects against notes that get cheering from one tribe at the expense of disgust from the other tribe.
This is the formalization of the concept “left hand Whuffie” from Charlie Stross’s Down and Out in the Magic Kingdom (2003). When people who usually disagree with people like you actually agree with you or like what you’ve said, that’s special and deserves attention. I’ve always wanted to see it implemented. I don’t usually tweet but I’ll have to look at this.
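For concreteness, here is a toy version of the kind of model the quote describes. The variable names and update rule are my own simplification for illustration, not the actual Birdwatch/Community Notes implementation:

```python
# Toy sketch of the quoted model: predicted rating =
#   mu + friendliness[user] + helpfulness[note] + polarity_u[user] * polarity_n[note],
# fit by stochastic gradient descent on the observed ratings only.
import numpy as np

def fit_note_scores(ratings, n_users, n_notes, epochs=200, lr=0.05, reg=0.1):
    """ratings: list of (user_id, note_id, value) tuples, value in [0, 1]."""
    rng = np.random.default_rng(0)
    mu = 0.0
    friendliness = np.zeros(n_users)        # i_u: user intercepts
    helpfulness = np.zeros(n_notes)         # i_n: note intercepts (the score we care about)
    pol_u = rng.normal(0.0, 0.1, n_users)   # f_u: user polarity
    pol_n = rng.normal(0.0, 0.1, n_notes)   # f_n: note polarity

    for _ in range(epochs):
        for u, n, r in ratings:
            err = r - (mu + friendliness[u] + helpfulness[n] + pol_u[u] * pol_n[n])
            # Gradient step on squared error, with L2 regularization on the parameters.
            mu += lr * err
            friendliness[u] += lr * (err - reg * friendliness[u])
            helpfulness[n] += lr * (err - reg * helpfulness[n])
            pol_u[u], pol_n[n] = (
                pol_u[u] + lr * (err * pol_n[n] - reg * pol_u[u]),
                pol_n[n] + lr * (err * pol_u[u] - reg * pol_n[n]),
            )

    return helpfulness, pol_n  # e.g. show notes whose helpfulness is at least +0.4
```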
Down and Out in the Magic Kingdom was by Cory Doctorow, not Stross.
Good catch. I’d genuinely misremembered. I lump the two together, but generally far prefer Stross as a storyteller, even though Doctorow’s futurism is also first-rate, in a different dimension. I found the story in Down and Out to be Stross-quality.
That sort of good idea for a social network improvement is definitely signature Doctorow, though.
Another idea is to upweight posts if they’re made by a person in thought group A, but upvoted by people in thought group B.
Yeah, I’m interested in features in this space!
Another idea is to implement an algorithm similar to Twitter’s Community Notes: identify comments that have gotten upvotes from people who usually disagree with each other, and highlight those.
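One crude way to operationalize that, just as a sketch and not any existing site’s algorithm: estimate how much each pair of users usually agrees from their vote history, then boost comments whose upvoters mostly come from pairs that usually disagree.

```python
# Sketch: boost comments whose upvoters usually disagree with each other.
import numpy as np

def surprising_consensus_scores(votes: np.ndarray) -> np.ndarray:
    """votes: (n_users, n_comments) matrix with +1 agree, -1 disagree, 0 no vote."""
    # Cosine similarity between users' vote histories, as a proxy for "usually agree".
    norms = np.maximum(np.linalg.norm(votes, axis=1, keepdims=True), 1e-9)
    agreement = (votes / norms) @ (votes / norms).T  # values in [-1, 1]

    scores = np.zeros(votes.shape[1])
    for c in range(votes.shape[1]):
        upvoters = np.flatnonzero(votes[:, c] > 0)
        if len(upvoters) < 2:
            continue
        # Mean pairwise agreement among this comment's upvoters (excluding self-pairs).
        sub = agreement[np.ix_(upvoters, upvoters)]
        mean_agree = (sub.sum() - np.trace(sub)) / (len(upvoters) * (len(upvoters) - 1))
        # More upvotes from people who usually disagree => higher highlight score.
        scores[c] = len(upvoters) * (1.0 - mean_agree)
    return scores
```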
This idea is definitely simmering in many people’s heads at the moment :)
How private are the LessWrong votes?
Would you want to do it overall, or blog by blog? Seems pretty doable.
Currently, the information about who voted which way on what things is private to the individual who made the vote in question and the LW admins.
So if doing this on LW votes, it’d need to be done in cooperation with the LW team.
I’m pasting this here because it’s the sort of thing I’d like to see. I’d like to see where I fall in it, and at least the anonymized positions of others. Also, it’d be cool to track how I move over time. Movement over time should be expected, unless we fall into the ‘wrong sort of updateless decision theory’ jokingly described by TurnTrout (the term ‘updateless decision theory’ was coined by Wei Dai). https://www.lesswrong.com/posts/j2W3zs7KTZXt2Wzah/how-do-you-feel-about-lesswrong-these-days-open-feedback?commentId=X7iBYqQzvEgsppcTb