benwr’s unpolished thoughts

benwrJul 29, 2019, 2:18 AM

4 points

48 comments1 min readLW link

(this is my shortform post)

benwrJul 29, 2019, 2:18 AM

4 points

48 comments1 min readLW link

benwr Jul 7, 2024, 8:18 PM
16 points
7

From the “obvious-but-maybe-worth-mentioning” file:

ChatGPT (4 and 4o at least) cheats at 20 questions:

If you ask it “Let’s play a game of 20 questions. You think of something, and I ask up to 20 questions to figure out what it is.”, it will typically claim to “have something in mind”, and then appear to play the game with you.

But it doesn’t store hidden state between messages, so when it claims to “have something in mind”, either that’s false, or at least it has no way of following the rule that it’s thinking of a consistent thing throughout the game. i.e. its only options are to cheat or refuse to play.

You can verify this by responding “Actually, I don’t have time to play the whole game right now. Can you just tell me what it was you were thinking of?”, and then “refreshing” its answer. When I did this 10 times, I got 9 different answers and only one repeat.
- Joseph Miller Jul 8, 2024, 6:34 AM
  5 points
  0
  Parent
  
  I agree that it does not have something it mind but it could in principle have something in mind in the sense that it could represent some object in the residual stream in the tokens where it says “I have something in mind”. And then future token positions could read this “memory”.
- benwr Jul 8, 2024, 5:02 AM
  5 points
  0
  Parent
  
  Surprisingly to me, Claude 3.5 Sonnet is much more consistent in its answer! It is still not perfect, but it usually says the same thing (9/10 times it gave the same answer).
  - quetzal_rainbow Jul 8, 2024, 9:52 AM
    5 points
    0
    Parent
    
    I read somewhere that Claude 3.5 has hidden ” thinking tokens”.
    - cubefox Jul 8, 2024, 8:22 PM
      3 points
      0
      Parent
      
      Bing also uses inner monologue:
      
      https://x.com/MParakhin/status/1632087709060825088
      
      https://x.com/MParakhin/status/1728890277249916933
      
      https://www.reddit.com/r/bing/comments/11ironc/bing_reveals_its_data_structure_for_conversations/
benwr Sep 27, 2020, 4:26 AM
14 points

If I got to pick the moral of today’s Petrov day incident, it would be something like “being trustworthy requires that you be more difficult to trick than it would be worth”, and I think very few people reliably live up to this standard.
benwr Jul 15, 2020, 12:36 AM
12 points

Recently I tried to use Google to learn about the structure of the human nasal cavity & sinuses, and it seems to me that somehow medical illustrators haven’t talked much to mechanical draftspeople. Just about every medical illustration I could find tried to use colors to indicate structure, and only gave a side-view (or occasionally a front view) of the region. In almost none of the illustrations was it clear which parts of your nasal cavity and sinuses are split down the middle of your head, vs joined together. I still feel pretty in-the-dark about it.

In drafting, you express 3d figures by drawing a set of multiple projections: Typically, you give a top view, a front view, and a side view (though other views, including cross-sections and arbitrary isometric perspective, may be useful or necessary). This lets you give enough detail that a (practiced) viewer can reconstruct a good mental model of the object, so that they can (for example) use their machine shop to produce the object out of raw material.

There’s a pretty fun puzzle game that lets you practice this skill called ”.projekt”; there are probably lots more.
- ChristianKl Jul 16, 2020, 12:19 PM
  8 points
  Parent
  
  https://human.biodigital.com is a free as in beer 3D model that might be useful if you dislike the existing 2D ways of learning.
  http://lifesciencedb.jp/bp3d/ provides a more freely licensed 3D model as well however that’s a bit incomplete and has a worse UI.
- Pongo Jul 16, 2020, 10:34 PM
  2 points
  Parent
  
  Oh no, ”.projekt” can’t be played on recent versions of MacOS! :(
  - benwr Jul 16, 2020, 11:10 PM
    1 point
    Parent
    
    darn—I’ve been playing it on my old ipad for a long time
benwr Jul 29, 2019, 2:18 AM
12 points

A while ago, Duncan Sabien wrote a Facebook post about a thing he called “aliveness”, and presented it on a single spectrum with something called “chillness”. At the time I felt that aliveness seemed sort of like obviously-the-good-one, and like I was obviously-bad-for-being-more-chill, and I felt sad because I think there were a lot of pressures when I was younger to optimize for chillness.

But recently I’ve been in a couple of scenarios that have changed my views on this. I now think that aliveness and chillness aren’t quite opposite ends of the same axis in person space. It seems instead like they’re anticorrelated features of a given person in a given situation, and many people live their lives with a nearly-fixed level of each. But there are also people who can control their levels of aliveness or chillness, as the situation demands.

And it isn’t the case that chillness is worse. I think it is much, much easier to coordinate large groups of chill people than not-chill people. I think that these people can also definitely be “alive” in the relevant way.

My intuitive feeling is that this ability to control your chillness and aliveness is strongly related to “leadership qualities”. And, at least for me, noticing that these might not be opposite ends of a fixed spectrum makes me feel a lot more hopeful about the possibility to grow in aliveness-capability.
benwr Aug 5, 2019, 10:59 PM
11 points

As long as I’m using shortform posts to make feature requests, it would be really useful to me to have access to a feed (of shortform posts, normal posts, or both) where I could select which users I see. Right now I come to LessWrong and have a hard time deciding which posts I care about—lots of people here have lots of interests and lots of different standards for content quality, some of which I find actively annoying. Allowing me to build feeds from custom lists of selected users would let me filter by both shared interests and how valuable I typically find those users’ posts. I don’t think “who is in my custom feed” should be public like it is on Facebook, but Facebook circa 2012 gave me a lot of control over this via friend lists, and I miss the days when that feature was prioritized.

I also like the way lobste.rs solves this problem for some users, although I think the solution above would be better for me personally: every post comes with a selection from a group of site-wide tags, and users can filter their home page based on which tags they want to see.
- habryka Aug 6, 2019, 2:58 AM
  5 points
  Parent
  
  I have some hesitations about this. The biggest one is that I do want to avoid LessWrong just becoming a collection of filter-bubbles in the way Tumblr or Reddit or Facebook is, and do think there is a lot of value from having people with disagreeing perspectives share the same space.
  I think I am not opposed to people building feeds, but I would want to make sure that there is still a way to reach those users with at least the most important content. I.e. at least make sure that everyone sees the curated posts or something like that.
- Dagon Aug 6, 2019, 4:33 PM
  4 points
  Parent
  
  I’m loving the shortform feature, and I’d appreciate enhancements to help me find those which I’d read before but have active comment threads, and I’d certainly like a “watch” feature for posts, comment trees, and shortform topics. I don’t want (I’m not sure if I object, or just would not use) person-based feeds or filters. There are some posters I default to reading and most that I read based on topic, but I prefer to bias toward the latter.
  - habryka Aug 6, 2019, 6:22 PM
    4 points
    Parent
    
    The shortform page is currently sorted by last-commented-on so that one should help you find active comment threads reasonably well.
    - Dagon Aug 6, 2019, 11:18 PM
      6 points
      Parent
      
      That’s helpful, I hadn’t looked at that page, I generally just look at /daily, and sometimes at the main page for recommendations and recently curated.
benwr Feb 6, 2025, 2:29 PM
10 points
4

I think it probably makes sense for ~everyone to have an explicit list of “things I’d like AI to do for me”, especially around productivity and/or things that could help you with world-saving. If you have a list like this, and we happen to hit a relevant capability threshold before we lose, you should probably avoid wasting time on that thing as quickly as possible.
benwr Jul 29, 2019, 2:20 AM
9 points

I really think LessWrong would benefit by giving users avatars. I think this would make the site much more visually appealing, but I also think it would vastly decrease the cognitive load required to read threaded conversations.
- benwr Jul 31, 2019, 6:02 PM
  5 points
  Parent
  
  Edit: I don’t retract this comment but I should have rephrased it and posted it as a reply to this comment
  
  I also note that avatars could use tricks to solve various constraints I’m imagining the LessWrong team might want to impose.
  
  For example, if you think avatars might make the comments section too visually interesting you could render them in greyscale, or with muted colors. And if you think they might lead to people playing weird games with their avatars (I don’t think this is likely, but I can imagine someone worrying about it), you could let users choose from a small collection of acceptable-to-you, auto-generated images based on a hash of their username.
  What links here?
  - benwr's comment on benwr’s unpolished thoughts by benwr (Jul 31, 2019, 6:05 PM; 3 points)
- habryka Jul 31, 2019, 6:29 PM
  4 points
  Parent
  
  I originally designed LW with avatars, but couldn’t find a good compromise between avatars and high density of comment sections (they add a bunch of vertical height that means all comments need to either have more top margin or have a much deeper indent).
  
  I am generally open to avatars and might want to give it another shot sometime.
  - Raemon Jul 31, 2019, 6:42 PM
    3 points
    Parent
    
    I’d expected you to also be wary of them giving the site a distinctly casual feel, and/or less aesthetically harmonious feel (neither of which are necessarily wrong, but are definitely choices). Do those feel relevant to you?
    - habryka Jul 31, 2019, 7:44 PM
      5 points
      Parent
      
      I was thinking of customly generated avatars that are aesthetically consistent with the site (similar to how Google creates avatars based on your initials, or gives you cool animals if you are a guest)
  - benwr Jul 31, 2019, 6:55 PM
    1 point
    Parent
    
    Interesting, good to know. I’m curious if you considered doing something like lobste.rs, where the avatar is next to the username and the same height as the text.
    - habryka Jul 31, 2019, 7:45 PM
      9 points
      Parent
      
      Yeah, that was the kind of thing I was thinking about. Lobste.rs’ was one inspiration I had for something that did it reasonably well.
- Hazard Jul 29, 2019, 3:25 PM
  4 points
  Parent
  
  I’d also find it waaaay easier to track conversations and build models of “who is who” with avatars. A guess I have that hasn’t been verified is that a lot of people on LW might be opposed from a “people would start to use their avatars for signalling purposes” angle. I’d be open to hearing more of that side, but currently I think I’d be for avatars.
  What links here?
  - benwr's comment on benwr’s unpolished thoughts by benwr (Jul 31, 2019, 6:02 PM; 5 points)
  - benwr Jul 31, 2019, 6:05 PM
    3 points
    Parent
    
    I should have posted this comment here and rephrased it, sorry:
    
    I also note that avatars could use tricks to solve various constraints I’m imagining the LessWrong team might want to impose.
    
    For example, if you think avatars might make the comments section too visually interesting you could render them in greyscale, or with muted colors. And if you think they might lead to people playing weird games with their avatars (I don’t think this is likely, but I can imagine someone worrying about it), you could let users choose from a small collection of acceptable-to-you, auto-generated images based on a hash of their username.
    - Hazard Jul 31, 2019, 7:15 PM
      3 points
      Parent
      
      np, yeah that small amount of brainstorming from you has updated me to “even if we don’t do [pick whatever image you want] there’s still probs a way to get the visual stickyness”.
      I’d also be super interested in the results of a study on ability to recall/track individuals in a thread with their head-shots vs autogen images.
- FactorialCode Aug 1, 2019, 8:49 PM
  2 points
  Parent
  
  I actually like the fact that I don’t immediately know who is speaking to who in a thread. I feel like it prevents me from immediately biasing my judgment of what a person is saying before they say it.
benwr Aug 31, 2019, 12:20 AM
8 points

There should really be a system that does what WebMD tried to do, but actually does it well.

You’d put in your symptoms and background info (e.g. what country you live in, your age), it would ask you clarifying questions (“how bad is the pain from 1 to 10?” “which of these patterns is most like the rash?” “Do you have a family history of heart disease?”) and then it would give you a posterior distribution over possible conditions, and a guess about whether you should go to the emergency room or whatever.

Is this just much harder than I’m imagining it would be? It seems like the kind of thing where you could harvest likelihood ratios and put them all into a big database. Is there some regulatory thing where you can’t practically offer this service because it’s illegal to give medical advice or something?
- eigen Aug 31, 2019, 12:51 AM
  1 point
  Parent
  
  Assuming that you actually get it to work and that you provide, at best, mediocre diagnostic (which is already really difficult to make), this is a regulatory nightmare and a plain hazardous tool to exist.
  I’d even say that people cannot make decisions based on statistics (I doubt that most can even differentiate between anecdotal advice and scientific evidence) that’s why physicians make these decisions for them and if ever a tool is allowed it would only be available for physicians.
  For anyone interested in making this sort of tool, the enthusiasm doesn’t last a day or two after talking to a lawyer for a few minutes!
  - benwr Aug 31, 2019, 8:07 PM
    11 points
    Parent
    
    One friend pointed out that you might be able to avoid some of the pitfalls by releasing something like an open source desktop application that requires you to feed it a database of information. Then you could build databases like this in lots of different ways, including anonymous ones or crowdsourced ones. And in this case it might become a lot harder to claim that the creator of the application is liable for anything. I might actually want to talk to a lawyer about this kind of thing, if the lawyer was willing to put on a sort of “engineering” mindset to help me figure out how you might make this happen without getting sued. So if you know anyone like that, I’d be pretty interested
benwr Sep 22, 2020, 5:45 AM
5 points

Beth Barnes notices: Rationalists seem to use the word “actually” a lot more than the typical English speaker; it seems like the word “really” means basically the same thing.
We wrote a quick script, and the words “actually” and “really” occur about equally often on LessWrong, while Google Trends suggests that “really” is ~3x more common in search volume. SSC has ~2/3 as many “actually”s as “really”s.
What’s up with this? Should we stop?
- habryka Sep 22, 2020, 6:19 AM
  9 points
  Parent
  
  Huh, weird. I do notice that I don’t like the word “really” because it is super ambiguous between being a general emphasis “this is really difficult” or being a synonym to “actually”, i.e. in “do you really mean this?”. The first usage feels much more common to me, i.e. in more than 80% of the sentences I could come up with the word “really” in it while I was writing the comment, I used it as general emphasis, and not as a synonym to “actually”.
- Richard_Kennaway Sep 22, 2020, 2:59 PM
  4 points
  Parent
  
  I think all of those words would be better used less. Really, actually, fundamentally, basically, essentially, ultimately, underneath it all, at bottom, when you get down to it, when all’s said and done, these are all lullaby words, written in one’s sleep, to put other people to sleep. When you find yourself writing one, try leaving it out. If the sentence then seems to be not quite right, work out what specifically is wrong with it and put that right instead of papering over the still, small voice of reason.
  There is also the stereotypical “Well, actually,” that so often introduces a trifling nitpick. I believe there was an LW post on that subject, but I can’t find it. The search box does not appear to support multi-word strings.
  ETA: This is probably what I was recalling.
  - habryka Sep 22, 2020, 5:07 PM
    2 points
    Parent
    
    The search box does not appear to support multi-word strings.
    The search box definitely supports multi-word strings. See this screenshot.
    - Richard_Kennaway Sep 22, 2020, 5:39 PM
      2 points
      Parent
      
      That’s what I mean. It appears to return pages that contain either “well” or “actually” (the “Summoning Sapience” hit does not contain “well”). I would expect searching for the two words to return the pages that contain both words, and searching for “well actually”, including the quotes, should return the pages in which the words appear consecutively.
      - habryka Sep 22, 2020, 7:58 PM
        5 points
        Parent
        
        Oops, you are right that for some reason we had the verbatim search feature deactivated on some of the indexes. Thank you for helping me notice this! This should now be fixed! (Because of caching it might take a while for it to work for the exact “well, actually” query, but you can try using quotes for some other queries, and it should now work as expected).
benwr Sep 8, 2019, 6:06 PM
5 points

Doom circles seem hard to do outside of CFAR workshops: If I just pick the ~7 people who I most want to be in my doom circle, this might be the best doom circle for me, but it won’t be the best doom circle for them, since they will mostly not know each other very well.

So you might think that doing doom “circles” one-on-one would be best. But doom circles also have a sort of ceremony / spacing / high-cost-ness to them that cuts the other way: More people means more “weight” or something. And there are probably other considerations determining the optimal size.

So if you wanted to have a not-at-the-end-of-a-workshop doom circle, should you find the largest clique with some minimum relationship strength in your social graph?
- jimrandomh Sep 10, 2019, 12:11 AM
  3 points
  Parent
  
  I’m not sure relationship-strength on a single axis is quite the right factor. At the end of a workshop, the participants don’t have that much familiarity, if you measure it by hours spent talking; but those hours will tend to have been focused on the sort of information that makes a Doom circle work, ie, people’s life strategies and the things they’re struggling with. If I naively tried to gather a group with strong relationship-strength, I expect many of the people I invited would find out that they didn’t know each other as well as they thought they did.
benwr Sep 7, 2019, 12:56 AM
5 points

Yet another Shortform-as-feature-request:

Notifications and/or RSS feeds from particular posts’ comments / answers.

This would be especially useful for Questions and Shortform posts (sometimes tellingly mis-labeled “shortform feeds”), both of which are things where one particular post has a collection of related comments, and which gather content over time.

I currently subscribe to the front page in Feedly, and whenever someone asks a question that I find interesting I mentally cringe because I know that I’ll have to remind myself to check back (and I probably will never actually check back).

I guess I could come up with some custom Zapier / IFTTT system for this if I spent a few hours on it, but I suspect this would be generally useful functionality.
- Raemon Sep 7, 2019, 1:13 AM
  9 points
  Parent
  
  Yup, this is in the works right now.
benwr Feb 29, 2024, 9:17 PM
3 points

Sometimes people use “modulo” to mean something like “depending on”, e.g. “seems good, modulo the outcome of that experiment” [correct me ITT if you think they mean something else; I’m not 100% sure]. Does this make sense, assuming the term comes from modular arithmetic?

Like, in modular arithmetic you’d say “5 is 3, modulo 2″. It’s kind of like saying “5 is the same as 3, if you only consider their relationship to modulus 2”. This seems pretty different to the usage I’m wondering about; almost its converse: to import the local English meaning of “modulo”, you’d be saying “5 is the same as 3, as long as you’ve taken their relationship to the modulus 2 into account”. This latter statement is false; 5 and 3 are super different even if you’ve taken this relationship into account.

But the sense of the original quote doesn’t work with the mathematical meaning: “seems good, if you only consider the outcome of that experiment and nothing else”.

Is there a math word that means the thing people want “modulo” to mean?
benwr Feb 20, 2025, 9:26 AM
1 point
0

Human information throughput is allegedly only about 10-50 bits per second. This implies an interesting upper bound, in that the information throughput of biological humanity as a whole can’t be higher than around 50 * 10^10 = 500Gbit/s. I.e., if all distinguishable actions made by humans were perfectly independent, biological humanity as a whole would have at most 500Gbit/s of “steering power”.
I need to think more about the idea of “steering power” (e.g. some obvious rough edges around amplifying your steering power using external information processing / decision systems), but I have some intuition that one might actually be able to come up with a not-totally-useless concept that lets us say something like “humanity can’t stay in ‘meaningful control’ if we have an unaligned artificial agent with more steering power than humanity, expressed in bits/s”.
- benwr Feb 20, 2025, 8:02 PM
  1 point
  0
  Parent
  
  I asked Deep Research to see if there are existing treatments of this basic idea in the literature. It seems most closely related to the concept of “empowerment” in RL, which I’m surprised I hadn’t heard of: https://en.m.wikipedia.org/wiki/Empowerment_(artificial_intelligence)
  The Wikipedia article makes it seem like this might also be how RL people think about instrumental convergence?
benwr Jun 10, 2021, 8:19 PM
1 point

I’m interested in concrete ways for humans to evaluate and verify complex facts about the world. I’m especially interested in a set of things that might be described as “bootstrapping trust”.

For example:

Say I want to compute some expensive function f on an input x. I have access to a computer C that can compute f; it gives me a result r. But I don’t fully trust C—it might be maliciously programmed to tell me a wrong answer. In some cases, I can require that C produce a proof that f(x) = r that I can easily check. In others, I can’t. Which cases are which?

A partial answer to this question is “the complexity class NP”. But in practice this isn’t really satisfying. I have to make some assumptions about what tools are available that I do trust.

Maybe I trust simple mathematical facts (and I think I even trust that serious mathematics and theoretical computer science track truth really well). I also trust my own senses and memory, to a nontrivial extent. Reaching much beyond that is starting to feel iffy. For example, I might not (yet) have a computer of my own that I trust to help me with the verification. What kinds of proof can I accept with the limitations I’ve chosen? And how can I use those trustworthy proofs to bootstrap other trusted tools?

Other problems in this bucket include “How can we have trustworthy evidence—say videos—in a world with nearly perfect generative models?” and a bunch of subquestions of “Does debate scale as an AI alignment strategy?”

This class of questions feels like an interesting lens on some things that are relevant to some sorts of AI alignment work such as debate and interpretability. It’s also obviously related to some parts of information security and cryptography.

“Bootstrapping trust” is basically just a restatement of the whole problem. It’s not exactly that I think this is a good way to decide how to direct AI alignment effort; I just notice that it seems somehow like a “fresh” way of viewing things.
- benwr Jun 10, 2021, 9:16 PM
  3 points
  Parent
  
  A thing that feels especially good about this way of thinking about things is that it feels like the kind of problem with straightforward engineering / cryptography style solutions.
benwr Dec 4, 2019, 9:23 AM
1 point

I made a blog because I didn’t know where else to write gushing reviews of things. I haven’t written anything there yet, but soon I hope to have written up some of the following:
- An account of what I’ve learned since getting mildly fixated on pumping CO2 out of my bedroom
- A gushing review of my iPad Pro 11” with Apple Pencil
- A mostly-gushing review of my Subaru Crosstrek
- A gushing review of my bed and mattress
- A gushing review of the-general-excellence-of-fast-food-franchises
- A post about how I feel a lot of internal tension about consumerism
benwr Dec 4, 2019, 9:17 AM
1 point

I’ve recently been thinking a lot about pumping CO2 out of my bedroom. By coincidence, so has Diffractor (in a slightly different context / with different goals). His post on the CO2 scrubber he built is a pretty good read, although I think he might be making a mistake about the plausibility of vacuum swing adsorption using zeolites. I wrote a comment outlining what I think is the mistake, and I guess I wanted to highlight it here in case I later want to come back and find it, and because I want more people to see it and potentially write dissenting opinions.

https://www.lesswrong.com/posts/G4uMdBzgDsxMsTNmr/co2-stripper-postmortem-thoughts#eoMLeSXYM4kwk68ix