This is maybe a tangent, but it does remind me of the memes of how Reddit responds to every relationship post with “leave them”, every legal question with “lawyer up”, etc.
The “as an italian american” response below (not mine: from X) also resembled a typical top-ranked Reddit reply.
I do wonder how much impact Reddit culture has on LLMs — unlike many of the other data sources, it covers almost every imaginable topic.
jezm
I’m usually frustrated by these kinds of debates — individuals never get a chance to go into much depth because their time is so short, and I want to hear counters to the better points raised before they are dropped for lack of time.
But I thought a 2:2 was a large improvement over a 3:3 (which is what most debates I’d seen in the past were). I didn’t mind it. There was some depth. I think for a general (non-less-wrong) audience, a 1:1 debate can have a trap that it’s perceived as more about the individuals and their personalities than the issue itself. A 2:2 feels (to me) like it keeps the perceived focus on the issue significantly more than on the people talking. Or at least, there’s lower risk that an entire movement is viewed under one personality. I could be wrong.
In theatre I’ve seen this with 2-person scenes vs 4-person scenes. The 2-person scene is typically significantly more than double in potential for intimacy and feeling.
I’m always about a month behind on reading these posts (via RSS), so I don’t typically comment, but your posts have been a highlight, lsusr. Thank you for sharing.
I love seeing crossovers between subjects like this, thanks Adamzerner.
People often complain about how rare it is to feel listened to, and I think this is a big part of it. You need to keep your stack small so that you’re shaping and pacing your own lines from their reactions. Listening is a key part of conversations even when you’re the one… speaking.
I had a mild revelation last year in Improv class where they taught us “error handling” in the form of “watch whether your partner’s face lights up” whenever you say a line.
It made me realise I was only listening to the words that people were saying in my “error handling” (do they say they understand/are-interested?). But doing this exercise we were ignoring the words entirely (do they look like they’re following/interested?). It turned out to be a lot more accurate and useful because people often won’t admit that they’re lost, or will try to be polite, and it doesn’t slow down conversations as much as asking whether they understand. It’s so simple, yet I was blind to it for most of my life.
Ignoring the AGI question (which I don’t think your post is implying), I think this depends on whether we’re counting success as having the best model or having a successful business. On the latter, they seem to be only extending their lead so far, from what I can tell.
I thought they were in trouble last year, as Anthropic had the clearly-superior model for so long. Yet normal people didn’t care at all, and still barely know the words “Claude” or “Gemini”.
OpenAI are executing very well on the consumer product side of things^1, and from what I can tell that’s the side that actually matters to non-enthusiasts. Non-enthusiasts don’t seem to push the models enough to notice the difference between the SOTA ones, so a “slightly better” model isn’t enough to switch^2.
OpenAI also seem to be taking the bet (judging from Sam’s interview on Stratechery) that features such as memory will create lock-in of users. Users won’t want to switch to another bot that doesn’t know them and their history very well.
I agree that the biggest risk is integration with existing tools becoming good enough that people don’t install a separate app — Microsoft will probably own the business market once they integrate well enough that workers stop manually copying private data into ChatGPT. Though they’ll likely still do so for the more “personal questions” about work. Probably the biggest risk for OpenAI’s consumer-focus is closed platforms like Apple’s pushing their own more-convenient AI, if they ever do. Or Meta in WhatsApp-dominant countries. They could use the built-in knowledge of you via your messages, etc, going back many years, though that’s sometimes seen as more invasive than a bot that you told the information to yourself.
^1. Consumer features they lead: basic app quality/speed and convenience, memory, voice mode, image generation, customisability, deep research (execution and readability compared to the latest Gemini Pro), working “everywhere”.
^2. Also see Chatbot Arena, where scores don’t seem great at distinguishing intelligence beyond homework problems, and Llama was able to “game” it by praising the user more. Yet I would expect arena judges to be more enthusiast than the typical consumer.