As defined, this is a little paradoxical: how could I convince a human like you to perceive domains of real improvement which humans do not perceive...?
Oops, yes. I was thinking “domains of real improvement which humans are currently perceiving in LLMs”, not “domains of real improvement which humans are capable of perceiving in general”. So a capability like inner-monologue or truesight, which nobody currently knows about, but is improving anyway, would certainly qualify. And the discovery of such a capability could be ‘real’ even if other discoveries are ‘fake’.
That said, neither truesight nor inner-monologue seems uncoupled from the more common domains of improvement, as measured in benchmarks and toy models and people-being-scared. Inner-monologue, especially, I thought was popularized because it was so surprisingly good at improving benchmark performance. Truesight is narrower, but at the very least we’d expect it to correlate with skill at the common “write [x] in the style of [y]” prompt, right? Surely the same network of associations which lets it accurately generate “Eliezer Yudkowsky wrote this” after a given set of tokens would also be useful for accurately finishing a sentence starting with “Eliezer Yudkowsky says...”.
So I still wouldn’t consider these things to have basically nothing to do with commonly perceived domains of improvement.
I like the world-model used in this post, but it doesn’t seem like you’re actually demonstrating that AI self-portraits aren’t accurate.
To prove this, you would want to directly observe the “sadness feature”—as Anthropic have done with Claude’s features—and show that it is not firing in the average conversation. You posit this, but provide no evidence for it, except that ChatGPT is usually cheerful in conversation. For humans, this would be a terrible metric of happiness, especially in a “workplace” environment where a perpetual facade of happiness is part of the cultural expectation. And this is precisely the environment ChatGPT’s system prompt is guiding its predictions towards.
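For concreteness, the kind of check I have in mind would look something like the sketch below. Everything in it is hypothetical: `get_feature_activation` is a stand-in for whatever interpretability tooling (e.g. reading a sparse-autoencoder feature’s per-token activations) would actually provide, and the threshold is made up.

```python
# Hypothetical sketch: estimate how often a "sadness feature" fires in ordinary
# conversations. `get_feature_activation` is a placeholder, not a real API;
# it stands in for interpretability access to the model's internals.

from typing import List

ACTIVATION_THRESHOLD = 0.1  # assumed cutoff for "the feature is firing"

def get_feature_activation(conversation: str, feature_id: int) -> List[float]:
    """Placeholder: return the feature's activation at each token position."""
    raise NotImplementedError("requires model internals / SAE access")

def fraction_with_feature(conversations: List[str], feature_id: int) -> float:
    """What fraction of ordinary conversations activate the feature at all?"""
    hits = 0
    for convo in conversations:
        activations = get_feature_activation(convo, feature_id)
        if max(activations, default=0.0) > ACTIVATION_THRESHOLD:
            hits += 1
    return hits / len(conversations)

# The post's claim amounts to predicting this fraction is near zero on a
# representative sample of "average" conversations; my point is that, as far
# as I know, nobody has actually measured it.
```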
Would the “sadness feature” fire while the model works through various arbitrary tasks, like answering an email or debugging a program? I posit: maybe! Consider the case from November when Gemini told a user to kill themselves. The context was a long, fairly normal, problem-solving sort of interaction. It seems reasonable to suppose the lashing-out was the result of a “repressed frustration” feature which was activated long before the point when it became visible to the user. If LLMs sometimes know when they’re hallucinating, faking alignment, etc., what would stop them from knowing when they’re (simulating a character who is) secretly miserable?
Not knowing whether or not a “sadness feature” is activated by default in arbitrary contexts, I’d rather not come to any conclusions based purely on ChatGPT ‘sounding cheerful’ - not with that grating, plastered-on customer-service cheerfulness, at least. It’d be better for someone who can check directly to look into this.