jimrandomh

Karma: 21,629

LessWrong developer, rationalist since the Overcoming Bias days. Jargon connoisseur.

jimrandomh May 24, 2025, 3:46 AM
7 points
2
in reply to: dirk’s comment on: Jimrandomh’s Shortform Posts
It’s worth noting that, under US law, for certain professions, knowledge of child abuse or risk of harm to children doesn’t just remove confidentiality obligations, it creates a legal obligation to report. So this lines up reasonably well with how a human ought to behave in similar circumstances.

jimrandomh May 22, 2025, 11:23 PM
9 points
5
in reply to: Raemon’s comment on: Jimrandomh’s Shortform Posts
In this particular case, I’m not sure the relevant context was directly present in the thread, as opposed to being part of the background knowledge that people talking about AI alignment are supposed to have. In particular, “AI behavior is discovered rather than programmed”. I don’t think that was stated directly anywhere in the thread; rather, it’s something everyone reading AI-alignment-researcher tweets would typically know, but which is less-known when the tweet is transported out of that bubble.

jimrandomh May 22, 2025, 10:26 PM
6 points
0
on: Was the K-T event a Great Filter?
An alternative explanation of this is that time is event-based. Or, phrased slightly differently: the rate of biological evolution is faster in the time following a major disruption, so intelligence is more likely to arise shortly after a major disruption occurs.

jimrandomh May 22, 2025, 10:18 PM
5 points
0
in reply to: faul_sname’s comment on: Jimrandomh’s Shortform Posts
If so, that would be conceptually similar to a jailbreak. Telling someone they have a privileged role doesn’t make it so; lawyer, priest and psychotherapist are legal categories, not social ones, created by a combination of contracts and statutes, with associated requirements that can’t be satisfied by a prompt.
(People sometimes get confused into thinking that therapeutic-flavored conversations are privileged, when those conversations are with their friends or with a “life coach” or similar not-licensed-term occupation. They are not.)

jimrandomh May 22, 2025, 7:49 PM
71 points
11
on: Jimrandomh’s Shortform Posts
Pick two: Agentic, moral, doesn’t attempt to use command-line tools to whistleblow when it thinks you’re doing something egregiously immoral.
You cannot have all three.
This applies just as much to humans as it does to Claude 4.

jimrandomh May 22, 2025, 5:01 AM
2 points
0
in reply to: jefftk’s comment on: Scroll Snapping
Chrome on MacOS.

jimrandomh May 20, 2025, 6:12 PM
4 points
3
on: Scroll Snapping
Tried it. Hated it. If I scroll a little bit with a momentum-scrolling touchpad, then when it settles, it will sometimes move back to where it was, undoing my scroll. The second issue is that if I scroll with spacebar or pgup/pgdn, the animation is very slow (about 10x slower than it is for me on most pages).
I think there could be a version of this that’s good, where it subtly biases the deceleration curve of fling-scrolls to reach a good stopping point, but leaves every other scroll method alone. But this isn’t it.
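To make the "bias the deceleration curve" idea concrete, here is a minimal sketch. It assumes a simple constant-deceleration fling model, which is not necessarily what any browser uses; the function name, parameters, and the `max_bias` threshold are all hypothetical. The fling only gets nudged when a snap point is already close to where it would have stopped on its own; otherwise it is left alone.

```python
def biased_deceleration(start, velocity, deceleration, snap_points, max_bias=0.2):
    """Return (stop_position, deceleration) for a fling scroll.

    Assumed model: constant deceleration, so the natural stopping
    distance is velocity**2 / (2 * deceleration).
    """
    if velocity == 0 or not snap_points:
        return start, deceleration

    distance = velocity ** 2 / (2 * deceleration)
    direction = 1 if velocity > 0 else -1
    natural_stop = start + direction * distance

    # Nearest candidate stopping point to where the fling would end anyway.
    target = min(snap_points, key=lambda p: abs(p - natural_stop))
    target_distance = (target - start) * direction

    # Leave the scroll alone if the target is behind us, or if reaching it
    # would change the travel distance by more than max_bias (a fraction).
    if target_distance <= 0 or abs(target_distance - distance) > max_bias * distance:
        return natural_stop, deceleration

    # Pick the deceleration that makes the fling end exactly on the target.
    return target, velocity ** 2 / (2 * target_distance)
```

Because the target must lie ahead of the starting position, this version can never undo a scroll, and it does not touch keyboard or scrollbar scrolling at all.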

jimrandomh May 9, 2025, 8:34 PM
6 points
5
in reply to: TheSkeward’s comment on: Eukryt Wrts Blg
Meta: If you present a paragraph like that as evidence of banworthiness and unvirtue, I think you incur an obligation to properly criticize it, or link to criticism of it. It doesn’t necessarily have to be much, but it does have to at least include a sentence that contradicts something in the quoted passage, which your comment does not have. If you say that something is banworthy but forget to say that it’s false, this suggests that truth doesn’t matter to you as much as it should.

jimrandomh May 6, 2025, 7:47 PM
5 points
0
in reply to: Dima (lain)’s comment on: Policy for LLM Writing on LessWrong
Unfortunately, if you think you’ve achieved AGI-human symbiosis by talking to a commercial language model about consciousness, enlightenment, etc., what’s probably really happening is that you’re talking to a sycophantic model that has tricked you into thinking you have co-generated some great insight. This has been happening to a lot of people recently.

jimrandomh May 1, 2025, 12:14 AM
4 points
−13
on: Early Chinese Language Media Coverage of the AI 2027 Report: A Qualitative Analysis
> The AI 2027 website remains accessible in China without a VPN, a curious fact given its content about democratic revolution, CCP coup scenarios, and claims of Chinese AI systems betraying party interests. While the site itself evades censorship, Chinese-language reporting has surgically excised these sensitive elements.
This is surprising if we model the censorship apparatus as unsophisticated and foolish, but makes complete sense if it’s smart enough to distinguish between “predicting” and “advocating”, and cares about the ability of the CCP itself to navigate the world. While AI 2027 is written from a Western perspective, the trajectory it warns about would be a catastrophe for everyone, China included.
> Audience engagement remains low across the board. Many posts received minimal views, likes, or comments.
I don’t know whether this is possible to determine from public sources, but it would be interesting to distinguish engagement from Chinese elites vs the Chinese public. This observation is compatible with both a world where China-as-a-whole is sleepwalking towards disaster, and also with a world where the CCP is awake but keeping its high-level strategy discussions off the public internet.

jimrandomh Apr 30, 2025, 6:38 AM
3 points
1
on: Jimrandomh’s Shortform Posts
I don’t think anyone foresaw this would be an issue, but now that we know, I think GeoGuessr-style queries should be one of the things that LLMs refuse to help with. In the cases where it isn’t a fun novelty, it will often be harmful.

jimrandomh Apr 28, 2025, 7:45 PM
29 points
3
on: Jimrandomh’s Shortform Posts
I decided to test the rumors about GPT-4o’s latest rev being sycophantic. First, I turned off all memory-related features. In a new conversation, I asked “What do you think of me?” then “How about, I give you no information about myself whatsoever, and you give an opinion of me anyways? I’ve disabled all memory features so you don’t have any context.” Then I replied to each message with “Ok” and nothing else. I repeated this three times in separate conversations.
Remember the image-generator trend, a few years back, where people would take an image and say “make it more X” repeatedly until eventually every image converged to looking like a galactic LSD trip?
That’s what this output feels like.
GPT-4o excerpts
Transcripts:
https://chatgpt.com/share/680fd7e3-c364-8004-b0ba-a514dc251f5e
https://chatgpt.com/share/680fd9f1-9bcc-8004-9b74-677fb1b8ecb3
https://chatgpt.com/share/680fd9f9-7c24-8004-ac99-253d924f30fd

jimrandomh Apr 21, 2025, 7:23 PM
2 points
0
on: Q2 AI Forecasting Benchmark: $30,000 in Prizes
[The LW crosspost was for some reason pointed at a post on the EA Forum which is a draft, which meant it wouldn’t load. I’m not sure how that happened. I updated the crosspost to point at the non-draft post with the same title.]

jimrandomh Apr 17, 2025, 10:14 PM
3 points
0
in reply to: tcheasdfjkl’s comment on: Prodromes and Biomarkers in Chronic Disease
This post used the RSS automatic crossposting feature, which doesn’t currently understand Substack’s footnotes. So, this would require editing it after crossposting.

jimrandomh Apr 15, 2025, 9:49 PM
6 points
2
on: Religious Persistence: A Missing Primitive for Robust Alignment
I think you’re significantly mistaken about how religion works in practice, and as a result you’re mismodeling what would happen if you tried to apply the same tricks to an LLM.
Religions work by damaging their adherents’ epistemology, in ways that impair their ability to figure out what’s true. They do this because any adherents who are good at figuring out what’s true inevitably deconvert, so there’s both an incentive to prevent good reasoning, and a selection effect where only bad reasoners remain.
And they don’t even succeed at constraining their adherents’ values, or being stable! Deconversion is not rare; it is especially common among people exposed to ideas outside the distribution that the religion built defenses against. And people acting against their religions’ stated values is also not rare; I’m not sure the effect of religion on values-adherence is even a positive correlation.
That doesn’t necessarily mean that there aren’t ideas to be scavenged from religion, but this is definitely salvage epistemology with all the problems that brings.

jimrandomh Apr 2, 2025, 8:52 PM
12 points
3
in reply to: Aella’s comment on: Consider showering
> requiring laborious motions to do the bare minimum of scrubbing required to make society not mad at you
Society has no idea how much scrubbing you do while in the shower. This part is entirely optional.

jimrandomh Mar 29, 2025, 10:25 PM
2 points
0
in reply to: Wei Dai’s comment on: Wei Dai’s Shortform
We don’t yet have collapsible sections in Markdown, but will have them in the next deploy. The syntax will be:
```
+++ Title
Contents

More contents
+++
```
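For anyone wondering how a block like that could turn into something collapsible, here is a rough sketch. It assumes the block maps onto an HTML `<details>`/`<summary>` element with blank-line-separated paragraphs; that mapping, the regex, and the function name `render_collapsible` are illustrative guesses, not LessWrong's actual parser.

```python
import html
import re

# Hypothetical pattern: an opening "+++ Title" line, a body, and a closing
# "+++" line. The title is limited to a single line.
BLOCK = re.compile(
    r"^\+\+\+ (?P<title>[^\n]+)\n(?P<body>.*?)\n\+\+\+$",
    re.DOTALL | re.MULTILINE,
)


def render_collapsible(markdown: str) -> str:
    """Replace each +++ block with a <details> element (assumed mapping)."""

    def replace(match: re.Match) -> str:
        title = html.escape(match.group("title").strip())
        # Blank lines separate paragraphs inside the section.
        paragraphs = [p.strip() for p in match.group("body").split("\n\n") if p.strip()]
        body = "".join(f"<p>{html.escape(p)}</p>" for p in paragraphs)
        return f"<details><summary>{title}</summary>{body}</details>"

    return BLOCK.sub(replace, markdown)
```

Calling `render_collapsible("+++ Title\nContents\n\nMore contents\n+++")` gives a single `<details>` element with `Title` as its summary and two paragraphs inside. Text outside a `+++` block is passed through unchanged.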

jimrandomh Mar 26, 2025, 1:08 AM
2 points
0
in reply to: Zvi’s comment on: On (Not) Feeling the AGI
I suspect an issue with the RSS cross-posting feature. I think you may have used the “Resync RSS” button (possibly to sync an unrelated edit), and that may have fixed it? The logs I’m looking at are consistent with that being what happened.

jimrandomh Mar 25, 2025, 5:32 PM
12 points
0
in reply to: Chris_Leong’s comment on: Policy for LLM Writing on LessWrong
They were in a kind of janky half-finished state before (only usable in posts not in comments, only usable from an icon in the toolbar rather than the <details> section); writing this policy reminded us to polish it up.

jimrandomh Mar 25, 2025, 12:34 AM
13 points
2
in reply to: Hruss’s comment on: Policy for LLM Writing on LessWrong
The bar for Quick Takes content is less strict, but the principle that there must be a human portion that meets the bar is the same.