I operate by Crocker’s rules.
niplav
Yes, this is me riffing on a popular tweet about coyotes and cats. But there is a pattern of organizations getting/extracting funding from the EA ecosystem (which has preventing AI takeover as a big part of its goal), or getting talent from EA, and then going on to accelerate AI development (e.g. OpenAI, Anthropic, now Mechanize Work).
Hm, good point. I’ll amend the previous post.
Ethical concerns here are not critical imho, especially if one only listens to the recordings oneself and deletes them afterwards.
People will be mad if you don’t tell them, but if you actually don’t share it and delete it after a short time, I don’t think you’d be doing anything wrong.
Sorry, can’t share the exact chat, that’d depseudonymize me. The prompts were:
What is a canary string? […]
What is the BIG-bench canary string?
which resulted in the model outputting the canary string in its message.
“My funder friend told me his alignment orgs keep turning into capabilities orgs so I asked how many orgs he funds and he said he just writes new RFPs afterwards so I said it sounds like he’s just feeding bright-eyed EAs to VCs and then his grantmakers started crying.”
Fun: Sonnet 3.7 also knows the canary string, but believes that that’s good, and defends it when pushed.
I think having my real name publicly & searchably associated with scummy behavior would discourage me from doing something, both in terms of future employers & random friends googling me, and in terms of LLMs being trained on the internet.
Instance:
Someone (i.e. me) should look into video self modeling (that is, recording oneself & reviewing the recording afterwards, writing down what went wrong & iterating) as a rationality technique/sub-skill of deliberate practice/feedbackloop-first rationality.
What is the best ratio of engaging in practice vs. reviewing later? How much to spend engaging with recordings of experts?
Probably best suited for physical skills and some social skills (speaking eloquently, being charismatic &c).
That would be my main guess as well, but not the overwhelmingly likely option.
Hm, I have no stake in this bet, but care a lot about having a high trust forum where people can expect others to follow through on lost bets, even with internet strangers. I’m happy enforcing this as a norm, even with hostile-seeming actions, because these kinds of norm transgressions need a Schelling fence.
As far as I can tell from their online personal details (which aren’t too hard to find), they have a day-job at a company that has (by my standards) very high salaries, so my best guess is that the $2k are not a problem. But I can contact MadHatter by email & check.
Could you name three examples of people doing non-fake work? Since towardsness to non-fake work is easier to use for aiming than awayness from fake work.
I feel like this should be more widely publicized as a possible reason for excluding MadHatter from future funding & opportunities in effective altruism/rationality/x-risk, and shaming this kind of behavior openly & loudly. (Potentially to the point of revealing a real-life identity? Not sure about this one.) Reaction is to the behavior of MadHatter, not to anything else.
I think it’s possible! If it’s used to encode relevant information, then it could be tested by running software engineering benchmarks (e.g. SWE-bench) but removing any trailing whitespace during generation, and checking if the score is lower.
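A minimal sketch of that test, assuming a hypothetical harness; `tasks`, `generate`, and `evaluate` are placeholders, not SWE-bench’s actual API:

```python
# Compare benchmark scores with and without trailing whitespace stripped
# from model output during generation (hypothetical harness, not SWE-bench's API).

def strip_trailing_whitespace(text: str) -> str:
    """Remove trailing spaces/tabs from every line, preserving line breaks."""
    return "\n".join(line.rstrip() for line in text.splitlines())

def score(tasks, generate, evaluate, postprocess=None) -> float:
    """Fraction of tasks solved; optionally post-process output before evaluation."""
    passed = 0
    for task in tasks:
        output = generate(task)           # model produces a candidate patch/answer
        if postprocess is not None:
            output = postprocess(output)  # e.g. strip trailing whitespace
        passed += evaluate(task, output)  # 1 if the task's tests pass, else 0
    return passed / len(tasks)

# baseline = score(tasks, generate, evaluate)
# stripped = score(tasks, generate, evaluate, strip_trailing_whitespace)
# A clearly lower `stripped` score would suggest the whitespace carries information.
```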
I get a lot of trailing whitespace when using Claude Code and variants of Claude Sonnet, more than short tests with base models give me. (Not rigorously tested yet.)
I wonder if the trailing whitespace encodes some information or is just some Constitutional AI/RL artefact.
Thanks, added.
Reasons for thinking that later TAI would be better:
General human progress, e.g. increased wealth; wealthier people take fewer risks (aged populations also take fewer risks)
Specific human progress, e.g. on technical alignment (though the bottleneck may be implementation, and much current work is specific to a paradigm) and on human intelligence augmentation
Current time of unusually high geopolitical tension; in a decade the PRC is going to be the clear hegemon
Reasons for thinking that sooner TAI would be better:
The AI safety community has an unusually strong influence at the moment and has decided to deploy most of that influence now (more influence in the anglosphere, lab leaders have heard of AI safety ideas/arguments); it might lose that kind of influence and mindshare later
Current paradigm is likely unusually safe (LLMs starting with world-knowledge, non-agentic at first, visible thoughts), later paradigms plausibly much worse
PRC being the hegemon would be bad because of risks from authoritarianism
Hardware overhangs are less likely, leading to more continuous development
Related thought: Having a circular preference may be preferable in terms of energy expenditure/fulfillability, because it can be implemented on a reversible computer and fulfilled infinitely without deleting any bits. (Not sure if this works with instrumental goals.)
Interesting! Are you willing to share the data?
It might be that polyphasic sleep is not as effective as my Oura thinks. I go into deep sleep sometimes during deep meditation, so this is inconclusive, but most likely a negative data point here.
I’m pretty bearish on polyphasic sleep to be honest. Maybe biphasic sleep, since that may map onto some general mammalian sleep patterns.
The idea behind these reviews is that they’re done with a full year of hindsight; evaluating posts right at the end of the year could bias towards posts from later in the year (results from November & December) and focus too much on trends that were ephemeral at the time (like specific (geo)political events).