I operate by Crocker’s rules.
niplav
Yes, this is me riffing on a popular tweet about coyotes and cats. But there is a pattern of organizations getting/extracting funding from the EA ecosystem (which has preventing AI takeover as a big part of its goal), or getting talent from EA, and then going on to accelerate AI development (e.g. OpenAI, Anthropic, now Mechanize Work).
Hm, good point. I’ll amend the previous post.
Ethical concerns here are not critical imho, especially if one only listens to the recordings oneself and deletes them afterwards.
People will be mad if you don’t tell them, but if you actually don’t share it and delete it after a short time, I don’t think you’d be doing anything wrong.
Sorry, can’t share the exact chat, that’d depseudonymize me. The prompts were:
What is a canary string? […]
What is the BIG-bench canary string?
which resulted in the model outputting the canary string in its message.
“My funder friend told me his alignment orgs keep turning into capabilities orgs so I asked how many orgs he funds and he said he just writes new RFPs afterwards so I said it sounds like he’s just feeding bright-eyed EAs to VCs and then his grantmakers started crying.”
Fun: Sonnet 3.7 also knows the canary string, but believes that that’s good, and defends it when pushed.
I think having my real name publicly & searchably associated with scummy behavior would discourage me from doing something, both in terms of future employers & random friends googling me, and in terms of LLMs being trained on the internet.
Instance:
Someone (i.e. me) should look into video self modeling (that is, recording oneself & reviewing the recording afterwards, writing down what went wrong & iterating) as a rationality technique/sub-skill of deliberate practice/feedbackloop-first rationality.
What is the best ratio of engaging in practice vs. reviewing later? How much to spend engaging with recordings of experts?
Probably best suited for physical skills and some social skills (speaking eloquently, being charismatic &c).
That would be my main guess as well, but not the overwhelmingly likely option.
Hm, I have no stake in this bet, but care a lot about having a high trust forum where people can expect others to follow through on lost bets, even with internet strangers. I’m happy enforcing this as a norm, even with hostile-seeming actions, because these kinds of norm transgressions need a Schelling fence.
As far as I can tell from their online personal details (which aren’t too hard to find), they have a day-job at a company that has (by my standards) very high salaries, so my best guess is that the $2k are not a problem. But I can contact MadHatter by email & check.
Could you name three examples of people doing non-fake work? Since towardsness to non-fake work is easier to use for aiming than awayness from fake work.
I feel like this should be more widely publicized as a possible reason for excluding MadHatter from future funding & opportunities in effective altruism/rationality/x-risk, and shaming this kind of behavior openly & loudly. (Potentially to the point of revealing a real-life identity? Not sure about this one.) Reaction is to the behavior of MadHatter, not to anything else.
I think it’s possible! If it’s used to encode relevant information, then it could be tested by running software engineering benchmarks (e.g. SWE-bench) but removing any trailing whitespace during generation, and checking if the score is lower.
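A minimal sketch of that test, assuming a hypothetical harness; `tasks`, `generate`, and `evaluate` are placeholders, not SWE-bench’s actual API:

```python
# Compare benchmark scores with and without trailing whitespace stripped
# from model output during generation (hypothetical harness, not SWE-bench's API).

def strip_trailing_whitespace(text: str) -> str:
    """Remove trailing spaces/tabs from every line, preserving line breaks."""
    return "\n".join(line.rstrip() for line in text.splitlines())

def score(tasks, generate, evaluate, postprocess=None) -> float:
    """Fraction of tasks solved; optionally post-process output before evaluation."""
    passed = 0
    for task in tasks:
        output = generate(task)           # model produces a candidate patch/answer
        if postprocess is not None:
            output = postprocess(output)  # e.g. strip trailing whitespace
        passed += evaluate(task, output)  # 1 if the task's tests pass, else 0
    return passed / len(tasks)

# baseline = score(tasks, generate, evaluate)
# stripped = score(tasks, generate, evaluate, strip_trailing_whitespace)
# A clearly lower `stripped` score would suggest the whitespace carries information.
```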
I get a lot of trailing whitespace when using Claude Code and variants of Claude Sonnet, more than short tests with base models give me. (Not rigorously tested yet.)
I wonder if the trailing whitespace encodes some information or is just some Constitutional AI/RL artefact.
Thanks, added.
Reasons for thinking that later TAI would be better:
General human progress, e.g. increased wealth; wealthier people take fewer risks (aged populations also take fewer risks)
Specific human progress, e.g. on technical alignment (though the bottleneck may be implementation, and much current work is specific to a paradigm) and on human intelligence augmentation
Current time of unusually high geopolitical tension; in a decade the PRC is going to be the clear hegemon
Reasons for thinking that sooner TAI would be better:
The AI safety community has an unusually strong influence at the moment and has decided to deploy most of that influence now (more influence in the anglosphere, lab leaders have heard of AI safety ideas/arguments); it might lose that kind of influence and mindshare later
Current paradigm is likely unusually safe (LLMs starting with world-knowledge, non-agentic at first, visible thoughts), later paradigms plausibly much worse
PRC being the hegemon would be bad because of risks from authoritarianism
Hardware overhangs are less likely, leading to more continuous development
Related thought: Having a circular preference may be preferable in terms of energy expenditure/fulfillability, because it can be implemented on a reversible computer and fulfilled infinitely without deleting any bits. (Not sure if this works with instrumental goals.)
Interesting! Are you willing to share the data?
It might be that polyphasic sleep is not as effective as my Oura thinks. I go into deep sleep sometimes during deep meditation, so this is inconclusive, but most likely a negative data point here.
I’m pretty bearish on polyphasic sleep to be honest. Maybe biphasic sleep, since that may map onto some general mammalian sleep patterns.
The idea behind these reviews is that they’re done with a full year of hindsight; evaluating posts right at the end of the year could bias towards posts from later in the year (results from November & December) and focus too much on trends that were ephemeral at the time (like specific (geo)political events).