Technical staff at Anthropic (views my own), previously #3ainstitute; interdisciplinary, interested in everything, ongoing PhD in CS, bets tax bullshit, open sourcerer, more at zhd.dev
Zac Hatfield-Dodds
I’d run the numbers for higher-throughput, lower-filtration filters—see e.g. the cleanairkits writeup—but this looks great!
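For anyone who wants to see what “running the numbers” looks like: clean air delivery is roughly airflow times single-pass capture efficiency, so a lower-grade filter with much higher throughput can come out ahead. The figures below are made up for illustration, not measurements of any particular product.

```python
# Minimal sketch of the filter trade-off: CADR ~= airflow * capture efficiency.
# All numbers are hypothetical placeholders, not measured values.

def cadr(airflow_cfm: float, capture_efficiency: float) -> float:
    """Approximate clean air delivery rate in cubic feet per minute."""
    return airflow_cfm * capture_efficiency

high_filtration = cadr(airflow_cfm=100, capture_efficiency=0.997)  # HEPA-like: high filtration, low throughput
high_throughput = cadr(airflow_cfm=350, capture_efficiency=0.85)   # MERV-13-like: lower filtration, high throughput

print(f"high-filtration option: ~{high_filtration:.0f} CFM of clean air")
print(f"high-throughput option: ~{high_throughput:.0f} CFM of clean air")
```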
Hey @MadHatter—Eliezer confirms that I’ve won our bet.
I ask that you donate my winnings to GiveWell’s All Grants fund, here, via credit card or ACH (preferred due to lower fees). Please check the box for “I would like to dedicate this donation to someone” and include zac@zhd.dev as the notification email address so that I can confirm here that you’ve done so.
IMO “major donors won’t fund this kind of thing” is a pretty compelling reason to look into it, since great opportunities which are illegible or structurally-hard-to-fund definitely exist (as do illegible-or-etc terrible options; do your diligence). On the other hand I’m pretty nervous about the community dynamics that emerge when you’re granting money and also socially engaged with and working in the field. Caveat donor!
I think your argument also has to establish that the cost of simulating any that happen to matter is also quite high.
My intuition is that capturing enough secondary mechanisms, in sufficient-but-abstracted detail that the simulated brain is behaviorally normal (e.g. a sim of me not-more-different than a very sleep-deprived me), is likely to be both feasible by your definition and sufficient for consciousness.
Why do you focus on this particular guy?
Because I saw a few posts discussing his trades, vs none for anyone else’s, which in turn is presumably because he moved the market by ten percentage points or so. I’m not arguing that this “should” make him so salient, but given that he was salient I stand by my sense of failure.
https://www.cleanairkits.com/products/luggables is basically one side of a Corsi-Rosenthal box, takes up very little floor space if placed by a wall, and is quiet, affordable, and effective.
SQLite is ludicrously well tested; similar bugs in other databases just don’t get found and fixed.
I don’t remember anyone proposing “maybe this trader has an edge”, even though incentivising such people to trade is the mechanism by which prediction markets work. Certainly I didn’t, and in retrospect it feels like a failure not to have had ‘the multi-million dollar trader might be smart money’ as a hypothesis at all.
(4) is infeasible, because voting systems are designed so that nobody can identify which voter cast which vote—including that voter. This property is called “coercion resistance”, which should immediately suggest why it is important!
I further object that any scheme to “win” an election by invalidating votes (or preventing them, etc) is straightforwardly unethical and a betrayal of the principles of democracy. Don’t give the impression that this is acceptable behavior, or even funny to joke about.
Let’s not kill the messenger, lest we run out of messengers.
Unfortunately we’re a fair way into this process, not because of downvotes[1] but rather because the comments are often dominated by uncharitable interpretations that I can’t productively engage with.[2] I’ve had researchers and policy people tell me that reading the discussion convinced them that engaging when their work was discussed on LessWrong wasn’t worth the trouble.
I’m still here, sad that I can’t recommend it to many others, and wondering whether I’ll regret this comment too.
[1] I also feel there’s a double standard, but don’t think it matters much. Span-level reacts would make it a lot easier to tell what people disagree with, though.
[2] Confidentiality makes any public writing far more effortful than you might expect. Comments which assume ill-faith are deeply unpleasant to engage with, and very rarely have any actionable takeaways. I’ve written and deleted a lot of other stuff here, and can’t find an object-level description that I think is worth posting, but there are plenty of further reasons.
I’d find the agree/disagree dimension much more useful if we split out “x people agree, y disagree”—as the EA Forum does—rather than showing the sum of weighted votes (and total number on hover).
I’d also encourage people to use the other reactions more heavily, including on substrings of a comment, but there’s value in the anonymous dis/agree counts too.
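To illustrate the difference with a toy sketch (not how the forum actually computes anything; the vote weights and data structure are assumptions for the example), the same set of votes can show a negative weighted sum even though most voters agreed:

```python
# Toy illustration: a single karma-weighted sum vs. split agree/disagree counts.
from dataclasses import dataclass

@dataclass
class Vote:
    agrees: bool
    weight: int  # e.g. some users' votes might carry more weight

votes = [Vote(True, 2), Vote(True, 1), Vote(False, 4), Vote(False, 1), Vote(True, 1)]

weighted_sum = sum(v.weight if v.agrees else -v.weight for v in votes)
agree_count = sum(v.agrees for v in votes)
disagree_count = sum(not v.agrees for v in votes)

print(f"summed display: {weighted_sum:+d}")                                # "-1"
print(f"split display:  {agree_count} agree, {disagree_count} disagree")   # "3 agree, 2 disagree"
```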
(2) ✅ … The first is from Chief of Staff at Anthropic.
The byline of that piece is “Avital Balwit lives in San Francisco and works as Chief of Staff to the CEO at Anthropic. This piece was written entirely in her personal capacity and does not reflect the views of Anthropic.”
I do not think this is an appropriate citation for the claim. In any case, the claim that they “publicly state that it is not a matter of ‘if’ such artificial superintelligence might exist, but ‘when’” simply seems to be untrue; both cited sources are peppered with phrases like ‘possibility’, ‘I expect’, ‘could arrive’, and so on.
If grading I’d give full credit for (2) on the basis of “documents like these” referring to Anthropic’s constitution + system prompt and OpenAI’s model spec, and more generous partials for the others. I have no desire to litigate details here though, so I’ll leave it at that.
Proceeding with training or limited deployments of a “potentially existentially catastrophic” system would clearly violate our RSP, at minimum the commitment to define and publish ASL-4-appropriate safeguards and conduct evaluations confirming that they are not yet necessary. This footnote is referring to models which pose much lower levels of risk.
And it seems unremarkable to me for a US company to ‘single out’ a relevant US government entity as the recipient of a voluntary non-public disclosure of a non-existential risk.
More importantly, the average price per plate is not just a function of costs, it’s a function of the value that people receive.
No, willingness to pay is (ideally) a function of value, but under reasonable competition the price should approach the cost of providing the meal. “It’s weird” that a city with many restaurants and consumers, easily available information, low transaction costs, lowish barriers to entry, minimal externalities or returns to atypical scale, and good factor mobility (at least for labor, capital, and materials) should still have substantially elevated prices. My best guess is that barriers to entry aren’t that low, but mostly that profit-seekers prefer industries with fewer of these conditions!
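For the record, the textbook condition I’m gesturing at (a sketch of the standard free-entry model, not a claim about any particular restaurant market):

$$\pi = \big(p - AC(q)\big)\,q > 0 \;\Rightarrow\; \text{more entry}, \qquad \text{so with free entry } p \to \min_q AC(q)$$

i.e. price tracks the cost of providing a meal, and willingness to pay only caps it from above.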
A more ambitious procedural approach would involve strong third-party auditing.
I’m not aware of any third party who could currently perform such an audit—e.g. METR disclaims that here. We committed to soliciting external expert feedback on capabilities and safeguards reports (RSP §7), and to funding new third-party evaluators to grow the ecosystem. Right now, though, third-party audit feels to me like a fabricated option rather than a lack of ambition.
Thanks Daniel (and Dean) - I’m always glad to hear about people exploring common ground, and the specific proposals sound good to me too.
I think Anthropic already does most of these, as of our RSP update this morning! While I personally support regulation to make such actions universal and binding, I’m glad that we have voluntary commitments in the meantime:
Disclosure of in-development capabilities—in section 7.2 (Transparency and External Input) of our updated RSP, we commit to public disclosures for deployed models, and to notify a relevant U.S. Government entity if any model requires stronger protections than the ASL-2 Standard. I think this is a reasonable balance for a unilateral commitment.
Disclosure of training goal / model spec—as you note, Anthropic publishes both the constitution we train with and our system prompts. I’d be interested in exploring model-spec-style aspirational documents too.
Public discussion of safety cases and potential risks—there’s some discussion in our Core Views essay and RSP; our capability reports and plans for safeguards and future evaluations are published here starting today (with some redactions for e.g. misuse risks).
Whistleblower protections—RSP section 7.1.5 lays out our noncompliance reporting policy, and 7.1.6 a commitment not to use non-disparagement agreements which could impede or discourage publicly raising safety concerns.
Anthropic’s updated Responsible Scaling Policy
This sounds to me like the classic rationalist failure mode of doing stuff which is unusually popular among rationalists, rather than studying what experts or top performers are doing and then adopting the techniques, conceptual models, and ways of working that actually lead to good results.
Or in other words, the primary thing when thinking about how to optimize a business is not being rationalist; it is to succeed in business (according to your chosen definition).
Happily there’s considerable scholarship on business, and CommonCog has done a fantastic job organizing and explaining the good parts. I highly recommend reading and discussing and reflecting on the whole site—it’s a better education in business than any MBA program I know of.
We’ve been in touch, and agreed that MadHatter will make the donation by the end of February. I’ll post a final update in this thread when I get the confirmation from GiveWell.