Technical staff at Anthropic (views my own), previously #3ainstitute; interdisciplinary, interested in everything, ongoing PhD in CS, bets tax bullshit, open sourcerer, more at zhd.dev
Zac Hatfield-Dodds
https://www.cleanairkits.com/products/luggables is basically one side of a Corsi-Rosenthal box, takes up very little floor space if placed by a wall, and is quiet, affordable, and effective.
SQLite is ludicrously well tested; similar bugs in other databases just don’t get found and fixed.
I don’t remember anyone proposing “maybe this trader has an edge”, even though incentivising such people to trade is the mechanism by which prediction markets work. Certainly I didn’t, and in retrospect it feels like a failure not to have had ‘the multi-million dollar trader might be smart money’ as a hypothesis at all.
(4) is infeasible, because voting systems are designed so that nobody can identify which voter cast which vote—including that voter. This property is called “coercion resistance”, which should immediately suggest why it is important!
I further object that any scheme to “win” an election by invalidating votes (or preventing them, etc) is straightforwardly unethical and a betrayal of the principles of democracy. Don’t give the impression that this is acceptable behavior, or even funny to joke about.
let’s not kill the messenger, lest we run out of messengers.
Unfortunately we’re a fair way into this process, not because of downvotes[1] but rather because the comments are often dominated by uncharitable interpretations that I can’t productively engage with.[2] I’ve had researchers and policy people tell me that reading the discussion convinced them that engaging when their work was discussed on LessWrong wasn’t worth the trouble.
I’m still here, sad that I can’t recommend it to many others, and wondering whether I’ll regret this comment too.
[1] I also feel there’s a double standard, but don’t think it matters much. Span-level reacts would make it a lot easier to tell what people disagree with, though.
[2] Confidentiality makes any public writing far more effortful than you might expect. Comments which assume bad faith are deeply unpleasant to engage with, and very rarely have any actionable takeaways. I’ve written and deleted a lot of other stuff here, and can’t find an object-level description that I think is worth posting, but there are plenty of further reasons.
I’d find the agree/disagree dimension much more useful if we split out “x people agree, y disagree”—as the EA Forum does—rather than showing the sum of weighted votes (and total number on hover).
I’d also encourage people to use the other reactions more heavily, including on substrings of a comment, but there’s value in the anonymous dis/agree counts too.
(2) ✅ … The first is from Chief of Staff at Anthropic.
The byline of that piece is “Avital Balwit lives in San Francisco and works as Chief of Staff to the CEO at Anthropic. This piece was written entirely in her personal capacity and does not reflect the views of Anthropic.”
I do not think this is an appropriate citation for the claim. In any case, the claim that “They publicly state that it is not a matter of ‘if’ such artificial superintelligence might exist, but ‘when’” simply seems to be untrue; both cited sources are peppered with phrases like ‘possibility’, ‘I expect’, ‘could arrive’, and so on.
If grading, I’d give full credit for (2) on the basis that “documents like these” refers to Anthropic’s constitution + system prompt and OpenAI’s model spec, and more generous partial credit for the others. I have no desire to litigate details here though, so I’ll leave it at that.
Proceeding with training or limited deployments of a “potentially existentially catastrophic” system would clearly violate our RSP, at minimum the commitment to define and publish ASL-4-appropriate safeguards and conduct evaluations confirming that they are not yet necessary. This footnote is referring to models which pose much lower levels of risk.
And it seems unremarkable to me for a US company to ‘single out’ a relevant US government entity as the recipient of a voluntary non-public disclosure of a non-existential risk.
More importantly, the average price per plate is not just a function of costs, it’s a function of the value that people receive.
No, willingness to pay is (ideally) a function of value, but under reasonable competition the price should approach the cost of providing the meal. “It’s weird” that a city with many restaurants and consumers, easily available information, low transaction costs, lowish barriers to entry, minimal externalities or returns to atypical scale, and good factor mobility (at least for labor, capital, and materials) should still have substantially elevated prices. My best guess is that barriers to entry aren’t that low, but mostly that profit-seekers prefer industries with fewer of these conditions!
A more ambitious procedural approach would involve strong third-party auditing.
I’m not aware of any third party who could currently perform such an audit—e.g. METR disclaims that here. We have committed to soliciting external expert feedback on capabilities and safeguards reports (RSP §7), and to funding new third-party evaluators to grow the ecosystem. Right now, though, third-party audit feels to me like a fabricated option rather than a lack of ambition.
Thanks Daniel (and Dean) - I’m always glad to hear about people exploring common ground, and the specific proposals sound good to me too.
I think Anthropic already does most of these, as of our RSP update this morning! While I personally support regulation to make such actions universal and binding, I’m glad that we have voluntary commitments in the meantime:
- Disclosure of in-development capabilities—in section 7.2 (Transparency and External Input) of our updated RSP, we commit to public disclosures for deployed models, and to notifying a relevant U.S. Government entity if any model requires stronger protections than the ASL-2 Standard. I think this is a reasonable balance for a unilateral commitment.
- Disclosure of training goal / model spec—as you note, Anthropic publishes both the constitution we train with and our system prompts. I’d be interested in exploring model-spec-style aspirational documents too.
- Public discussion of safety cases and potential risks—there’s some discussion in our Core Views essay and RSP; our capability reports and plans for safeguards and future evaluations are published here starting today (with some redactions for e.g. misuse risks).
- Whistleblower protections—RSP section 7.1.5 lays out our noncompliance reporting policy, and 7.1.6 a commitment not to use non-disparagement agreements which could impede or discourage publicly raising safety concerns.
Anthropic’s updated Responsible Scaling Policy
This sounds to me like the classic rationalist failure mode of doing stuff which is unusually popular among rationalists, rather than studying what experts or top performers are doing and then adopting the techniques, conceptual models, and ways of working that actually lead to good results.
Or in other words, the primary thing when thinking about how to optimize a business is not being rationalist; it is to succeed in business (according to your chosen definition).
Happily there’s considerable scholarship on business, and CommonCog has done a fantastic job organizing and explaining the good parts. I highly recommend reading and discussing and reflecting on the whole site—it’s a better education in business than any MBA program I know of.
I further suggest that if using these defined terms, instead of including a table of definitions somewhere, you include the actual probability range or point estimate in parentheses after the term. This avoids any need to explain the conventions, and makes it clear at the point of use that the author had a precise quantitative definition in mind.
For example: it’s likely (75%) that flipping a pair of fair coins will get less than two heads, and extremely unlikely (0-5%) that most readers of AI safety papers are familiar with the quantitative convention proposed above—although they may (>20%) be familiar with the general concept. Note that the inline convention allows for other descriptions if they make the sentence more natural!
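To make that concrete, here’s a minimal Python sketch: the term-to-range mapping below is purely illustrative (not a standard table), and the assertion just checks the coin-flip arithmetic above.

```python
from fractions import Fraction

# Illustrative mapping from qualitative terms to probability ranges; the exact
# numbers are an assumption here -- use whatever convention your paper adopts.
TERMS = {
    "extremely unlikely": (0.00, 0.05),
    "may": (0.20, 1.00),
    "likely": (0.70, 0.80),
}

def annotate(term: str) -> str:
    """Render a term with its probability range inline, e.g. 'likely (70%-80%)'."""
    lo, hi = TERMS[term]
    return f"{term} ({lo:.0%}-{hi:.0%})"

# Sanity-check the coin example: P(fewer than two heads in two fair flips)
# = 1 - P(both heads) = 1 - 1/4 = 3/4.
assert 1 - Fraction(1, 2) ** 2 == Fraction(3, 4)

print(annotate("likely"))              # -> likely (70%-80%)
print(annotate("extremely unlikely"))  # -> extremely unlikely (0%-5%)
```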
For what it’s worth, I endorse Anthropic’s confidentiality policies, and am confident that everyone involved in setting them sees the increased difficulty of public communication as a cost rather than a benefit. Unfortunately, the unilateralist’s curse and entangled truths mean that confidential-by-default is the only viable policy.
This feels pretty nitpick-y, but whether or not I’d be interested in taking a bet will depend on the odds—in many cases I might take either side, given a reasonably wide spread. Maybe append
at p >= 0.5
to the descriptions to clarify?

The shorthand trading syntax “$size @ $sell_percent / $buy_percent” is especially nice because it expresses the spread you’d accept to take either side of the bet, e.g. “25 @ 85/15 on rain tomorrow” to offer a bet of $25, selling if you think the probability of rain is >85%, buying if you think it’s <15%. Seems hard to build this into a reaction though!
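For concreteness, a rough sketch of parsing that shorthand (hypothetical helper names, not an existing tool):

```python
import re
from typing import NamedTuple

class Offer(NamedTuple):
    size: float     # stake in dollars
    sell_at: float  # the "sell" side of the spread, as a probability
    buy_at: float   # the "buy" side of the spread, as a probability

def parse_offer(text: str) -> Offer:
    """Parse shorthand like '25 @ 85/15' (illustrative parser, not a real API)."""
    m = re.fullmatch(r"\s*\$?(\d+(?:\.\d+)?)\s*@\s*(\d+)\s*/\s*(\d+)\s*", text)
    if not m:
        raise ValueError(f"not a valid offer: {text!r}")
    size, sell, buy = m.groups()
    return Offer(float(size), int(sell) / 100, int(buy) / 100)

# "25 @ 85/15 on rain tomorrow": a $25 stake, with 85% as the sell side and
# 15% as the buy side of the spread.
print(parse_offer("25 @ 85/15"))  # Offer(size=25.0, sell_at=0.85, buy_at=0.15)
```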
Locked in! Whichever way this goes, I expect to feel pretty good about both the process and the outcome :-)
Nice! I look forward to seeing how this resolves.
Ah, by ‘size’ I meant the stakes, not the number of locks—did you want to bet the maximum $1k against my $10k, or some smaller proportional amount?
Because I saw a few posts discussing his trades, vs none for anyone else’s, which in turn is presumably because he moved the market by ten percentage points or so. I’m not arguing that this “should” make him so salient, but given that he was salient I stand by my sense of failure.