Primarily interested in agent foundations, AI macrostrategy, and enhancement of human intelligence, sanity, and wisdom.
I endorse and operate by Crocker’s rules.
I have not signed any agreements whose existence I cannot mention.
Then we train to match the original model’s output by minimising an MSE loss
I think you wanted
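(As an aside, here is a minimal sketch of what “train to match the original model’s output by minimising an MSE loss” could look like in PyTorch; the models, shapes, and data below are made-up placeholders, not anything from the post:)

```python
import torch
import torch.nn as nn

# Illustrative placeholders: a frozen "original" model and a trainable one.
teacher = nn.Linear(16, 4)   # stands in for the original model
student = nn.Linear(16, 4)   # stands in for the model being trained
optimizer = torch.optim.Adam(student.parameters(), lr=1e-3)

x = torch.randn(32, 16)      # a batch of dummy inputs
with torch.no_grad():
    target = teacher(x)      # the original model's outputs

optimizer.zero_grad()
loss = nn.functional.mse_loss(student(x), target)  # match outputs via MSE
loss.backward()
optimizer.step()
```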
I interpret you as insinuating that not disclosing that it was a project commissioned by industry was strategic.
I’m not necessarily implying that they explicitly/deliberately coordinated on this.
Perhaps there was no explicit “don’t mention OpenAI” policy, but there was no “person X is responsible for ensuring that mathematicians know about OpenAI’s involvement” policy either.
But given that some of the mathematicians haven’t heard a word about OpenAI’s involvement from the Epoch team, it seems like Epoch at least had a reason not to mention OpenAI’s involvement (though this depends on how extensive the communication between the two sides was). Possibly because they were aware of how the mathematicians might react, both before the project started and in the middle of it.
[ETA: In short, I would have expected this information to reach the mathematicians with high probability, unless the Epoch team had been disinclined to inform the mathematicians.]
Obviously, I’m just speculating here, and the non-Epoch mathematicians involved in the creation of FrontierMath know better than whatever I might speculate.
The analogy is that I consider living for eternity to be scary, and you say, “well, you can stop any time”. True, but it’s always going to be rational for me to live for one more year, and that way lies eternity.
The distinction you want is probably not rational/irrational but CDT/UDT or whatever.
Also,
insurance against the worst outcomes lasting forever
well, it’s also insurance against the best outcomes lasting forever (though you’re probably going to reply that bad outcomes are more likely than good outcomes and/or that you care more about preventing bad outcomes than ensuring good outcomes)
Our agreement did not prevent us from disclosing to our contributors that this work was sponsored by an AI company. Many contributors were unaware of these details, and our communication with them should have been more systematic and transparent.
So… why did they not disclose to their contributors that this work was sponsored by an AI company?
Specifically, not just any AI company, but the AI company that has (deservedly) perhaps the worst rep among all the frontier AI companies.[1]
I can’t help but think that some of the contributors would have declined the offer to contribute had they been told that it was sponsored by an AI capabilities company.
I can’t help but think that many more would have declined had they been told that it was sponsored by OpenAI specifically.
I can’t help but think that this is the reason why they were not informed.
Though Meta also has a legitimate claim to having the worst rep, albeit with different axes of worseness contributing to their overall score.
When’s the application deadline?
This is not quite deathism but perhaps a transition in the direction of “my own death is kinda not as bad”:
a big motivator for me used to be some kind of fear of death. But then I thought about philosophy of personal identity until I shifted to the view that there’s probably no persisting identity over time anyway and in some sense I probably die and get reborn all the time in any case.
and in a comment:
I’m clearly doing things that will make me better off in the future. I just feel less continuity to the version of me who might be alive fifty years from now, so the thought of him dying of old age doesn’t create a similar sense of visceral fear. (Even if I would still prefer him to live hundreds of years, if that was doable in non-dystopian conditions.)
to the extent this is feasible for us
Was [keeping FrontierMath entirely private and under Epoch’s control] feasible for Epoch in the same sense of “feasible” you are using here?
Strong agree.
For a more generalized version, see: https://www.lesswrong.com/posts/4gDbqL3Tods8kHDqs/limits-to-legibility
(caveat they initially distil from a much larger model, which I see as a little bit of a cheat)
Another little bit of a cheat is that they only train Qwen2.5-Math-7B according to the procedure described. In contrast, for the other three models (smaller than Qwen2.5-Math-7B), they instead use the fine-tuned Qwen2.5-Math-7B to generate the training data to bootstrap round 4. (Basically, they distill from DeepSeek in round 1 and then they distill from fine-tuned Qwen in round 4.)
They justify:
Due to limited GPU resources, we performed 4 rounds of self-evolution exclusively on Qwen2.5-Math-7B, yielding 4 evolved policy SLMs (Table 3) and 4 PPMs (Table 4). For the other 3 policy LLMs, we fine-tune them using step-by-step verified trajectories generated from Qwen2.5-Math-7B’s 4th round. The final PPM from this round is then used as the reward model for the 3 policy SLMs.
TBH I’m not sure how this helps them save on GPU resources. Why would it be cheaper to generate a lot of big/long rollouts with Qwen2.5-Math-7B-r4 than to do it three times with [smaller model]-r3?
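For my own clarity, here is a rough sketch of the data flow as I read it from the quoted passage. Every name below is an illustrative stub of my paraphrase, not the paper’s actual pipeline code, and the last step reflects my reading that the smaller models are trained on rollouts from the round-4 Qwen model:

```python
# Rough paraphrase of the quoted procedure; all names are illustrative stubs.

SMALLER_POLICY_MODELS = ["SLM-1", "SLM-2", "SLM-3"]  # stand-ins for the 3 smaller models

def generate_verified_trajectories(policy, reward_model):
    """Stand-in for rollout generation plus step-by-step verification."""
    return f"trajectories(policy={policy}, reward_model={reward_model})"

def finetune(base_model, data, round_idx):
    """Stand-in for supervised fine-tuning of `base_model` on `data`."""
    return f"{base_model}-r{round_idx}"

def train_ppm(data, round_idx):
    """Stand-in for training the process preference model (PPM) on `data`."""
    return f"PPM-r{round_idx}"

policy, ppm = "DeepSeek", None  # round 1 distils from the much larger model
for round_idx in range(1, 5):   # 4 rounds of self-evolution, only for Qwen2.5-Math-7B
    data = generate_verified_trajectories(policy, ppm)
    policy = finetune("Qwen2.5-Math-7B", data, round_idx)
    ppm = train_ppm(data, round_idx)

# My reading: the 3 smaller policy models never get rounds of their own; they are
# fine-tuned on verified trajectories rolled out with the round-4 Qwen model and
# reuse the round-4 PPM as their reward model.
round4_data = generate_verified_trajectories(policy, ppm)
small_policies = [finetune(m, round4_data, 4) for m in SMALLER_POLICY_MODELS]
```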
I donated $1k.
Lighthaven is the best venue I’ve been to. LessWrong is the best place on the internet that I know of and it hosts an intellectual community that was crucial for my development as a thinker and greatly influenced my life decisions over the last 3 years.
I’m grateful for it.
I wish you all the best and hope to see you flourish and prosper.
The vNM axioms constrain the shape of an agent’s preferences, they say nothing about how to make decisions
Suppose your decision in a particular situation comes down to choosing between some number of lotteries (with specific estimated probabilities over their outcomes) and there’s no complexity/nuance/tricks on top of that. In that case, vNM says that you should choose the one with the highest expected utility as this is the one you prefer the most.
At least assuming that choice is the right operationalization of preferences; if it isn’t, then the Dutch book / money-pump arguments don’t follow either.
ETA: I guess I could just say:
What are your preferences if not your idealized evaluations of decision-worthiness of options (modulo “being a corrupted piece of software running on corrupted hardware”)?
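To make the earlier point concrete, here is a tiny worked example of picking the lottery with the highest expected utility; the outcomes, probabilities, and utilities are invented purely for illustration:

```python
# Pick the lottery with the highest expected utility (vNM-style choice).
# All outcomes, probabilities, and utilities below are made up.

utility = {"win_big": 10.0, "win_small": 3.0, "nothing": 0.0}

lotteries = {
    "A": [("win_big", 0.1), ("nothing", 0.9)],
    "B": [("win_small", 0.5), ("nothing", 0.5)],
}

def expected_utility(lottery):
    return sum(p * utility[outcome] for outcome, p in lottery)

print({name: expected_utility(lot) for name, lot in lotteries.items()})  # {'A': 1.0, 'B': 1.5}
print(max(lotteries, key=lambda name: expected_utility(lotteries[name])))  # 'B'
```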
1. Introduce third-party mission alignment red teaming.
Anthropic should invite external parties to scrutinize and criticize Anthropic’s instrumental policy and specific actions based on whether they are actually advancing Anthropic’s stated mission, i.e. safe, powerful, and beneficial AI.
Tentatively, red-teaming parties might include other AI labs (adjusted for conflict of interest in some way?), as well as AI safety/alignment/risk-mitigation orgs: MIRI, Conjecture, ControlAI, PauseAI, CEST, CHT, METR, Apollo, CeSIA, ARIA, AI Safety Institutes, Convergence Analysis, CARMA, ACS, CAIS, CHAI, &c.
For the sake of clarity, each red team should provide a brief on their background views (something similar to MIRI’s Four Background Claims).
Along with their criticisms, red teams would be encouraged to propose somewhat specific changes, possibly ordered by magnitude, with something like “allocate marginally more funding to this” being a small change and “pause AGI development completely” being a very big change. Ideally, they should avoid making suggestions that include the possibility of making a small improvement now that would block a big improvement later (or make it more difficult).
Since Dario seems to be very interested in “race to the top” dynamics: if this mission alignment red-teaming program successfully signals well about Anthropic, other labs should catch up and start competing more intensely to be evaluated as positively as possible by third parties (“race towards safety”?).
It would also be good to have a platform where red teams can converse with Anthropic, as well as with each other, and the logs of their back-and-forth are published to be viewed by the public.
Anthropic should commit to taking these criticisms seriously. In particular, given how large the stakes are, they should commit to taking something like “many parties believe that Anthropic in its current form might be net-negative, even increasing the risk of extinction from AI” as a reason to pause or slow down, even if that’s contrary to their inside view.
2. Anthropic should make an explicit statement about its infohazard policy.
This statement should explain how Anthropic thinks about, and how it handles, doing and publishing research that advances AGI development without benefiting safety/alignment/x-risk reduction to an extent sufficient to offset its contribution to (likely unsafe-by-default) AGI development.
I wish this was posted as a question, ideally by you together with other Anthropic people, including Dario.
Figure out a way to show users the CoT of reasoning/agent models that you release in the future. (i.e. don’t do what OpenAI did with o1). Doesn’t have to be all of it, just has to be enough—e.g. each user gets 1 CoT view per day.
What would be the purpose of 1 CoT view per user per day?
Where does China fit into this picture
Unlike the West, China enjoys unconditional love from the Heavens. /j
[After I wrote down the thing, I became more uncertain about how much weight to give to it. Still, I think it’s a valid consideration to have on your list of considerations.]
“AI alignment”, “AI safety”, “AI (X-)risk”, “AInotkilleveryoneism”, “AI ethics” came to be associated with somewhat specific categories of issues. When somebody says “we should work (or invest more or spend more) on AI {alignment,safety,X-risk,notkilleveryoneism,ethics}”, they communicate that they are concerned about those issues and think that deliberate work on addressing those issues is required or otherwise those issues are probably not going to be addressed (to a sufficient extent, within relevant time, &c.).
“AI outcomes” is even broader/[more inclusive] than any of the above (the only step left to broaden it even further would perhaps be to say “work on AI being good” or, in the other direction, “work on technology/innovation outcomes”) and/but it also waters down the issue even more. Now you’re saying “AI is not going to be (sufficiently) good by default”, with various “AI outcomes” people having very different ideas about what makes AI likely not (sufficiently) good by default.
It feels like we’re moving in the direction of broadening our scope of consideration to (1) ensure we’re not missing anything, and (2) facilitate coalition building (moral trade?). While this is valid, it risks (1) failing to operate on the/an appropriate level of abstraction, and (2) diluting our stated concerns so much that coalition building becomes too difficult because different people/groups endorsing the stated concerns have their own interpretations/beliefs/value systems. (Something something find an optimum (but also be ready and willing to update where you think the optimum lies when the situation changes)?)
I’m not claiming it’s feasible (within decades). That’s just what a solution might look like.
This is great.
Now, given that you’re already talking about instrumental goals “trying not to step on each other’s toes”, what else would they need to deserve the name of “subagents”?