Thanks! Sounds good. Yeah, I’ll check for those variants.
Regarding PRS quality, indeed. There’s a table in a collapsible section with an analysis of the quality of PRSs used. Interesting regarding your own conversion from GWAS to PRS.
Your VCF is not enough to calculate your PRS, because the reference (rather than variant) is the “effect allele” for many scores of interest; the default behaviour of tools like PRSKB CLI and PSG_Calc is to impute the missing alleles based on your VCF, but this is going to be wrong. You likely want to re-call variants from your CRAM (this is discussed at length in the blog post I linked earlier).
Ah, cool, yes. Interestingly Claude/GPT left a comment in its code mentioning exactly this problem, and then punted on it and I didn’t notice.
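To make the failure mode concrete, here's a toy sketch (not any real tool's code; the rsIDs, weights, and genotypes are invented): a variant-only VCF simply omits homozygous-reference sites, so when the effect allele is the reference base, a naive calculator can't tell "dosage 2" from "no data".

```python
# Toy sketch of the effect-allele problem, not any particular tool's code.
# A variant-only VCF omits hom-ref sites, so when the effect allele is the
# reference base, "absent from the VCF" could mean dosage 2 (hom-ref) or
# simply no data -- the calculator can't tell which.

# (rsid, effect_allele, weight) -- invented values, not a real score file
WEIGHTS = [
    ("rs0001", "A", 0.12),   # effect allele happens to be the reference base
    ("rs0002", "T", -0.05),
    ("rs0003", "G", 0.30),
]

# Genotypes visible in a variant-only VCF, as pairs of alleles.
VCF_GENOTYPES = {
    "rs0002": ("T", "C"),
    "rs0003": ("G", "G"),
    # rs0001 is hom-ref, so it never appears in the VCF at all
}

def score(weights, genotypes):
    """Sum weight * effect-allele dosage, flagging sites with unknown dosage."""
    total, unknown = 0.0, []
    for rsid, effect, weight in weights:
        genotype = genotypes.get(rsid)
        if genotype is None:
            unknown.append(rsid)                   # hom-ref? no coverage? can't tell
            continue
        total += weight * genotype.count(effect)   # 0, 1, or 2 copies
    return total, unknown

total, unknown = score(WEIGHTS, VCF_GENOTYPES)
print(f"partial score = {total:.3f}; sites with unknown dosage: {unknown}")
```

Re-calling genotypes at every scored site from the CRAM removes the ambiguity: hom-ref sites come back as explicit 0/0 calls instead of silently dropping out.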
CNVs are fairly tractable to calculate from CRAMs
We attempted this and it failed because of a contig mismatch with the reference on the CRAM. Going back to it, we could have just downloaded the appropriate reference? (DRAGEN/Illumina?) Another thing left undone for no good reason that I didn’t catch. (Other things I did catch, but not this.)
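For anyone retrying this, a rough sketch of the contig check (filenames are placeholders, it assumes samtools is on PATH, and it isn't what our pipeline did): compare the @SQ lines in the CRAM header against the .fai index of each candidate reference until names and lengths line up.

```python
# Rough sketch for diagnosing a CRAM-vs-reference contig mismatch: compare the
# @SQ lines in the CRAM header with the .fai index of a candidate reference.
# Filenames are placeholders; assumes samtools is on PATH.
import subprocess

def cram_contigs(cram_path):
    """Return {contig_name: length} parsed from the CRAM header's @SQ lines."""
    header = subprocess.run(
        ["samtools", "view", "-H", cram_path],
        check=True, capture_output=True, text=True,
    ).stdout
    contigs = {}
    for line in header.splitlines():
        if line.startswith("@SQ"):
            fields = dict(f.split(":", 1) for f in line.split("\t")[1:])
            contigs[fields["SN"]] = int(fields["LN"])
    return contigs

def fasta_contigs(fai_path):
    """Return {contig_name: length} from a samtools-faidx .fai index."""
    contigs = {}
    with open(fai_path) as fh:
        for line in fh:
            name, length, *_ = line.rstrip("\n").split("\t")
            contigs[name] = int(length)
    return contigs

cram = cram_contigs("sample.cram")
ref = fasta_contigs("candidate_reference.fa.fai")

print("in CRAM but not reference:", sorted(set(cram) - set(ref))[:10])
print("length mismatches:", sorted(n for n in cram.keys() & ref.keys()
                                   if cram[n] != ref[n])[:10])
```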
Cool, that gives me some things to do.
(To clarify, I don’t think my default executive function is poor, just that it’s still possible to have more. I do think the ups and downs of bipolar make habits harder to establish.)
I don’t fit all my logs into the context window; they’d never fit. I ask it to load things when relevant, and handle the rest via search or other scripts for operating over the past. Todo items are more relevant: they technically fit, but a full context window is costly, so I preload only recent and high-priority ones.
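Not my actual code, just a sketch of that selection policy (field names and thresholds are invented): preload only recent or high-priority items, and fetch everything else on demand.

```python
# Sketch of the todo-preload policy, not my actual system; field names and
# thresholds are invented. Only recent or high-priority todos get preloaded
# into the context window; everything else is fetched on demand.
from datetime import datetime, timedelta, timezone

todos = [
    {"text": "Re-call variants from the CRAM", "priority": 1,
     "updated": datetime(2025, 6, 1, tzinfo=timezone.utc)},
    {"text": "Tidy old transcripts", "priority": 3,
     "updated": datetime(2024, 11, 2, tzinfo=timezone.utc)},
]

def preload(todos, max_items=20, recent_days=14, top_priority=2):
    """Recent OR high-priority items, best first, capped at max_items."""
    cutoff = datetime.now(timezone.utc) - timedelta(days=recent_days)
    keep = [t for t in todos
            if t["updated"] >= cutoff or t["priority"] <= top_priority]
    keep.sort(key=lambda t: (t["priority"], -t["updated"].timestamp()))
    return keep[:max_items]

context_block = "\n".join(f"- {t['text']}" for t in preload(todos))
print(context_block)
```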
My system does have a bunch of offline processing where raw transcripts get mined and parsed into other documents. In one case, there are two layers of that.
I’ve been doing a lot of large research projects with LLMs recently. A project might involve 10-20 “Claude Research Mode” runs or equivalent analyses with code, where each one is maxing out context.
The key thing is “session handoff” and “subagent calls”. I do all of this by hand, and I avoid having the harness automatically compact conversations (I don’t like opaque compression; I want to know what’s in context for any given chat instance).
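As a rough illustration of the manual handoff (a sketch against the Anthropic Messages API; the model name, prompts, and message lists are placeholders, not my actual setup): ask the nearly-full session to write an explicit handoff note, then seed a fresh session with only that note, so nothing in context is opaque.

```python
# Sketch of a manual "session handoff" against the Anthropic Messages API.
# Model name, prompts, and message lists are placeholders, not my setup.
import anthropic

MODEL = "claude-sonnet-4-20250514"   # substitute whatever model you actually run
client = anthropic.Anthropic()       # reads ANTHROPIC_API_KEY from the environment

def complete(messages, max_tokens=2000):
    response = client.messages.create(model=MODEL, max_tokens=max_tokens,
                                      messages=messages)
    return response.content[0].text

# Stand-in for a long, nearly-full research conversation.
old_session = [
    {"role": "user", "content": "…the research conversation so far…"},
    {"role": "assistant", "content": "…the model's work so far…"},
]

# 1. Ask the current session to write an explicit, visible handoff note.
old_session.append({"role": "user", "content": (
    "We're near the context limit. Write a handoff note for a fresh instance of "
    "yourself: project goal, findings so far with sources, open questions, and "
    "the next concrete step. The new instance sees nothing else from this chat."
)})
handoff_note = complete(old_session)

# 2. Seed a new session with only that note -- no opaque compaction involved.
new_session = [{"role": "user", "content":
                f"Handoff note from the previous session:\n\n{handoff_note}\n\n"
                "Continue from the next concrete step."}]
first_reply = complete(new_session)
```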
Whether or not I’m doing it well, I think there’s metis in where and how I split up the work (and have the model split up the work for itself), and for research this is worth working on.
I don’t have the perfect process down yet. But it’s definitely the case that there are research projects worth tackling that are 20x as big as context windows, that LLMs are useful for them, and that it’s not a bad process to be involved in: connecting the convos and calls together.
I’d expect the LLM to raise the considerations that matter most, but those are:
- Too much oxygen is also a problem. You can both under-dose and overdose.
- If you lack experience, you might miss cases where you ought to be going to the ER, and instead you’re self-treating poorly at home.
If you could find a doctor who informally signed off on self-administering, or you really, really trust your research skills, then maybe try it, but I’d caution against it.
Thanks for the great comment. In my mind, the situation is something like: Anthropic is pausing but also hoping to resume eventually, and continues its general approach of saying less rather than more about strategy and what’s going on internally. If that were the case, though, maybe they’d just not do further training runs internally, without advertising the fact. One guess is that preventing leaks is hard, and safety-minded (quite senior) employees might be forcing action, though I admit the pause requires better motivation than the story provides.
Regarding communicable evidence: on the one hand, there are public statements about pausing that go along with compelling shareable evidence, but on the other, I think historically it’s just been a pretty secretive company. Many of my conversations with Anthropic employees hit “I can’t talk about that.” It feels hard to see that changing.
Even in the story, it’s not clear the world is on track for meaningful global governance of the kind that would stop OAI/GDM though.
Completely. The point of the story was not to imply that the pausing results in everything working out great, merely that pausing would be a Highly Significant Event with Some Pretty Large Effects. Adequate legislation, conditional on significant global legislative efforts, feels like < 50% to me, at least.
Parts of the story’s depicted reaction that now feel implausible to me are (a) its rapidity, especially the joint statement, which, if it did happen, would probably take much longer, and (b) the conjunction of all those things happening – I think any of them individually, or a subset, is plausible.
My sense is that China mostly doesn’t believe in x-risk and mainly wants to be freed from US export controls so it can compete fairly,
That seems plausible. Mostly, it seems no one is really trying to talk to China; people take it for granted that they’ll want to compete and won’t take x-risk seriously enough to sign a treaty. I’m curious about your sources/evidence here.
Meanwhile, safety research at Anthropic basically collapses because they can no longer afford compute
I’m curious what the cost of ongoing safety research with existing models is if you’re just doing inference. My guess is it’s not nearly as much as the big training runs? Other variables are how long their products stay frontrunners, and whether they’re profitable vs. still subsidized by investment at this point. But that’d be like 6-12 months max, I’m guessing. I don’t have the best models here.
It’s something I built. I think theoretically I should be fine open-sourcing it, but it’s not very turnkey, and it would probably take 6-12 (??) hours of setup for someone else to use (setting up the DB, Vercel, modifying prompts). Perhaps at some point I’ll get it into a state where it’s not like that.
This is a good comment and the kind of discussion I hoped this post would start.
I find what you say pretty compelling: the nature of the pause isn’t actually realistic to how Anthropic would behave.
To defend the piece, though: the goal wasn’t to answer “conditioning on an Anthropic pause, what would that pause most likely look like?” but rather “conditioning on a pause, how would the world react?”
The intention is to respond to the claim that unilaterally pausing would be foolish because worse actors would simply move ahead. I think that’s really far from obvious. A unilateral pause would have a huge impact. It could be debated whether that’s better than winning, but I think it’s crazy to speak as though a unilateral pause would definitely do nothing. If this piece feels at all believable conditional on a pause (e.g. what would happen if Dario acted like a LW-reader), then it’s succeeded at my goal.
Aside: the lack of mention of the Anthropic/DoW conflict is a hole in the post.
Done!
Please sir, mercy. I’m in Pacific Time and it was 11pm here!
Curated! I think the immense capability and usefulness of current LLMs, and specifically their increasing ability to take over tasks from humans, distract from the ways in which they are strange minds, different from human minds. I like this post for digging into that. It’s well known that LLMs lack memory, and now we give them scratchpads and other files they can reference as a substitute, and yet it’s not the same (as I keep experiencing in my own use). I appreciate this post for digging in and making claims like: no amount of context window or scratchpads, etc., substitutes for actual continual learning. Without asserting this is correct, it’s a discussion I like. One reason is that I think significant and scary things might happen if/when we move beyond current architectures – which are already very capable – to those without these limitations. Good predictions there will come from understanding what is going on with the current models. Kudos. I like this line of work.
So you don’t think they have enough capability to replace 50-100M jobs[1] in the US over the next 5-10 years? (I think this could happen from just the current generation with better scaffolding/products/diffusion, and even more so if the models continue to improve).
This is measuring in terms of people’s occupations, but one could instead weight it by fraction of the economy. I’m not sure how that’ll net out.
@Neel Nanda Not software you can try, but a detailed description: My Exobrain Software (forays into cyborgism)