I won’t comment on your specific startup, but I wonder in general how an AI Safety startup becomes a successful business. What’s the business model? Who is the target customer? Why do they buy? Unless the goal is to get acquired by one of the big labs, in which case, sure, but again, why or when do they buy, and at what price? Especially since they already don’t seem to be putting much effort into solving the problem themselves despite having better tools and more money to do so than any new entrant startup.
I really, really hope at some point the Democrats will acknowledge the reason they lost is that they failed to persuade the median voter of their ideas, and/or adopt ideas that appeal to said voters. At least among those I interact with, there seems to be a denial of the idea that this is how you win elections, which is a prerequisite for governing.
That seems very possible to me, and if and when we can show whether something like that is the case, I do think it would represent significant progress. If nothing else, it would help tell us what the thing we need to be examining actually is, in a way we don’t currently have an easy way to specify.
If you can strike in a way that prevents retaliation that would, by definition, not be mutually assured destruction.
Correct, which is in part why so much effort went into developing credible second strike capabilities, building up all parts of the nuclear triad, and closing the supposed missile gap. Because both the US and USSR had sufficiently credible second strike capabilities, it made a first strike much less strategically attractive and reduced the likelihood of one occurring. I’m not sure how your comment disagrees with mine? I see them as two sides of the same coin.
If you live in Manhattan or Washington DC today, you basically can assume you will be nuked first, yet people live their lives. Granted people could behave differently under this scenario for non-logical reasons.
My understanding is that in the Cold War, a basic MAD assumption was that if anyone were going to launch a first strike, they’d try to do so with overwhelming force sufficient to prevent a second strike, hitting everything at once.
I agree that consciousness arises from normal physics and biology, there’s nothing extra needed, even if I don’t yet know how. I expect that we will, in time, be able to figure out the mechanistic explanation for the how. But right now, this model very effectively solves the Easy Problem, while essentially declaring the Hard Problem not important. The question of, “Yes, but why that particular qualia-laden engineered solution?” is still there, unexplained and ignored. I’m not even saying that’s a tactical mistake! Sometimes ignoring a problem we’re not yet equipped to address is the best way to make progress towards getting the tools to eventually address it. What I am saying is that calling this a “debunking” is misdirection.
I’ve read this story before, including and originally here on LW, but for some reason this time it got me thinking: I’ve never seen a discussion of what this tradition meant for early Christianity, before the Christians decided to just declare (supposedly after God sent Peter a vision, an argument that only works by assuming the conclusion) that the old laws no longer applied to them. After all, the Rabbi Yeshua ben Joseph (as the Gospels sometimes call him) explicitly declared the miracles he performed to be a necessary reason why not believing in him was a sin.
We apply different standards of behavior for different types of choices all the time (in terms of how much effort to put into the decision process), mostly successfully. So I read this reply as something like, “Which category of ‘How high a standard should I use?’ do you put ‘Should I lie right now?’ in?”
A good starting point might be: one rank higher than you would require if you weren’t lying. See how it goes and adjust over time. If I tried to make an effort-ranking of all the kinds of tasks I regularly engage in, I expect there would be natural clusters I could roughly draw an axis through. E.g., I put more effort into client-facing or boss-facing tasks at work than I do into casual conversations with random strangers. I put more effort into setting the table, washing dishes, and plating food for holidays than for a random Tuesday. Those clusters are probably more than one rank apart, but for any given situation, I think the bar for lying should be somewhere in the vicinity of that size of gap.
One of the factors to consider, that contrasts with old-fashioned hostage exchanges as described, is that you would never allow your nation’s leaders to visit any city that you knew had such an arrangement. Not as a group, and probably not individually. You could never justify doing this kind of agreement for Washington DC or Beijing or Moscow, in the way that you can justify, “We both have missiles that can hit anywhere, including your capital city.” The traditional approach is to make yourself vulnerable enough to credibly signal unwillingness to betray one another, but only enough that there is still a price at which you would make the sacrifice.
Also, consider that compared to the MAD strategy of having launchable missiles, this strategy selectively disincentivizes people from wanting to move to whatever cities are the subject of such agreements, which are probably your most productive and important cities.
It’s a subtle thing. I don’t know if I can eyeball two inches of height.
Not from a picture, but IRL, if you’re 5′11″ and they claim 6′0″, you can. If you’re 5′4″, probably not so much. Which is good, in a sense, since the practical impact of this brand of lying on someone who is 5′4″ is very small, whereas unusually tall women may care whether their partner is taller or shorter than they are.
This makes me wonder what the pattern looks like for gay men, and whether their reactions to it and feelings about it are different from straight women’s.
Lie by default whenever you think it passes an Expected Value Calculation to do so, just as for any other action.
How do you propose to approximately carry out such a process, and how much effort do you put into pretending to do the calculation?
I’m not as much a stickler/purist/believer in honesty-as-always-good as many around here; I think there are many times when deception of some sort is a valid, good, or even morally required choice. I definitely think e.g. Kant was wrong about honesty as a maxim, even within his own framework. But, in practice, I think your proposed policy sets much too low a standard, and the gap between what you proposed and “Lie by default whenever it actually passes an Expected Value Calculation to do so, just as for any other action” is enormous, both in theoretical defensibility and in the skillfulness (and internal levels of honesty and self-awareness) required to successfully execute it.
I personally wouldn’t want to do a PhD that didn’t achieve this!
Agreed. That was somewhere around reason #4 for why I quit my PhD program as soon as I qualified for a master’s in passing.
Any such question has to account for the uncertainty about what US trade policies and tariffs will be tomorrow, let alone by the time anyone currently planning a data center will actually be finished building it.
Also, when you say offshore, do you mean in other countries, or actually in the ocean? Assuming the former, I think that would imply that use of the data center by anyone in the US would be an import of services. If this started happening at scale, I would expect the current administration to immediately begin applying tariffs to those services.
@Garrett Baker Yes, electronics are exempt (for now?), but IIUC all the other stuff (HVAC, electrical, etc.) that goes into a data center is not, and that’s often a majority, or at least a high proportion, of total costs.
Do you really expect that the project would then fail at the “getting funded”/”hiring personnel” stages?
Not at all, I’d expect them to get funded and get people. Plausibly quite well, or at least I hope so!
But when I think about paths by which such a company shapes how we reach AGI, I find it hard to see how that happens unless something (regulation, hitting walls in R&D, etc.) either slows the incumbents down or else causes them to adopt the new methods themselves. Both of which are possible! I’d just hope anyone seriously considering pursuing such a venture has thought through what success actually looks like.
“Independently develop AGI through different methods before the big labs get there through current methods” is a very heavy lift that’s downstream of but otherwise almost unrelated to “Could this proposal work if pursued and developed enough?”
I think, “Get far enough fast enough to show it can work, show it would be safer, and show it would only lead to modest delays, then find points of leverage to get the leaders in capabilities to use it, maybe by getting acquired at seed or series A” is a strategy not enough companies go for (probably because VCs don’t think it’s as good for their returns).
You’re right, but creating unexpected new knowledge is not a PhD requirement. I expect it’s pretty rare that a PhD student achieves that level of research.
It wasn’t a great explanation, sorry, and there are definitely some leaps, digressions, and hand-wavy bits. But basically: even if current AI research were all blind mutation and selection, we already know that process can yield general intelligence from animal-level intelligence, because evolution did it. And we already have various examples of how human research can apply much greater random and non-random mutation, larger individual changes, higher selection pressure in a preferred direction, and more horizontal transfer of traits than evolution can, enabling (very roughly estimated) ~3-5 OOMs greater progress per generation with fewer individuals and shorter generation times.
Saw your edit above, thanks.
I’m not a technical expert by any means, but given what I’ve read I’d be surprised if that kind of research were harmful. Curious to hear what others say.
I recently had approximately this conversation with my own employer’s HR department. We’re steadily refactoring tasks to find what can be automated, and it’s a much larger proportion of what our entry-level hires do. Current AI is an infinite army of interns we manage; three years ago they were middle-school-age interns, and now they’re college or grad school interns. At some point, we don’t know when, actually adding net economic value will require having the kinds of skills that we currently expect people to take years to build. This cuts off the pipeline of talent, because we can’t afford to pay people for years before getting anything in return. Luckily (?) that is a temporary state of affairs, until the AI automates the next levels away too and the entire human economy disappears up its own orifices, long before most of us would have retired.
In the intervening months or years, though, I expect a lot of finger-pointing and victim-blaming and general shaming from those who don’t understand what’s going on, just as I recall happening to many of my friends around my own college graduation in 2009 in the midst of a global recession. “No, mom, there’s literally no longer any field hiring anyone with less than a decade of experience. No, even if I wanted to go back to school, there are a thousand times as many applicants as spots now, and most of those who get accepted will find the fields they picked are gone by the time they graduate, and they’ll have even more non-dischargeable debt. Sorry, but yes, I have to move back in with you. Also, most likely in a year or five you and dad will get fired and we’ll all be living off grandma’s savings that are growing at 80% a year.”
I also don’t have a principled reason to expect that particular linear relationship, except in general in forecasting tech advancements, I find that a lot of such relationships seem to happen and sustain themselves for longer than I’d expect given my lack of principled reasons for them.
I did just post another comment reply that engages with some things you said.
To the first argument: I agree with @Chris_Leong’s point about interest rates constituting essentially zero evidence, especially compared to the number of data points on the METR graph.
To the second: I do not think the PhD thesis is a fair comparison. That is not a case where we expect anyone to successfully complete the task on their own. PhD students, post-docs, and professional researchers break a long task into many small ones, receive constant feedback, and change course in response to intermediate successes and failures. I don’t think there are actually very many tasks en route to a PhD that can’t be broken down into predictable, well-defined subtasks taking less than a month each, and the task of doing the breaking down is itself a fairly short-time-horizon task that gets periodically revised. Even so, many PhD theses end up being, “Ok, you’ve done enough total work, how do we finagle these papers into a coherent narrative after the fact?” Plus, PhD students, people motivated to go to grad school with enough demonstrated ability to get accepted into PhD programs, still fail to get a PhD close to half the time even with all that.
I imagine you could reliably complete a PhD in many fields with a week-long time horizon, as long as you get good enough weekly feedback from a competent advisor. 1) Talk to your advisor about what it takes to get a PhD. 2) Divide that into a list of <1-week tasks. 3) Complete task 1, get feedback, revise the list. 4) Either repeat the current task or move on to the next one, depending on feedback. 5) Loop until complete. 5a) Every ten or so loops, check overall progress to date against the original requirements and evaluate whether the overall pace is acceptable. If not, come up with possible new plans and get advisor feedback.
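As a toy illustration only, here is roughly that loop in code. Everything here is a hypothetical stand-in for human and advisor actions (the pass rate and task count are made-up numbers), not a real tool or workflow:

```python
import random

def advisor_feedback(task: str) -> bool:
    """Stand-in for the weekly advisor review of the current task (step 3)."""
    return random.random() < 0.7  # assume ~70% of weekly attempts pass

def plan_tasks() -> list[str]:
    """Stand-in for steps 1-2: turn the PhD requirements into <1-week tasks."""
    return [f"task {i}" for i in range(1, 151)]  # e.g. ~3 years of weekly tasks

def phd_loop() -> int:
    tasks = plan_tasks()
    weeks = 0
    while tasks:                          # step 5: loop until complete
        weeks += 1
        if advisor_feedback(tasks[0]):    # step 3: attempt task, get feedback
            tasks.pop(0)                  # step 4: passed, move to the next task
        # a failed attempt simply repeats the same task next week
        if weeks % 10 == 0:               # step 5a: periodic progress check
            pass  # in reality: compare pace to requirements, re-plan if needed
    return weeks

if __name__ == "__main__":
    print(f"Finished after {phd_loop()} weekly iterations")
```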
As far as not believing the current paradigm could reach AGI: which paradigm do you mean? I don’t think “random variation and rapid iteration” is a fair assessment of the current research process. But even if it were, what should I do with that information? Well, luckily we have a convenient example of what it takes for blind mutation with selection pressure to raise intelligence to human level: us! I am pretty confident saying that current LLMs would outperform, say, Australopithecus on any intellectual ability, but not Homo sapiens. Evolution made that jump in a few million years, let’s say 200k generations of 10-100k individuals each, in which intelligence was one of many, many factors weakly driving selection pressure, with at most a small number of variations per generation. I can’t really quantify how much human intelligence and directed effort speed up progress compared to blind chance, but consider that 1) a current biology grad student can do things with genetics in an afternoon that evolution needs thousands of generations and millions of individuals or more to do, and 2) the modern economic growth rate, essentially the summed impact of human insight on human activity, is around 15,000x faster than it was in the Paleolithic. Naively extrapolated, this outside view would tell me that science and engineering can take us from Australopithecus-level to human-level in about 13 generations (unclear which generation we’re on now). The number of individuals needed per generation depends on how much we vary each individual, but is plausibly in the single or double digits.
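Spelling out that naive extrapolation with the rough figures above (both inputs are loose estimates, not measurements):

\[
\frac{200{,}000 \ \text{evolutionary generations}}{15{,}000\times \ \text{speedup from directed effort}} \approx 13 \ \text{generations of directed iteration}
\]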
My disagreement with your conclusion from your third objection is that scaling inference-time compute increases performance within a model generation, but that’s not how the iteration goes between generations. We use reasoning models with more inference-time compute to generate better data to train better base models that reproduce similar capability levels with less compute, which we then use to build better reasoning models. So if you build the first superhuman coder and find it’s expensive to run, what’s the most obvious next step in the chain? Follow the same process we’ve been following for reasoning models, and if the straight lines on the graphs hold, six months later we’ll plausibly have one that’s a tenth the cost to run. Repeat again for the next six months after that.
Personally I think 2030 is possible but aggressive, and my timeline estimate is more around 2035. Two years ago I would have said 2040 or a bit later; capabilities gains relevant to my own field and several others I know reasonably well have shortened that, along with the increase in funding for further development.
The Claude/Pokemon thing is interesting, and the overall Pokemon-playing trend across Anthropic’s models is clearly positive. I can’t say I had any opinion at all about how far along an LLM would get at Pokemon before that result got publicized, so I’m curious if you did. What rate of progress on that benchmark would you expect in a short-timelines world? What if there’s an LLM agent that can beat Pokemon in six months, or a year, or two years?
Self-driving vehicles are already more of a manufacturing and regulatory problem than a technical one. For example, as long as the NHTSA only lets manufacturers deploy 2500 self-driving vehicles a year each in the US, broad adoption cannot happen, regardless of technical capabilities or willingness to invest and build.
I also don’t think task length is a perfect metric. But it’s a useful one, a lower bound on what’s needed to be able to complete all human-complete intellectual tasks. Like everything else to date, there is likely something else to look at as we saturate the benchmark.
I agree novel insights (or more of them; I can’t say there haven’t been any) will be strong evidence. I don’t understand the reason for thinking this should already be observable. Very, very few humans ever produce anything like truly novel insights at the forefront of human knowledge. “They have not yet reached the top <0.1% of human ability in any active research field” is an incredibly high bar, one I wouldn’t expect to be passed until we’re already extremely close to AGI, and it should be telling that so late a bar is on the short list of signs you are looking for. I would also add two other things. First, how many research labs do you think there are that have actually tried to use AI to make novel discoveries, given how little calendar time there has been to figure out how to adopt and use the models we do have? If Gemini 2.5 could do this today, I don’t think we’d necessarily have any idea. And second, do you believe it was a mistake that two of the 2024 Nobel prizes went to AI researchers, for work that contributes to the advancement of chemistry and physics?
AI usefulness is strongly field-dependent today. In my own field, it went from a useful supplementary tool to “This does 50-80% of what new hires did and 30-50% of what I used to do, and we’re scrambling to refactor workflows to take advantage of it.”
Hallucinations are annoying, but good prompting strategy, model selection, and task definition can easily get the percentages down to the low single digits. In many cases the rates can be lower than those of a smart human given a similar amount of context. I can often literally just tell an LLM, “Rewrite this prompt in such a way as to reduce the risk of hallucinations or errors, answer that prompt, then go back and check for and fix any mistakes,” and that’ll cut hallucinations down a good 50-90%, depending on the topic and the question complexity. I can also ask the model to cite sources for factual claims, dump the sources back into the next prompt, and ask whether there are any factual claims not supported by the sources. It’s a little circular, but also a bit Socratic, and not really any worse than when I’ve tried to teach difficult mental skills to some bright human adults.
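For concreteness, here is roughly what that chain looks like in code. The `ask` function is a hypothetical stand-in for whichever LLM client you actually use, and the exact prompt wording is just an example of the pattern, not a recipe:

```python
def ask(prompt: str) -> str:
    """Hypothetical stand-in for whatever LLM API you actually use."""
    raise NotImplementedError("swap in your own client call here")

def answer_with_checks(question: str) -> str:
    # 1. The single combined self-check prompt described above.
    answer = ask(
        "Rewrite the following prompt in such a way as to reduce the risk "
        "of hallucinations or errors, answer that rewritten prompt, then go "
        f"back and check for and fix any mistakes:\n\n{question}"
    )
    # 2. Ask for sources for the factual claims in the answer.
    sources = ask(f"Cite sources for each factual claim in:\n\n{answer}")
    # 3. Dump the answer and sources back in and check for unsupported claims.
    return ask(
        "Are there any factual claims in this answer that are not supported "
        f"by these sources? If so, correct them.\n\nAnswer:\n{answer}\n\n"
        f"Sources:\n{sources}"
    )
```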
As things stand today, if AGI is created (aligned or not) in the US, it won’t be by the USG or agents of the USG. It’ll be by a private or public company. Depending on the path to get there, there will be more or less USG influence of some sort. But if we’re going to assume the AGI is aligned to something deliberate, I wouldn’t assume AGI built in the US is aligned to the current administration, or at least significantly less so than I’d assume AGI built in China by a Chinese company would be aligned to the current CCP.
For more concrete reasons regarding national ideals, the US has a stronger tradition of self-determination and shifting values over time, plausibly reducing risk of lock-in. It has a stronger tradition (modern conservative politics notwithstanding) of immigration and openness.
In other words, it matters a lot whether the aligned US-built AGI is aligned to the Trump administration, the Constitution, the combined writings of the US founding fathers and renowned leaders and thinkers, the current consensus of the leadership at Google or OpenAI, the overall gestalt opinions of the English-language internet, or something else. I don’t have enough understanding to make a similar list of possibilities for China, but some of the things I’d expect it would include don’t seem terrible. For example, I don’t think a genuinely-aligned Confucian sovereign AGI is anywhere near the worst outcome we could get.