Sure, space colonies happen faster—but AI-enabled and AI-dependent space colonies don’t do anything to make me think disempowerment risk gets uncorrelated.
Aside from the fact that I disagree that it helps, given that an AI takeover that’s hostile to humans isn’t a local problem, we’re optimistically decades away from such colonies being viable independent of earth, so it seems pretty irrelevant.
I admitted that it’s possible the problem is practically unsolvable, or worse; you could have put the entire world on Russell and Whitehead’s goal of systematizing math, and you might have gotten to Gödel faster, but you’d probably just waste more time.
And on Scott's contributions, I think they are solving or contributing towards solving parts of the problems that were posited initially as critical to alignment, and I haven't seen anyone do more. (With the possible exception of Paul Christiano, who hasn't been focusing on research for solving alignment as much recently.) I agree that the work doesn't do much other than establish better foundations, but that's kind-of the point. (And it's not just Logical Induction—there's his collaboration on Embedded Agency, and his work on finite factored sets.) But saying that the work done to establish the base is more philosophical and doesn't align AGI seems like moving the goalposts, even if I agree it's true.
I don’t think I disagree with you on the whole—as I said to start, I think this is correct. (I only skimmed the full paper, but I read the post; on looking at it, the full paper does discuss this more, and I was referring to the response here, not claiming the full paper ignores the topic.)
That said, in the paper you state that the final steps require something more than human disempowerment due to other types of systems, but per my original point, you seem to elide how the process until that point is identical by saying that these systems have largely been aligned with humans until now, while I think that's untrue; humans have benefitted despite the systems being poorly aligned. (Misalignment due to overoptimization failures would look like this, and it is what has been happening when economic systems optimize for GDP while ignoring wealth disparity, for example; wealth goes up, but as it becomes more extreme, the tails diverge, and at that point, maximizing GDP looks very different from what a democracy is supposed to do.)
Back to the point, to the extent that the unique part is due to cutting the last humans out of the decision loop, it does differ—but it seems like the last step definitionally required the initially posited misalignment with human goals, so that it’s an alignment or corrigibility failure of the traditional type, happening at the end of this other process that, again, I think is not distinct.
Again, that’s not to say I disagree, just that it seems to ignore the broader trend by saying this is really different.
But since I’m responding, as a last complaint, you do all of this without clearly spelling out why solving technical alignment would solve this problem, which seems unfortunate. Instead, the proposed solutions try to patch the problems of disempowerment by saying you need to empower humans to stay in the decision loop—which in the posited scenario doesn’t help when increasingly powerful but fundamentally misaligned AI systems are otherwise in charge. But this is making a very different argument, and one I’m going to be exploring when thinking about oversight versus control in a different piece I’m writing.
I don’t think that covers it fully. Corporations “need… those bureaucracies,” but haven’t done what would be expected otherwise.
I think we need to add that corporations are limited to doing things they can convince humans to do, are at least somewhat aligned with their human directors / controllers, and are subject to checks and balances: their people can whistleblow, and the company is constrained by law to the extent that people need to worry when breaking it blatantly.
But I think that breaking these constraints is going to be much closer to the traditional loss-of-control scenario than what you seem to describe.
Apologies—when I said genius, I had a very high bar in mind: no more than a half dozen people alive today, each of whom has single-handedly created or materially advanced an entire field. And I certainly hold Scott in very high esteem, and while I don't know Sam or Jessica personally, I expect they are within throwing distance—but I don't think any of them meet this insanely high bar. And Scott's views on this, at least from ca. 2015, were a large part of what informed my thinking about this; I can't tell the difference between him and Terry Tao when speaking with them, but he can, and he said there is clearly a qualitative difference there. Similarly for other people clearly above my league, including a friend who worked with Thurston at Cornell back in 2003-5. (It's very plausible that Scott Aaronson is in this bucket as well, albeit in a different area, though I can't tell personally, and have not heard people say this directly—but he's not actually working on the key problems, and per him, he hasn't really tried to work on agent foundations. Unfortunately.)
So to be clear, I think Scott is a genius, but not one of the level that is needed to single-handedly advance the field to the point where the problem might be solved this decade, if it is solvable. Yes, he's brilliant, and yes, he has unarguably done a large amount of the most valuable work in the area in the past decade, albeit mostly more foundational than what is needed to solve the problem. So if we had another dozen people of his caliber at each of a dozen universities working on this, that would be at least similar in magnitude to what we have seen in fields that have made significant progress in a decade—though even then, not all fields like that see progress.
But the Tao / Thurston level of genius, usually in addition to the above-mentioned 100+ top people working on the problem, is what has given us rapid progress in the past in fields where such progress was possible. This may not be one of those areas—but I certainly don't expect that we can do better than other areas while applying much less intellectual firepower, hence my above claim that humanity as a whole hasn't managed even what I'd consider a half-assed semi-serious attempt at solving a problem that deserves an entire field of research working feverishly to try our best to actually not die—not just a few lone brilliant researchers.
One thing though I kept thinking: Why doesn’t the article mention AI Safety research much?
Because almost all current AI safety research can't make a future agentic ASI safe if it isn't already aligned with human values, as everyone who has looked at the problem seems to agree. And the Doomers certainly have been clear about this, even as most of the funding goes to prosaic alignment.
I hate to be insulting to a group of people I like and respect, but “the best agent foundations work that’s happened over ~10 years of work” was done by a very small group of people who, despite being very smart, certainly smarter than myself, aren’t academic superstars or geniuses (Edit to add: on a level that is arguably sufficient, as I laid out in my response below.) And you agree about this. The fact that they managed to make significant progress is fantastic, but substantial progress on deep technical problems is typically due to (ETA: only-few-in-a-generation level) geniuses, large groups of researchers tackling the problem, or usually both. And yes, most work on the topic won’t actually address the key problem, just like most work in academia does little or nothing to advance the field. But progress happens anyways, because intentionally or accidentally, progress on problems is often cumulative, and as long as a few people understand the problem that matters, someone usually actually notices when a serious advance occurs.
I am not saying that more people working on the problem and more attention would definitely crack the problems in the field this decade, but I certainly am saying that humanity as a whole hasn't managed even what I'd consider a half-assed semi-serious attempt.
I think this is correct, but doesn’t seem to note the broader trend towards human disempowerment in favor of bureaucratic and corporate systems, which this gradual disempowerment would continue, and hence elides or ignores why AI risk is distinct.
“when, if ever, our credences ought to capture indeterminacy in how we weigh up considerations/evidence”
The obvious answer is only when there is enough indeterminacy to matter; I'm not sure if anyone would disagree. Because the question isn't whether there is indeterminacy, it's how much, and whether it's worth the costs of using a more complex model instead of doing it the Bayesian way.

"I'd be surprised if many/most infra-Bayesians would endorse suspending judgment in the motivating example in this post"
You also didn’t quite endorse suspending judgement in that case—“If someone forced you to give a best guess one way or the other, you suppose you’d say “decrease”. Yet, this feels so arbitrary that you can’t help but wonder whether you really need to give a best guess at all…” So, yes, if it’s not directly decision relevant, sure, don’t pick, say you’re uncertain. Which is best practice even if you use precise probability—you can have a preference for robust decisions, or a rule for withholding judgement when your confidence is low. But if it is decision relevant, and there is only a binary choice available, your best guess matters. And this is exactly why Eliezer says that when there is a decision, you need to focus your indeterminacy, and why he was dismissive of DS and similar approaches.
I’m not merely saying that agents shouldn’t have precise credences when modeling environments more complex than themselves
You seem to be underestimating how pervasive / universal this critique is—essentially every environment is more complex than we are, at the very least when we're embedded agents, or when other humans are involved. So I'm not sure where your criticism (which I agree with) does more than state the basic argument in a very strong form—it just seems to be stating it more clearly.
The problem is that Kolmogorov complexity depends on the language in which algorithms are described. Whatever you want to say about invariances with respect to the description language, this has the following unfortunate consequence for agents making decisions on the basis of finite amounts of data: For any finite sequence of observations, we can always find a silly-looking language in which the length of the shortest program outputting those observations is much lower than that in a natural-looking language (but which makes wildly different predictions of future data).
Far less confident here, but I think this isn't correct as a matter of practice. Conceptually, Solomonoff doesn't say "pick an arbitrary language once you've seen the data and then do the math"; it says "pick an arbitrary language before you've seen any data and then do the math." And if we need to implement the silly-looking language, there is a complexity penalty for doing that, one that's going to be similarly large regardless of what baseline we choose, and we can determine how large it is by reducing the language to some other language. (And I may be wrong, but picking a language cleverly should not mean that Kolmogorov complexity will change something requiring NP programs to encode into something that P programs can encode, so this criticism seems weak anyways outside of toy examples.)
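For reference, the standard invariance theorem is the cleanest way to state that penalty; a minimal sketch, with $U$ and $V$ as stand-ins for the two description languages:

$$K_U(x) \le K_V(x) + c_{U,V} \quad \text{for all } x,$$

where $c_{U,V}$ is roughly the length of an interpreter for $V$ written in $U$, and does not depend on $x$. A "silly-looking" language tailored to the data only hides that interpreter cost outside the count; once you charge for implementing the language itself, its advantage is bounded by a constant fixed before seeing any data.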
Strongly agree. I was making a narrower point, but the metric is clearly different from the goal—if anything, it's surprising that we see as much correlation as we do, given how much it has been optimized.
Toby Ord writes that "the required resources [for LLM training] grow polynomially with the desired level of accuracy [measured by log-loss]." He then concludes that this shows "very poor returns to scale," and christens it the "Scaling Paradox." (He goes on to point out that this doesn't imply it can't create superintelligence, and I agree with him about that.)
But what would it look like if this were untrue? That is, what would be the conceptual alternative, where required resources grow more slowly? I think the answer is that it's conceptually impossible.
To start, there is a fundamental lower bound on loss at zero, since the best possible model predicts everything perfectly—it exactly learns the distribution. This can happen when overfitting a model, but it can also happen when there is a learnable ground truth; models that are trained to learn a polynomial function can learn it exactly.
But there is strong reason to expect the bound to be significantly above zero loss. The training data for LLMs contains lots of aleatory randomness, things that are fundamentally conceptually unpredictable. I think it's likely that things like RAND's random number book are in the training data, and it's fundamentally impossible to predict randomness. I think something similar is generally true for many other things—predicting word choice among semantically equivalent words, predicting where typos occur, etc.
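To make the shape of the claim concrete, here is a minimal sketch, assuming the standard empirical power-law form of the loss curve (the constants $L_\infty$, $a$, $b$ are illustrative, not fitted):

$$L(C) \approx L_\infty + a\,C^{-b} \quad\Longrightarrow\quad C \approx \left(\frac{a}{L - L_\infty}\right)^{1/b},$$

so the compute $C$ needed to reach a target loss $L$ grows polynomially in $1/(L - L_\infty)$ and diverges as $L$ approaches the irreducible floor $L_\infty$. On this reading, "polynomial returns" is just what any loss curve with a nonzero floor has to look like, not a distinctive failing of LLMs.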
Aside from the loss being bounded well above zero, there's a strong reason to expect that scaling is required to reduce loss for some tasks. In fact, it's mathematically guaranteed to require significant computation to get near that level for many tasks that are in the training data. Eliezer pointed out that GPTs are predictors, and gives the example of a list of numbers followed by their two prime factors. It's easy to generate such a list by picking pairs of primes and multiplying them, then writing the answer first—but decreasing loss for generating the next token to predict the primes from the product is definitionally going to require exponentially more computation to perform better for larger primes.
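To make that example concrete, here is a minimal sketch (the prime range and line format are my own illustration, not anything from actual training data) of the asymmetry between generating such lines and predicting them:

```python
# Minimal illustration: producing "product = p * q" lines is cheap,
# but a next-token predictor that sees only the product must factor it
# to predict the factor tokens, which gets much harder as primes grow.
import random

def is_prime(n: int) -> bool:
    if n < 2:
        return False
    return all(n % d for d in range(2, int(n ** 0.5) + 1))

primes = [p for p in range(10_000, 20_000) if is_prime(p)]

for _ in range(5):
    p, q = sorted(random.sample(primes, 2))
    # The answer (the product) is written first, as in Eliezer's example,
    # so the factors appear as the tokens to be predicted.
    print(f"{p * q} = {p} * {q}")
```

Nothing hinges on the exact ranges; the point is only that the generator runs in microseconds, while driving next-token loss to the floor on the factor tokens requires the model to implicitly factor.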
And I don’t think this is the exception, I think it’s at least often the rule. The training data for LLMs contains lots of data where the order of the input doesn’t follow the computational order of building that input. When I write an essay, I sometimes arrive at conclusions and then edit the beginning to make sense. When I write code, the functions placed earlier often don’t make sense until you see how they get used later. Mathematical proofs are another example where this would often be true.
An obvious response is that we've been using exponentially more compute to get better at tasks that aren't impossible in this way—but I'm unsure if that is true. Benchmarks keep getting saturated, and there's no natural scale for intelligence. So I'm left wondering whether there's any actual content in the "Scaling Paradox."
(Edit: now also posted to my substack.)
True, and even more, if optimizing for impact or magnitude has Goodhart effects, of various types, then even otherwise good directions are likely to be ruined by pushing on them too hard. (In large part because it seems likely that the space we care about is not going to have linear divisions into good and bad; there will be much more complex regions, and even when pointed in a direction that is locally better, pushing too far is possible, and very hard to predict from local features even if people try, which they mostly don't.)
I think the point wasn’t having a unit norm, it was that impact wasn’t defined as directional, so we’d need to remove the dimensionality from a multidimensionally defined direction.
So to continue the nitpicking, I’d argue impact = || Magnitude * Direction ||, or better, ||Impact|| = Magnitude * Direction, so that we can talk about size of impact. And that makes my point in a different comment even clearer—because almost by assumption, the vast majority of those with large impact are pointed in net-negative directions, unless you think either a significant proportion of directions are positive, or that people are selecting for it very strongly, which seems not to be the case.
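One way to make the nitpick precise (a sketch; the unit direction $d$, magnitude $m$, and reference "good" direction $g$ are my own notation):

$$I = m\,d,\quad \lVert d\rVert = 1,\ m \ge 0 \;\Longrightarrow\; \lVert I\rVert = m, \qquad \text{signed impact} = \langle I, g\rangle = m\,\langle d, g\rangle.$$

The signed term is large and negative exactly when the magnitude is large and the direction points away from $g$, which is why selecting on $\lVert I\rVert$ alone picks up the negative tail just as readily as the positive one.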
I think some of this is on target, but I also think there’s insufficient attention to a couple of factors.
First, in the short and intermediate term, I think you're overestimating how much most people will actually update their personal feelings around AI systems. I agree that there is a fundamental reason that fairly near-term AI will be able to function as a better companion and assistant than humans—but as a useful parallel, we know that nuclear power is fundamentally better than most other power sources that were available in the 1960s, yet people's semi-irrational yuck reaction to "dirty" or "unclean" radiation—far more than the actual risks—made it publicly unacceptable. Similarly, I think the public perception of artificial minds will be generally pretty negative, especially looking at current public views of AI. (Regardless of how appropriate or good this is in relation to loss-of-control and misalignment, it seems pretty clearly maladaptive for generally friendly near-AGI and AGI systems.)
Second, I think there is a paperclip-maximizer aspect to status competition, in the sense Eliezer uses the concept. That is, given massively increased wealth, abilities, and capacity, even if an implausibly large 99% of humans find great ways to enhance their lives in ways that don't devolve into status competition, there are few other domains where an indefinite amount of wealth and optimization power can be applied usefully. Obviously, this is at best zero-sum, but I think there aren't lots of obvious alternative places for positive-sum indefinite investments. And even where such positive-sum options exist, they are often harder to arrive at as equilibria. (We see a similar dynamic with education, housing, and healthcare, where increasing wealth leads to competition over often artificially-constrained resources rather than expansion of useful capacity.)
Finally, and more specifically, your idea that we'd see intelligence enhancement as a new (instrumental) goal in the intermediate term seems possible and even likely, but not a strong competitor for, nor inhibitor of, status competition. (Even ignoring the fact that intelligence itself is often an instrumental goal for status competition!) Even aside from the instrumental nature of the goal, I will posit that returns to investment in intelligence will at some point strongly diminish—regardless of the fact that it's unlikely on priors that these limits are near current levels. Once those points are reached, the indefinite investment of resources will trade off between more direct status competition and further intelligence increases, and as the latter shows decreasing returns, as noted above, the former becomes the metaphorical paperclip into which individuals can invest indefinitely.
my uninformed intuition is that the people with the biggest positive impact on the world have prioritized the Magnitude
That’s probably true, but it’s selecting on the outcome variable. And I’ll bet that the people with the biggest negative impact are even more overwhelmingly also those who prioritized magnitude.
“If you already know that an adverse event is highly likely for your specific circumstances, then it is likely that the insurer will refuse to pay out for not disclosing “material information”—a breach of contract.”
Having worked in insurance, I can say that's not what the companies usually do. Denying explicitly for clear but legally hard-to-defend reasons, especially those which a jury would likely rule against, isn't a good way to reduce costs and losses. (They usually will just say no and wait to see if you bother following up. Anyone determined enough to push to get a reasonable claim is gonna be cheaper to pay out for than to fight.)
The sequence description is: "Short stories about (implausible) AI dooms. Any resemblance to actual AI takeover plans is purely coincidental."