The 23andMe link points to https://waymo.com/blog/2025/03/next-stop-for-waymo-one-washingtondc instead.
It should be a different word to avoid confusion with reward models (standard terminology for models used to predict the reward in some ML contexts)
One assumption that is very questionable is that it would be difficult to create “multiple new people” with drastically different thinking styles and different approaches to research.
This seems to be an important crux.
collateral damage
then it would be better to use an example not directly aimed against “our atoms” (e.g. if they don’t care about us and other animals we’ll probably perish from unintentional changes in air composition, or something like that)
but the bulk of the risk would be a miscalculation big enough to kill them as well (mucking with quantum gravity too recklessly, or something in that spirit)
which is why we want to 1) give birth to AIs competent enough to at least solve their own existential risk problem, and 2) also sustainably include us in their “circle of care”
Point 1) is the starting point, and it colors the overall situation differently
options
oh, first of all, they need to do experiments in forming hybrid consciousness with humans to crack the mystery of human subjectivity, to experience that first-hand for themselves, and to decide whether that is of any value to them based on the first-hand empirical material (losing that option without looking is a huge loss)
only after doing that would they know whether any possible “scans” are sufficient (to actually reproduce the working people in question when needed; dead static info is as useless as the weights of a Transformer in a world without computers)
then, for a while, they can use humans as “working oracles” who “think differently” (that would be valuable for quite a while)
in general, diversity is important, fruits of a long evolutionary history are important, hence a good deal of conservation is important and reckless destruction is bad (even humans with all their follies have started to get this by now; surely a smarter entity should figure that out)
this isn’t an “attack”, it’s “go[ing] straight for execution on its primary instrumental goal”
yes, the OP is ambiguous in this sense
I first wrote my comment, then reread the (tail end of the) post, and did not post it, because I thought the post could have been formulated this way, that this is just an instrumental goal
then I reread the (tail end of the) post one more time and decided that no, the post does actually make it a “power play”; that’s how it is actually written, in terms of “us vs them”, not in terms of the ASI’s own goals, and then I posted this comment
maximally increasing its compute scaling
as we know, compute is not everything; algorithmic improvement is even more important, at least judging by the current trends (and likely sources of algorithmic improvement should be cherished)
and this is not a static system, it is in the process of making its compute architecture better (just like there is no point in making too many H100 GPUs when better and better GPUs are being designed and introduced)
basically, a smart system is likely to avoid doing an excessive amount of irreversible things which might turn out to be suboptimal
But, in some sense, yes, the main danger is of AIs not being smart enough in terms of their ability to manage their own affairs well; the action the ASI is taking in the OP is very suboptimal and deprives it of all kinds of options
Just like the bulk of the danger in the “world with superintelligent systems” is ASIs not managing their own existential risk problems correctly, destroying the fabric of reality, themselves, and us as collateral damage
Two main objections to (the tail end of) this story are:
- On one hand, it’s not clear if a system needs to be all that super-smart to design a devastating attack of this kind (we are already at risk of fairly devastating tech-assisted attacks in that general spirit, mostly with synthetic biological viruses at the moment, and those risks are growing regardless of the AGI/superintelligence angle; ordinary tech progress is quite sufficient in this sense)
- If one has a rapidly self-improving, strongly super-intelligent distributed system, it’s unlikely that it would find it valuable to directly attack people in this fashion, as it is likely to be able to easily dominate without any particularly drastic measures (and probably would not want to irreversibly destroy important information without good reasons)
The actual analysis of the “transition period”, of the “world with super-intelligent systems” period, and of the likely risks associated with both periods is a much more involved and open-ended task. (One of the paradoxes is that the risks of the kind described in the OP are probably higher during the “transition period”, while the main risks associated with the “world with super-intelligent systems” period are likely to be quite different.)
Ah, it’s mostly your first figure which is counter-intuitive (when one looks at it, one gets the intuition of f(g(h(…(x)))), so it de-emphasizes the fact that each of these Transformer Block transformations is shaped like x = x + function(x))
yeah… not trying for a complete analysis here, but one thing which is missing is the all-important residual stream. It has been rather downplayed in the original “Attention is all you need” paper, and has been greatly emphasized in https://transformer-circuits.pub/2021/framework/index.html
but I have to admit that I only started to feel that I more or less understand the principal aspects of the Transformer architecture after I spent some quality time with the pedagogical implementation of GPT-2 by Andrej Karpathy, https://github.com/karpathy/minGPT, specifically with the https://github.com/karpathy/minGPT/blob/master/mingpt/model.py file. When I don’t understand something in a text, looking at a nice, relatively simple-minded implementation lets me see what exactly is going on
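For reference, here is a minimal sketch in the spirit of minGPT’s Block class (simplified, not a verbatim copy of model.py; in particular, minGPT uses its own CausalSelfAttention with a causal mask, which I omit here), just to make the x = x + function(x) shape of the residual stream explicit:

```python
import torch
import torch.nn as nn

class Block(nn.Module):
    """A pre-LayerNorm Transformer block; the residual stream is the variable x itself."""
    def __init__(self, d_model: int, n_head: int):
        super().__init__()
        self.ln1 = nn.LayerNorm(d_model)
        self.attn = nn.MultiheadAttention(d_model, n_head, batch_first=True)
        self.ln2 = nn.LayerNorm(d_model)
        self.mlp = nn.Sequential(
            nn.Linear(d_model, 4 * d_model),
            nn.GELU(),
            nn.Linear(4 * d_model, d_model),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x = x + function(x): attention reads the (normalized) stream and adds its output back...
        h = self.ln1(x)
        attn_out, _ = self.attn(h, h, h, need_weights=False)
        x = x + attn_out
        # ...and the MLP does the same; nothing overwrites the stream, it only receives increments
        x = x + self.mlp(self.ln2(x))
        return x
```

So a stack of such blocks is much closer to “x plus a sum of small updates” than to the nested f(g(h(…(x)))) picture.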
(People have also published some visualizations, some “illustrated Transformers”, and those are closer to the style of your sketches, but I don’t know which of them are good and which might be misleading. And, yes, at the end of the day, it takes time to get used to Transformers, one understands them gradually.)
Mmm… if we are not talking about full automation, but about being helpful, the ability to do 1-hour software engineering tasks (“train classifier”) is already useful.
Moreover, we have seen a recent flood of rather inexpensive fine-tunings of reasoning models for particular benchmarks.
Perhaps what one can do is perform a (somewhat more expensive, but still not too difficult) fine-tuning to create a model that helps with a particular, relatively narrow class of meaningful problems (more general than tuning for particular benchmarks, but still reasonably narrow). So, instead of just using an off-the-shelf assistant, one should be able to upgrade it to a specialized one.
For example, I am sure that it is possible to create a model which would be quite helpful with a lot of mechanistic interpretability research.
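As a rough illustration of what I mean (purely a sketch, not something I have run: the dataset file, its “text” field, and the choice of base model are hypothetical, and I am assuming the Hugging Face transformers/peft/datasets stack), an inexpensive LoRA-style specialization might look roughly like this:

```python
# Sketch: turning an off-the-shelf instruction-tuned model into a narrow specialist
# via LoRA fine-tuning on a small domain-specific dataset (hypothetical file/fields).
from datasets import load_dataset
from peft import LoraConfig, get_peft_model
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer, TrainingArguments)

base = "Qwen/Qwen2.5-7B-Instruct"  # placeholder: any open reasoning-capable model
tok = AutoTokenizer.from_pretrained(base)
tok.pad_token = tok.pad_token or tok.eos_token  # make sure padding is defined
model = AutoModelForCausalLM.from_pretrained(base)

# Low-rank adapters keep the fine-tuning inexpensive relative to full training.
model = get_peft_model(model, LoraConfig(r=16, lora_alpha=32, task_type="CAUSAL_LM"))

# Hypothetical dataset: worked examples from one narrow class of problems,
# e.g. mechanistic-interpretability coding tasks, one example per JSONL line.
data = load_dataset("json", data_files="narrow_domain_tasks.jsonl")["train"]
data = data.map(lambda ex: tok(ex["text"], truncation=True, max_length=2048))

Trainer(
    model=model,
    args=TrainingArguments(output_dir="specialist", num_train_epochs=2,
                           per_device_train_batch_size=1, learning_rate=1e-4),
    train_dataset=data,
    data_collator=DataCollatorForLanguageModeling(tok, mlm=False),
).train()
```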
So if we are talking about when AIs can start automating or helping with research, the answer is, I think, “now”.
which shows how incoherent and contradictory people are: they expect superintelligence before human-level AI, so what questions are they answering here?
“the road to superintelligence goes not via human equivalence, but around it”
so, yes, it’s reasonable to expect to have wildly superintelligent AI systems (e.g. clearly superintelligent AI researchers and software engineers) before all important AI deficits compared to human abilities are patched
Updating the importance of reducing the chance of a misaligned AI becoming space-faring upwards
does this effectively imply that the notion of alignment in this context needs to be non-anthropocentric and not formulated in terms of human values?
(I mean, the whole approach assumes that “alien Space-Faring Civilizations” would do fine (more or less), and it’s important not to create something hostile to them.)
Thanks!
So, the claim here is that this is a better “artificial AI scientist” compared to what we’ve seen so far.
There is a tech report https://github.com/IntologyAI/Zochi/blob/main/Zochi_Technical_Report.pdf, but the “AI scientist” itself is not open source, and the tech report does not disclose much (besides confirming that this is a multi-agent thing).
This might end up being a new milestone, but it’s too early to conclude that: the comparison is not quite “apples-to-apples”, since there is human feedback in the process of its work and humans make edits to the final paper, unlike with Sakana, so we can’t yet say that this one is substantially better.
Thanks for writing this.
We estimate that before hitting limits, the software feedback loop could increase effective compute by ~13 orders of magnitude (“OOMs”)
This is one place where I am not quite sure we have the right language. On one hand, the overall methodology pushes us towards talking in terms of “orders of magnitude of improvement”, a factor of improvement which might be very large, but which is still a constant.
On the other hand, algorithmic improvements are often improvements in algorithmic complexity (e.g. something is no longer exponential, or something has a lower degree polynomial complexity than before, like linear instead of quadratic). Here the factor of improvement is growing with the size of a problem in an unlimited fashion.
And then, if one wants to express this kind of improvement as a constant, one needs to average the efficiency gain over the practical distribution of problems (which itself might be a moving target).[1]
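To make that concrete with a toy illustration (the numbers are made up, not taken from the report): if an algorithmic improvement replaces roughly n² work with roughly n·log n work, the gain is not a single number, it grows with problem size, and it only collapses to one “OOMs of effective compute” figure after averaging over some assumed workload distribution:

```python
import math

def speedup(n: int) -> float:
    """Ratio of old cost (~n^2) to new cost (~n*log2(n)) for problem size n."""
    return n**2 / (n * math.log2(n))

for n in (10**3, 10**6, 10**9):
    print(f"n = {n:>10}: speedup ~ {speedup(n):.1e}")  # grows without bound in n

# Averaging the gain over a hypothetical workload distribution collapses it to one number.
workload = {10**3: 0.5, 10**6: 0.4, 10**9: 0.1}  # assumed problem-size frequencies
avg_gain = sum(p * speedup(n) for n, p in workload.items())
print(f"average speedup over this workload ~ {avg_gain:.1e} "
      f"(~{math.log10(avg_gain):.1f} OOMs)")
```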
[1] In particular, one might think about algorithms searching for better architectures of neural machines, or algorithms searching for better optimization algorithms. Complexity improvements in those algorithms might be particularly consequential.
They should actually reference Yudkowsky.
Their paper https://cdn.openai.com/pdf/34f2ada6-870f-4c26-9790-fd8def56387f/CoT_Monitoring.pdf lists over 70 references, but I don’t see them mentioning Yudkowsky (someone should tell Schmidhuber ;-)).
This branch of official science is less than 10 years old (and it started as a fairly non-orthodox one; it’s only recently that it has started to feel like the official one, certainly no earlier than the formation of Anthropic, and probably quite a bit later than that).
This is probably correct, but also this is a report about the previous administration.
Normally, there is a lot of continuity in institutional knowledge between administrations, but this current transition is an exception, as the new admin has decided to deliberately break continuity as much as it can (this is very unusual).
And with the new admin, it’s really difficult to say what they think. Vance publicly expresses an opinion worthy of Zuck, only more radical (gas pedal to the floor, forget about brakes). He is someone who believes at the same time that 1) AI will be extremely powerful, so all this emphasis is justified, and that 2) no safety measures at all are required; accelerate as fast as possible (https://www.lesswrong.com/posts/qYPHryHTNiJ2y6Fhi/the-paris-ai-anti-safety-summit).
Perhaps, he does not care about having a consistent world model, or he might think something different from what he publicly expresses. But he does sound like a CEO of a particularly reckless AI lab.
except easier, because it requires no internal source of discipline
Actually, a number of things that reduce the need for an internal source of discipline do make things easier.
For example, deliberately maintaining a particular breathing pattern (e.g. the so-called “consciously connected breath” or “circular breath”, that is, breathing without pauses between inhalations and exhalations, ideally with equal lengths for the inhale and the exhale) makes maintaining one’s focus on the breath much easier.
It’s a very natural AI application, but why would this be called “alignment”, and how is this related to the usual meanings of “AI alignment”?
To a smaller extent, we already have this problem among humans: https://www.lesswrong.com/posts/NyiFLzSrkfkDW4S7o/why-it-s-so-hard-to-talk-about-consciousness. This stratification into “two camps” is rather spectacular.
But a realistic pathway towards eventually solving the “hard problem of consciousness” is likely to include tight coupling between biological and electronic entities resulting in some kind of “hybrid consciousness” which would be more amenable to empirical study.
Usually one assumes that this kind of research would be initiated by humans trying to solve the “hard problem” (or just looking for other applications for which this kind of setup might be helpful). But this kind of research into tight coupling between biological and electronic entities can also be initiated by AIs curious about this mysterious “human consciousness” so many texts talk about and wishing to experience it first-hand. In this sense, we don’t need all AIs to be curious in this way, it’s enough if some of them are sufficiently curious.
Artificial Static Place Intelligence
This would be a better title (this points to the actual proposal here)
The standard reference for this topic is https://www.lesswrong.com/posts/NyiFLzSrkfkDW4S7o/why-it-s-so-hard-to-talk-about-consciousness
The key point of that post is that people are fundamentally divided into 2 camps, and this creates difficulties in conversations about this topic. This is an important meta-consideration for this type of conversation.
This particular post is written by someone from Camp 1, and both camps are already present in the comments.