Agent foundations, AI macrostrategy, human enhancement.
I endorse and operate by Crocker’s rules.
I have not signed any agreements whose existence I cannot mention.
So it seems to be a reasonable interpretation that we might see human-level AI around the mid-2030s to 2040, which happens to be about my personal median.
What are the reasons your median is mid-2030s to 2040, other than this way of extrapolating the METR results?
How does the point about Hitler murder plots connect to the point about anthropics?
they can’t read LessWrong or EA blogs
VPNs exist and are probably widely used in China + much of “all this work” is on arXiv etc.
If that was his goal, he has better options.
I’m confused about how to think about this idea, but I really appreciate having it in my collection of ideas.
To show how weird English is: English is the only Indo-European language that doesn’t think the moon is female (“la luna”) and spoons are male (“der Löffel”). I mean… maybe not those genders specifically in every language. But some gender in each language.
Persian is ungendered too. They don’t even have gendered pronouns.
Writing articles in Chinese for my family members, explaining things like cognitive bias, evolutionary psychology, and why dialectical materialism is wrong.
The fact that you need to write them suggests that there’s not enough content like that in Chinese, in which case it would plausibly make sense to publish them somewhere?
I’m also curious about how your family received these articles.
I think that the scenario of a war between several ASIs (each merged with its origin country) is underexplored. Yes, there can be a value handshake between ASIs, but their creators will work to prevent this and see it as a type of misalignment.
Not clear to me, as long as they expect the conflict to be sufficiently destructive.
I wonder whether it’s related to this https://x.com/RichardMCNgo/status/1866948971694002657 (pinging @Richard_Ngo to get around to writing this up, as I think he hasn’t done it yet?)
Since this is about written English text (or maybe more broadly, text in Western languages written in the Latin or Cyrillic script), the criterion is: starts with an uppercase letter and ends with a dot.
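For concreteness, a crude way to operationalize that criterion (my own sketch, not from the original discussion; it will obviously misfire on abbreviations, quotes, ellipses, etc.):

```python
import re

# Naive sentence matcher under the criterion above: starts with an uppercase
# Latin or Cyrillic letter and runs until the next dot. Deliberately crude;
# abbreviations, quotes, ellipses, etc. will break it.
SENTENCE_RE = re.compile(r"[A-ZА-ЯЁ][^.]*\.")

def naive_sentences(text: str) -> list[str]:
    return SENTENCE_RE.findall(text)

print(naive_sentences("Hello there. not a sentence by this criterion. Another one."))
# ['Hello there.', 'Another one.']
```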
Fair enough. Modify my claim to “languages tend to move from fusional to analytic (or something like that) as their number of users expands”.
Related: https://www.lesswrong.com/posts/Pweg9xpKknkNwN8Fx/have-attention-spans-been-declining
Another related thing is that the grammar of languages appears to be getting simpler with time. Compare the grammar of Latin to that of modern French or Spanish. Or maybe not quite simpler but more structured/regular/principled, as something like the latter has been reproduced experimentally (https://royalsocietypublishing.org/doi/10.1098/rspb.2019.1262), to the extent that this paper’s findings generalize to natural language evolution.
Somewhat big if true, although the publication date makes it marginally less likely to be true.
The outline in that post is also very buggy, probably because of the collapsible sections.
Any info on how this compares to other AI companies?
Link to the source of the quote?
Seeing some training data more than once would make the incentive to [have concepts that generalize OOD] weaker than if [they saw every possible training datapoint at most once], but this doesn’t mean that the latter is an incentive towards concepts that generalize OOD.
Though admittedly, we are getting into the discussion of where to place the zero point of “null OOD generalization incentive”.
Also, I haven’t looked into it, but it’s plausible to me that models actually do see some data more than once because there are a lot of duplicates on the internet. If your training data contains the entire English Wikipedia, nLab, and some math textbooks, then surely there’s a lot of duplicated theorems and exercises (not necessarily word-for-word, but it doesn’t have to be word-for-word).
But I realized there might be another flaw in my comment, so I’m going to add an ETA.
(If I’m misunderstanding you, feel free to elaborate, ofc.)
I’m curious about your exercise regimen.
DeepMind says boo SAEs, now Anthropic says yay SAEs!
The most straightforward synthesis[1] of these two reports is that SAEs find some sensible decomposition of the model’s internals into computational elements (concepts, features, etc.), which circuits then operate on. It’s just that these computational elements don’t align with human thinking as nicely as humans would like. E.g. SAE-based concept probes don’t work well OOD because the models were not optimized to have concepts that generalize OOD. This is perfectly consistent with linear probes being able to detect the concept from model activations (the model retains enough information about a concept such as “harmful intent” for the probe to latch onto it, even if the concept itself (or rather, its OOD-generalizing version) is not privileged in the model’s ontology).
ETA: I think this would (weakly?) predict that SAE generalization failures should align with model performance dropping on some tasks. Or at least that the model would need to have some other features that get engaged OOD so that the performance doesn’t drop? Investigating this is not my priority, but I’d be curious to know if something like this is the case.
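For concreteness, here is a minimal sketch of the kind of linear probe I mean above, with randomly generated arrays standing in for cached model activations and concept labels (e.g. “harmful intent” present/absent); everything here is illustrative rather than taken from either report:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
d_model, n = 512, 1000

# Stand-ins for cached residual-stream activations and binary concept labels
# ("harmful intent" present / absent); real data would come from the model.
acts = rng.normal(size=(n, d_model))
labels = rng.integers(0, 2, size=n)

# A linear probe is just a linear classifier on the activations; it can pick up
# a concept direction even if no single SAE latent cleanly corresponds to an
# OOD-generalizing version of that concept.
probe = LogisticRegression(max_iter=1000).fit(acts, labels)
print("train accuracy:", probe.score(acts, labels))
```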
[1] Not to say that I believe it strongly; it’s just a tentative/provisional synthesis/conclusion.
Steganography /j