What’s DC?
Chris_Leong
Are you recruiting for people to offer bounties or to complete them? I’d suggest making this clearer in your post.
I think it’s fine to call it a protest, but it works better if the people are smiling and if message discipline is maintained. We need people to see a picture in the newspaper and think “those people look reasonable”. There might be a point where the strategy changes, but for now it’s about establishing credibility.
What kind of effects are you thinking about?
Is it okay to apply to be both a mentor and a mentee in different areas?
Perhaps it’d be even better to say that it’s okay to be direct or even harsh?
I think it comes down to exactly how the protests run.
I’m not a fan of chants like “Pause AI, we don’t want to die” as that won’t make sense to people with low context, but there’s a way of protesting that actually builds credibility. For example, I’d recommend avoiding loudspeakers and seeming angry, and instead just trying to come across as reasonable.
I probably would have been slightly clearer in the conclusion that this is really only a starting point: it identifies issues that at least N people thought were issues, but (by design) it doesn’t tell us what the rest thought, so we don’t really know whether these are majority or minority views.
Just to add my personal opinion: I agree with some of the criticisms (including empirical work being underrated for a long time, although maybe not now, and excessive pessimism on policy). However, for many of the others, I think they might seem like obvious mistakes at first, but once you dig into the details it becomes a bit more complicated.
Whether or not it obeys orders is irrelevant for open-source/open-weight models, where this can be removed, as this research shows.
(For catastrophically dangerous if misused models?) - yes, edited
(Really, we also need to suppose there are issues with strategy stealing for open source to be a problem, e.g. offense-defense imbalances or alignment difficulties.) - I would prefer not to test this by releasing the models and seeing what happens to society.
Or maybe we just conclude that open-source/open-weight models past a certain capability level are a terrible idea?
I’ll post some extracts from the Seoul Summit. I can’t promise that this will be a particularly good summary, I was originally just writing this for myself, but maybe it’s helpful until someone publishes something that’s more polished:
Frontier AI Safety Commitments, AI Seoul Summit 2024
The major AI companies have agreed to Frontier AI Safety Commitments. In particular, they will publish a safety framework focused on severe risks: “internal and external red-teaming of frontier AI models and systems for severe and novel threats; to work toward information sharing; to invest in cybersecurity and insider threat safeguards to protect proprietary and unreleased model weights; to incentivize third-party discovery and reporting of issues and vulnerabilities; to develop and deploy mechanisms that enable users to understand if audio or visual content is AI-generated; to publicly report model or system capabilities, limitations, and domains of appropriate and inappropriate use; to prioritize research on societal risks posed by frontier AI models and systems; and to develop and deploy frontier AI models and systems to help address the world’s greatest challenges”
“Risk assessments should consider model capabilities and the context in which they are developed and deployed”—I’d argue that the context in which it is deployed should account for whether it is open or closed source/weights
“They should also be accompanied by an explanation of how thresholds were decided upon, and by specific examples of situations where the models or systems would pose intolerable risk.”—always great to make policy concrete
“In the extreme, organisations commit not to develop or deploy a model or system at all, if mitigations cannot be applied to keep risks below the thresholds.”—Very important that when this is applied the ability to iterate on open-source/weight models is taken into account
https://www.gov.uk/government/publications/frontier-ai-safety-commitments-ai-seoul-summit-2024/frontier-ai-safety-commitments-ai-seoul-summit-2024
Seoul Declaration for safe, innovative and inclusive AI by participants attending the Leaders’ Session
Signed by Australia, Canada, the European Union, France, Germany, Italy, Japan, the Republic of Korea, the Republic of Singapore, the United Kingdom, and the United States of America.
“We support existing and ongoing efforts of the participants to this Declaration to create or expand AI safety institutes, research programmes and/or other relevant institutions including supervisory bodies, and we strive to promote cooperation on safety research and to share best practices by nurturing networks between these organizations”—guess we should now go full-throttle and push for the creation of national AI Safety institutes
“We recognise the importance of interoperability between AI governance frameworks”—useful for arguing we should copy things that have been implemented overseas.
“We recognize the particular responsibility of organizations developing and deploying frontier AI, and, in this regard, note the Frontier AI Safety Commitments.”—Important as Frontier AI needs to be treated as different from regular AI.
https://www.gov.uk/government/publications/seoul-declaration-for-safe-innovative-and-inclusive-ai-ai-seoul-summit-2024/seoul-declaration-for-safe-innovative-and-inclusive-ai-by-participants-attending-the-leaders-session-ai-seoul-summit-21-may-2024
Seoul Statement of Intent toward International Cooperation on AI Safety Science
Signed by the same countries.
“We commend the collective work to create or expand public and/or government-backed institutions, including AI Safety Institutes, that facilitate AI safety research, testing, and/or developing guidance to advance AI safety for commercially and publicly available AI systems”—similar to what we listed above, but more specifically focused on AI Safety Institutes, which is great.
“We acknowledge the need for a reliable, interdisciplinary, and reproducible body of evidence to inform policy efforts related to AI safety”—Really good! We don’t just want AIS Institutes to run current evaluation techniques on a bunch of models, but to be actively contributing to the development of AI safety as a science.
“We articulate our shared ambition to develop an international network among key partners to accelerate the advancement of the science of AI safety”—very important for them to share research among each other
https://www.gov.uk/government/publications/seoul-declaration-for-safe-innovative-and-inclusive-ai-ai-seoul-summit-2024/seoul-statement-of-intent-toward-international-cooperation-on-ai-safety-science-ai-seoul-summit-2024-annex
Seoul Ministerial Statement for advancing AI safety, innovation and inclusivity
Signed by: Australia, Canada, Chile, France, Germany, India, Indonesia, Israel, Italy, Japan, Kenya, Mexico, the Netherlands, Nigeria, New Zealand, the Philippines, the Republic of Korea, Rwanda, the Kingdom of Saudi Arabia, the Republic of Singapore, Spain, Switzerland, Türkiye, Ukraine, the United Arab Emirates, the United Kingdom, the United States of America, and the representative of the European Union
“It is imperative to guard against the full spectrum of AI risks, including risks posed by the deployment and use of current and frontier AI models or systems and those that may be designed, developed, deployed and used in future”—considering future risks is a very basic, but core principle
“Interpretability and explainability”—Happy to see interpretability explicitly listed
“Identifying thresholds at which the risks posed by the design, development, deployment and use of frontier AI models or systems would be severe without appropriate mitigations”—important work, but could backfire if done poorly
“Criteria for assessing the risks posed by frontier AI models or systems may include consideration of capabilities, limitations and propensities, implemented safeguards, including robustness against malicious adversarial attacks and manipulation, foreseeable uses and misuses, deployment contexts, including the broader system into which an AI model may be integrated, reach, and other relevant risk factors.”—sensible, we need to ensure that the risks of open-sourcing and open-weight models are considered in terms of the ‘deployment context’ and ‘foreseeable uses and misuses’
“Assessing the risk posed by the design, development, deployment and use of frontier AI models or systems may involve defining and measuring model or system capabilities that could pose severe risks”—very pleased to see a focus beyond just deployment
“We further recognise that such severe risks could be posed by the potential model or system capability or propensity to evade human oversight, including through safeguard circumvention, manipulation and deception, or autonomous replication and adaptation conducted without explicit human approval or permission. We note the importance of gathering further empirical data with regard to the risks from frontier AI models or systems with highly advanced agentic capabilities, at the same time as we acknowledge the necessity of preventing the misuse or misalignment of such models or systems, including by working with organisations developing and deploying frontier AI to implement appropriate safeguards, such as the capacity for meaningful human oversight”—this is massive. There was a real risk that these issues were going to be ignored, but this is now seeming less likely.
“We affirm the unique role of AI safety institutes and other relevant institutions to enhance international cooperation on AI risk management and increase global understanding in the realm of AI safety and security.”—“Unique role”, this is even better!
“We acknowledge the need to advance the science of AI safety and gather more empirical data with regard to certain risks, at the same time as we recognise the need to translate our collective understanding into empirically grounded, proactive measures with regard to capabilities that could result in severe risks. We plan to collaborate with the private sector, civil society and academia, to identify thresholds at which the level of risk posed by the design, development, deployment and use of frontier AI models or systems would be severe absent appropriate mitigations, and to define frontier AI model or system capabilities that could pose severe risks, with the ambition of developing proposals for consideration in advance of the AI Action Summit in France”—even better than the above b/c it commits to a specific action and timeline
https://www.gov.uk/government/publications/seoul-ministerial-statement-for-advancing-ai-safety-innovation-and-inclusivity-ai-seoul-summit-2024
Surely this can’t be a new issue? There must already exist some norms around this.
This criticism feels a bit strong to me. Knowing the extent to which interpretability work scales up to larger models seems pretty important. I could have imagined people arguing either that such techniques would work worse on larger models b/c of required optimizations, or better because fewer concepts would be in superposition. Work on this feels quite important, even though there’s a lot more work to be done.
Also, sharing some amount of eye-catching results seems important for building excitement for interpretability research.
Update: I skipped the TLDR when I was reading this post b/c I just read the rest. I guess I’m fine with Anthropic mostly focusing on establishing one kind of robustness and leaving other kinds of robustness for future work. I’d be more likely to agree with Steven Casper if there isn’t further research from Anthropic in the next year that makes significant progress in evaluating the robustness of their approach. One additional point: independent researchers can run some of these other experiments, but they can’t run the scaling experiment.
“Presently beyond the state of the art… I think that would be pretty cool”
Point taken, but it doesn’t make it sufficient for avoiding society-level catastrophes.
That’s the exact thing I’m worried about: that people will equate deploying a model via API with releasing open weights, when the latter carries significantly more risk due to the potential for future modification and the inability to withdraw the model.
Frontier Red Team, Alignment Science, Finetuning, and Alignment Stress Testing
What’s the difference between a frontier red team and alignment stress-testing? Is the red team focused on the current models you’re releasing and the alignment stress testing focused on the future?
I know that Anthropic doesn’t really open-source advanced AI, but it might be useful to discuss this in Anthropic’s RSP anyway, because one way I see things going badly is people copying Anthropic’s RSP and directly applying it to open-source projects without accounting for the additional risks this entails.
Great work! It’s easy to overlook the importance of this kind of community infrastructure, but I suspect that it makes a significant difference.
I think we should be talking more about potentially denying a frontier AI license to any company that causes a major disaster (within some future licensing regime), where a company’s record before the law passes will be taken into account.