Fellowships typically only last a few months, and even if you’re in India, you’d likely have to move for the fellowship unless it happened to be in your exact city.
Impact Academy was doing this before they pivoted towards the Global AI Safety Fellowship. It’s unclear whether any further fellowships should be in India or in a country that is particularly generous with its visas.
I posted this comment on Jan’s blog post:
Underelicitation assumes a “maximum elicitation” rather than a never-ending series of more and more layers of elicitation that could be discovered. You’ve undoubtedly spent much more time thinking about this than I have, but I’m worried that attempts to maximise elicitation merely accelerate capabilities without actually substantially boosting safety.
In terms of infrastructure, it would be really cool to have a website collecting the more legible alignment research (papers, releases from major labs or non-profits).
I think I saw someone arguing that their particular capability benchmark was good for evaluating the capability, but of limited use for training the capability because their task only covered a small fraction of that domain.
(Disclaimer: I previously interned at Non-Linear)
Different formats allow different levels of nuance. Memes aren’t essays and they shouldn’t try to be.
I personally think these memes are fine and that outreach is too. Maybe these posts oversimplify things a bit too much for you, but I expect that the average person on these subs probably improves the level of their thinking from seeing these memes.
If, for example, you think r/EffectiveAltruism should ban memes, then I recommend talking to the mods.
Well done for managing to push something out there. It’s a good start; I’m sure you’ll fill in some of the details with other posts over time.
What if the thing we really need the Aligned AI to engineer for us is… a better governance system?
I’ve been arguing for the importance of having wise AI advisors. That isn’t quite the same thing as a “better governance system”, since they could advise us about all kinds of things, but it feels like it’s in the same direction.
Excellent post. It helped clear up some aspects of SLT for me. Any chance you could clarify why this volume is called the “learning coefficient?”
I suspect that this will be an incredibly difficult scenario to navigate and that our chances will be better if we train wise AI advisors.
I think our chances would be better still if we could pivot a significant fraction of the talent towards developing WisdomTech rather than IntelligenceTech.
On a more concrete level, I suspect the actual plan looks like some combination of alignment hacks, automated alignment research, control, def/acc, limited proliferation of AI, compute governance and the merging of actors. Applied wisely, the combination of all of these components may be enough. But figuring out the right mix isn’t going to be easy.
Committed not to Eliezer’s insights but to exaggerated versions of his blind spots
My guess would be that this is an attempt to apply a general critique of what tends to happen in communities to the LW community, without accounting for its specifics.
Most people in the LW community would say that Eliezer is overconfident or even arrogant (sorry Eliezer!).
The incentive gradient for status-hungry folk is not to double down on Eliezer’s views, but to double down on your own idiosyncratic version of rationalism: different enough from the community’s to be interesting, but similar enough to be legible.
(Also, I strongly recommend the post this is replying to. I was already aware that discourse functioned in the way described, but it helped me crystallise some of the phenomena much more clearly.)
Perhaps you’d be interested in adding a page on AI Safety & Entrepreneurship?
It will be the government(s) who decide how AGI is used, not a benevolent coalition of utilitarian rationalists.
Even so, the government still needs to weigh up opposing concerns, maintain ownership of the AGI, set up the system in such a way that they have trust in it and gain some degree of buy-in from society for the plan[1].
[1] Unless their plan is to use the AGI to enforce their will.
I suspect that AI advisors could be one of the most important technologies here.
Two main reasons:
a) A large amount of disagreement is the result of different beliefs about how the world works, rather than a difference in values
b) Often folk want to be able to co-operate, but they can’t figure out a way to make it work
Whilst interesting, this analysis doesn’t seem to quite hit the nail on the head for me.
Power Distribution: First-movers with advanced AI could gain permanent military, economic, and/or political dominance.
This framing both merges multiple issues and almost assumes a particular solution (that of power distribution).
Instead, I propose that this problem be broken into:
a) Distributive justice: Figuring out how to fairly resolve conflicting interests
b) Stewardship: Ensuring that no-one can seize control of any ASIs and that such power isn’t transferred to a malicious or irresponsible actor
c) Trustworthiness: Designing the overall system (both human and technological components) in such a way that different parties have rational reasons to trust that conflicting interests will be resolved fairly and that proper stewardship will be maintained over the system
d) Buy-in: Gaining support from different actors for a particular system to be implemented. This may involve departing from any distributive ideal
Of course, broadly distributing power can be used to address any of these issues, but we shouldn’t assume that it is necessarily the best solution.
Economics Transition: When AI generates all wealth, humans have no leverage to ensure they are treated well… It’s still unclear how we get to a world where humans have any economic power if all the jobs are automated by advanced AI.
This seems like a strange framing to me. Maybe I’m reading too much into your wording, but it seems to almost assume that the goal is to maintain a broad distribution of “economic” power through the AGI transition. Whilst this would be one way of ensuring the broad distribution of benefits, it hardly seems like the only, or even most promising route. Why should we assume that the world will have anything like a traditional economy after AGI?
Additionally, alignment can refer to either “intent alignment” or “alignment with human values”[1]. Your analysis seems to assume the former; I’d suggest flagging this explicitly if that’s what you mean. Where this most directly matters is the extent to which we are telling these machines what to do vs. them autonomously making their own decisions, which affects how important it is to solve these problems manually.
[1] Whatever that means.
Just wanted to chime in with my current research direction.
I don’t know exactly which capabilities are most important yet, but things like advising on strategic decisions, improving co-ordination, and non-manipulative communication seem important.
Thanks for clarifying. Still feels narrow as a primary focus.
Agreed. Simply focusing on physics post-docs feels too narrow to me.
Then again, just as John has a particular idea of what good alignment research looks like, I have my own idea: I would lean towards recruiting folk with both a technical and a philosophical background. It’s possible that my own idea is just as narrow.
Nice article, I especially love the diagrams!
In the Human Researcher Obsolescence section, you note that we can’t completely hand over research unless we manage to produce agents that are at least as “wise” as the human developers.
I agree with this, though I would love to see a future version of this plan include an expanded analysis of the role that wise AI would play in Magma’s strategy, as I believe this could be a key aspect of making the plan work.
In particular:
• We likely want to be developing wise AI advisors to advise us during the pre-hand-off period. In fact, I consider this likely to be vital to successfully navigating this period given the challenges involved.
• It’s possible that we might manage to completely automate the more objective components of research without managing to completely automate the more subjective components. That said, we likely want to train wise AI advisors to help us with the more subjective components even if we can’t defer to them.
• When developing AI capabilities, there’s an additional lever in terms of how much Magma focuses on direct capabilities vs. focusing on wisdom.