Having just seen this paper, and still recovering from DALL-E 2 and PaLM, and then re-reading Eliezer's now incredibly prescient "Death with Dignity" post, I really have to ask: what are we supposed to do? I work on ML in a fairly boring corporate capacity, and when reading these papers and posts I get a massive urge to drop everything and do something equivalent to a PhD in alignment. But the timelines that now seem possible make that look like a pointless exercise; I'd be writing my dissertation as nanobots liquefy my body into raw materials for paperclip manufacturing. Do we just carry on and hope someone somewhere stumbles upon a miracle solution, and that we happen to have enough heads in the space to implement it? Do I tell my partner we can't have kids because the probability they'd be born into some unknowable hellscape is far too high? Do I become a prepper and move to a cabin in the woods? I'm genuinely at a loss for how to proceed, and frankly Eliezer's article made things muddier for me.
As I understand it, the empirical ML alignment community is bottlenecked on good ML engineers, so people with your background are potentially very valuable in alignment, even without any further training!
I agree. You can even get career advice here at https://www.aisafetysupport.org/resources/career-coaching
Or feel free to message me for a short call. I bet you could get paid to do alignment work, so it’s worth looking into at least.
What’s the best job board for that kind of job?
You should take a look at Anthropic and Redwood’s careers pages for engineer roles!
There are lots of other positions on the 80,000 Hours job board (Jobs in AI safety & policy) too! E.g., from the Fund for Alignment Research and Aligned AI. But note that the 80,000 Hours board also lists positions from OpenAI, DeepMind, Baidu, etc. which aren't actually alignment-related.
Things are a lot easier for me, since I know I couldn't contribute to alignment research directly, and the other option, donating money, matters less when the field is bottlenecked by prime talent rather than funding. A doctor unfortunate enough to reside in the Third World, whose emigration plans and large increase in absolute discretionary income will only pay off decades from now, has little scope to do more than signal-boost.
As such, I intend to live the rest of my life primarily as a hedge against the world in which AGI isn’t imminent in the coming decade or two, and do all the usual things humans do, like keeping a job, having fun, raising a family.
That's despite the fact that I think it's more likely than not that I or my kids won't make it out of the 21st century, but at least it'll be a quick and painless death, delivered with the dispassionate dispatch of a bulldozer running over an anthill rather than any actual malice.
Outright sadism is unlikely to be a terminal or instrumental goal for any AGI we make, however unaligned; and I doubt that anyone's life expectancy on a planet being rapidly disassembled for parts will be long enough for serious suffering. In slower scenarios, such as an aligned AI that caters only to the needs of a cabal of creators and leaves the rest of us to starve, I'm confident I can make my own end quick.
Thus, I've made my peace with likely joining the odd 97 billion anatomically modern humans already in oblivion, plus another 8 or 9 billion departing concurrently with me, but it doesn't really spark anxiety or despair. It's good to be alive, and I probably wouldn't prefer to have been born at any earlier time in history. Hope for the best and expect the worst, really, assuming your psyche can handle it.
Then again, I'm not you, and someone with a decent foundation in ML is among the 0.01% of people who could feasibly make an impact in the time we have, and I selfishly hope that you can do what I never could. And if not, at least enjoy the time you have!
Thanks for the reflection; it captures how a part of me feels. (I usually never post on LessWrong, being just a lurker, but your comment inspired me a bit.)
Actually, I do have some background that could, maybe, be useful in alignment, and I just completed the AGISF program. Right now I'm applying to some positions (in particular, I'm focusing on the SERI MATS application, an area where I may be differentially talented), and just honestly trying to do my best. After all, it would be outrageous if I could have done something but simply didn't.
But I recognize the possibility that I'm simply not good enough, and that there is no way for me to do anything beyond, as you said, signal-boosting so I can bring more capable people into the field, while living my life and hoping that humanity solves this.
But if humanity does not, well, it is what it is. There was the dream of success, of building a future utopia with technology facilitated by aligned AI, but that may have been just that: a dream. Maybe alignment is unsolvable, and it is the natural order of any advanced civilization to destroy itself with its own AI. Or maybe alignment is solvable, but given the incentives of our world as they are, it was always going to be the case that unsafe AGI would be created before we solved alignment.
Or maybe, we will solve alignment in the end, or we were all wrong about the risks from AI in the first place.
As for me, for now, I'm going to keep trying and keep studying, because if the world comes to an end, I don't want to conclude that I could have done more. And I hope I never have to wonder about that in the first place.
EDIT: To be clear, I'm not that sure about short timelines, in the sense that, insofar as I know (and I may be very, very wrong), the AGIs we are creating right now don't seem to be very agentic, and it may be that creating agency with current techniques is much harder than creating general intelligence. But again, "not so sure" is something like a 20-30% chance of timelines being really short, so the point mostly stands.
Develop a training set for alignment via brute force. We can't defer alignment to the ubernerds. If enough ordinary people (millions? tens of millions?) contribute billions or trillions of tokens, maybe we can increase the chance of alignment. It's almost like we need to offer prayers of kindness and love to the future AGI: alignment essays about kindness posted to Reddit, or videos extolling the virtue of love uploaded to YouTube.
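A minimal sketch of what aggregating such crowd-contributed text into a training corpus might look like, assuming a hypothetical directory of submitted essays, an arbitrary length filter, and JSONL output (none of which are part of the original proposal):

```python
import json
from pathlib import Path

# Hypothetical layout: one plain-text "alignment essay" per file,
# collected from volunteer contributors.
CONTRIB_DIR = Path("contributions")        # assumed location of submitted essays
OUT_PATH = Path("alignment_corpus.jsonl")  # assumed output file
MIN_WORDS = 50                             # arbitrary quality floor

def iter_essays(contrib_dir: Path):
    """Yield each contributed essay that passes a basic length filter."""
    for path in sorted(contrib_dir.glob("*.txt")):
        text = path.read_text(encoding="utf-8").strip()
        if len(text.split()) >= MIN_WORDS:
            yield {"source": path.name, "text": text}

def build_corpus(contrib_dir: Path, out_path: Path) -> int:
    """Write one JSON object per essay, a common format for fine-tuning datasets."""
    count = 0
    with out_path.open("w", encoding="utf-8") as f:
        for record in iter_essays(contrib_dir):
            f.write(json.dumps(record, ensure_ascii=False) + "\n")
            count += 1
    return count

if __name__ == "__main__":
    n = build_corpus(CONTRIB_DIR, OUT_PATH)
    print(f"Wrote {n} essays to {OUT_PATH}")
```

The hard part of the proposal is not the plumbing but getting millions of contributors and ensuring the resulting text actually shifts a model's behaviour; the sketch only illustrates the aggregation step.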
What’s your plan for signal boosting?
Primarily by talking about it in rationalist-adjacent communities that are open to such discussion but also contain a large number of people who aren't immersed in AI x-risk. Pertinent examples would be the SSC subreddit or its spinoff, The Motte.
The ideal target is someone with the intellectual curiosity to want to know more about such matters, while not having encountered them beyond glancing summaries. Below that threshold, people are hard to sway because they're going off the usual pop-culture tropes about AI; significantly above it, you have the LW crowd, and me trying to teach them anything novel would be like trying to teach my grandmother to suck eggs.
If I can find people who are mildly aware of such possibilities, it's easier to dispel the particular misconceptions they have, such as the tendency to anthropomorphize AI, the question of "why not just shut it off", etc. Showing them the blistering pace of progress in ML is a reliable eye-opener in my experience.
Engaging with naysayers is also effective. There's a certain stentorian type who not only holds those misunderstandings but loudly airs them to dismiss x-risk altogether. Dismantling such arguments is always worthwhile, even if the odds of convincing the naysayer are minimal; there's always a crowd of undecided but curious onlookers who can be swayed.
There's also the topic of automation-induced unemployment, which is what I usually bring up in medical circles that would otherwise be baffled by AI x-risk. That's the most concrete and imminent danger any skilled professional faces, even if current timelines suggest that the gap between the widespread adoption of near-human AI and actual superhuman AGI will be tiny.
That's about as much as I can do. I don't have the money to donate anything but pocket change, and my access to high-flying ML engineers is mostly restricted to this very forum. I'm acutely aware that I'm not good enough at math to produce original work in the field, so given those constraints, I consider it a victory if I can sway people wealthier and better positioned, by virtue of living in the First World, on the matter!
That seems like an excellent strategy and I’m glad someone is focusing on that. Would you be interested in chatting about this sometime?
Absolutely! I haven’t used the messaging features here much, but I’m open to a conversation in any medium of your choice.
Regarding the arguments for doom: they are quite logical, but they don't warrant the same confidence as, e.g., the argument that if you are in a burning, collapsing building, your life is in peril. There are a few too many profound unknowns bearing on the consequences of superhuman AI to know that the default outcome really is the equivalent of a paperclip maximizer.
However, I definitely agree that it is a very plausible scenario, and also that the human race (or the portion of it that works on AI) is taking a huge gamble by pushing towards superhuman AI without making it a central priority that this superhuman AI is "friendly" or "aligned".
In that regard, I keep saying that the best plan I have seen is June Ku's "MetaEthical AI", which falls into the category of proposals that construct an overall goal by aggregating idealized versions of the current goals of all human individuals. I want to make a post about it, but I haven't had time... So I would suggest: check it out, and see if you can contribute technically, critically, or by spreading awareness of this kind of proposal.