I’ve been perplexed multiple times as to what the past events that can lead to this kind of take (over 7 years ago, the EA/rationality community’s influence probably accelerated OpenAI’s creation) have to do with today’s shutting down of the offices. Are there current, present-day things going on in the EA and rationality community which you think warrant suspecting them of being incredibly net negative (causing worse worlds, conditioned on the current setup)? Things done in the last 6 months? At the Lightcone Offices? (Though I’d appreciate specific examples, I’d already greatly appreciate knowing if there is something in the abstract, and I’d prefer a quick response at that level of precision to nothing.)
I’ve imagined an answer; is the following on your mind? “EAs are more about saying they care about numbers than actually caring about numbers, and didn’t calculate downside risk enough in the past. The past events reveal this attitude, and because it’s not expected to have changed, we can expect it to still be affecting current EAs, who will continue causing great harm because of not actually caring about downside risk enough.”
Are there current, present-day things going on in the EA and rationality community which you think warrant suspecting them of being incredibly net negative (causing worse worlds, conditioned on the current setup)? Things done in the last 6 months?
I mean yes! Don’t I mention a lot of them in the post above?
I mean FTX happened in the last 6 months! That caused incredibly large harm for the world.
OpenAI and Anthropic are two of the most central players in an extremely bad AI arms race that is causing enormous harm. I really feel like it doesn’t take a lot of imagination to think about how our extensive involvement in those organizations could be bad for the world. And a huge component of the Lightcone Offices was causing people to work at those organizations, as well as support them in various other ways.
EAs are more about saying they care about numbers than actually caring about numbers, and didn’t calculate downside risk enough in the past. The past events reveal this attitude, and because it’s not expected to have changed, we can expect it to still be affecting current EAs, who will continue causing great harm because of not actually caring about downside risk enough.
No, this does not characterize my opinion very well. I don’t think “worrying about downside risk” is a good pointer to what I think will help, and I wouldn’t characterize the problem as people having spent too little effort or too little time worrying about downside risk. I think people do care about downside risk; I just also think there are consistent and predictable biases that cause people to be unable to stop, or to be unable to properly notice, certain types of downside risk, though that statement feels kind of vacuous in my mind and like it just doesn’t capture the vast majority of the interesting detail of my model.
Thanks for the reply! The main reason I didn’t understand (despite some things being listed) is that I assumed none of that was happening at Lightcone (because I guessed you would, for example, filter out EAs with bad takes in favor of rationalists). The fact that some people in EA (a huge, broad community) are probably wrong about some things didn’t seem to be an argument that the Lightcone Offices would be ineffective, since (AFAIK) you could filter people at your discretion.
More specifically, I had no idea that “a huge component of the Lightcone Offices was causing people to work at those organizations”. That’s a much more debatable move, and I’m curious why it happened in the first place. In my field-building in France we talk about x-risk and alignment, and people don’t want to accelerate the labs; they want to slow things down or do alignment work. I feel a bit preachy here, but internally it just feels like the obvious move is “stop doing the probably bad thing”, though I do understand that if you got into this situation unexpectedly, you have a better chance burning this place down and creating a fresh one with better norms.
Overall I get a weird feeling of “the people doing bad stuff are being protected again; we should name more precisely who’s doing the bad stuff and why we think it’s bad” (because I feel aimed at by vague descriptions like “field-building”, even though I certainly don’t feel like I contributed to any of the bad stuff being pointed at).
No, this does not characterize my opinion very well. I don’t think “worrying about downside risk” is a good pointer to what I think will help, and I wouldn’t characterize the problem as people having spent too little effort or too little time worrying about downside risk. I think people do care about downside risk; I just also think there are consistent and predictable biases that cause people to be unable to stop, or to be unable to properly notice, certain types of downside risk, though that statement feels kind of vacuous in my mind and like it just doesn’t capture the vast majority of the interesting detail of my model.
So it’s not a problem of not caring, but of not succeeding at the task. I assume the kind of errors you’re pointing at are things which should happen less with more practiced rationalists? I guess then we can either filter to only have people who are already pretty good rationalists, or train them (I don’t know if there are good results on that front from CFAR).
The fact that some people in EA (a huge, broad community) are probably wrong about some things didn’t seem to be an argument that the Lightcone Offices would be ineffective, since (AFAIK) you could filter people at your discretion.
I mean, no, we were specifically trying to support the EA community; we do not get to unilaterally decide who is part of the community. People I don’t personally have much respect for, but who are members of the EA community putting in the work to be considered members in good standing, definitely get to pass through. I’m not going so far as to say this was the only thing going on; I made choices about which parts of the movement seemed to be producing good work and acting ethically and which parts seemed pretty horrendous and to be avoided. But I would (for instance) regularly make an attempt to welcome people from an area that seemed to have poor connections in the social graph (e.g. the first EA from country X, from org Y, from area-of-work Z, etc.), even if I wasn’t excited about that person or place or area, because it was part of the EA community and it seems very valuable for the community as a whole to have better interconnectedness between its disparate parts. Overall I think the question I asked was closer to “what would a good custodian of the EA community want to use these resources for” than to “what would Ben or Lightcone want to use these resources for”.
As to your confusion about the office, an analogy that might help here is to consider the marketing or recruitment part of a large company, or perhaps a branch of the company that makes a different product from the rest — yes, our part of the organization functioned nicely, and I liked the choices we made, but if some other part of the company is screwing over its customers/staff, or the CEO is stealing money, or the company’s product seems unethical to me, it doesn’t matter if I like my part of the company; I am contributing to the company’s life and output and should act accordingly. I did not work at FTX and I have not worked for OpenAI, but I am heavily supporting an ecosystem that supported these companies, and I anticipate that the resources I contribute will continue to get captured by these sorts of players via some circuitous route.
I mean FTX happened in the last 6 months! That caused incredibly large harm for the world.
I agree, but I have very different takeaways on what FTX means for the Rationalist community.
I think the major takeaway is that human society is somewhat more adequate, relative to our values, than we think, and that this matters.
To be blunt, FTX was always a fraud, because Bitcoin and cryptocurrency violated a fundamental axiom of good money: its value must be stable, or at least change slowly. Crypto is not a good store of value, given the wildly unstable price of, say, a single Bitcoin, and the root issue is the deeply stupid idea of fixing the supply, which, combined with variable demand, leads to wild price swings.
It’s possible to salvage some value out of crypto, but it can’t be tied to real money.
Most groups have way better ideas for money than Bitcoin and cryptocurrency.
OpenAI and Anthropic are two of the most central players in an extremely bad AI arms race that is causing enormous harm. I really feel like it doesn’t take a lot of imagination to think about how our extensive involvement in those organizations could be bad for the world. And a huge component of the Lightcone Offices was causing people to work at those organizations, as well as support them in various other ways.
I don’t agree, in this world, and this relates to a very important crux in AI alignment/AI safety: can it be solved solely via iteration and empirical work? My answer is yes, and one of the biggest examples is Pretraining from Human Feedback (PHF). I’ll explain why I think it’s the first real breakthrough of empirical alignment:
It almost completely avoids deceptive alignment, because it lets us specify the base goal as human values before the model has strong generalization capabilities, and the goal is pretty simple and myopic, so the simplicity bias has less incentive to make the model deceptively aligned. Basically, we first pretrain on the base goal, which is far more outer-aligned than the standard MLE objective, and only then let the AI generalize. This inverts the usual order of alignment and capabilities: RLHF and other alignment approaches first build capabilities and then try to align the model, which of course is not going to work nearly as well as PHF. In particular, it means that more capabilities mean better and better inner alignment by default.
The objective that worked best for pretraining with human feedback, conditional training, has a number of outer-alignment benefits compared to RLHF and fine-tuning, even setting aside inner alignment being effectively solved and deceptive alignment being prevented.
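To make “conditional training” concrete, here is a minimal sketch of the kind of pipeline the paper linked below describes, under my own assumptions: the reward model, the control-token names, and the 0.5 threshold are illustrative, not the paper’s exact implementation. The key point is that the preference signal is applied to a fixed, offline corpus before pretraining, and the model is then trained with the ordinary next-token-prediction loss on the tagged text.

```python
# Minimal sketch of conditional training (one form of pretraining with human feedback).
# The reward model, the control-token names, and the 0.5 threshold are illustrative
# assumptions, not the exact setup used in the paper linked below.

GOOD_TOKEN = "<|good|>"
BAD_TOKEN = "<|bad|>"


def tag_document(document: str, reward_model, threshold: float = 0.5) -> str:
    """Prepend a control token to one pretraining document based on its preference score."""
    score = reward_model(document)  # e.g. a preference/harmlessness classifier returning a value in [0, 1]
    control_token = GOOD_TOKEN if score >= threshold else BAD_TOKEN
    return control_token + document


def tag_corpus(documents, reward_model):
    """Tag the whole (fixed, offline) corpus once, before any training happens,
    so the model never gets to influence the data distribution it is aligned on."""
    return [tag_document(doc, reward_model) for doc in documents]


# Training then uses the standard next-token-prediction (MLE) loss over the tagged
# corpus, so the "aligned" base goal is specified before capabilities generalize.
# At inference time you condition on the good token to sample preferred behaviour,
# e.g. (hypothetical generation call):
#
#   completion = language_model.generate(GOOD_TOKEN + user_prompt)
```

The design choice to apply the feedback to the corpus rather than to the model’s own outputs is also what the next point about offline training relies on.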
One major benefit is that, since it’s offline training, there is never a way for the model to affect the distribution of data we use for alignment, so there’s never a way or incentive to gradient-hack or shift the distribution. In essence, we avoid embedded-agency problems by recreating a Cartesian boundary that actually works in an embedded setting. While it will likely fade away in time, we only need it to work once, and then we can dispense with the Cartesian boundary.
Again, this shows increasing alignment with scale, which is good because we’ve found the holy grail of alignment: a competitive alignment scheme that scales well with model and data size, and that lets you crank capabilities up while getting better and better results from alignment.
Here’s a link if you’re interested: https://www.lesswrong.com/posts/8F4dXYriqbsom46x5/pretraining-language-models-with-human-preferences

Finally, I don’t think you realize how well we did in getting companies to care about alignment, or how good it is that LLMs are being pursued first rather than RL-first systems, which means we can have simulators before agentic systems arise.