I began reading this charitably (unaware of whatever inside baseball is potentially going on and seems to be alluded to), but to be honest I struggled once “X” seemed to really want someone (Eliezer) to admit they’re “not smart”? I’m not sure why that would be relevant.
I’m not sure exactly what is meant; one guess is that it’s about centrality: making yourself more central (making more executive decisions, being more of a bottleneck on approving things, being more looked to as a leader by others, etc.) makes more sense the more correct you are about relevant things relative to other people. Saying “oh, I was wrong about a lot, whoops” is the kind of thing someone might do before e.g. stepping down as project manager or CEO. If you think your philosophy has major problems and your replacements’ philosophies have fewer major problems, that might increase the chance of success.
I would guess this is comparable to what Eliezer is saying in this post about how some people should just avoid consequentialist reasoning because they’re too bad at it and unlikely to improve:
People like this should not be ‘consequentialists’ or ‘utilitarians’ as they understand those terms. They should back off from this form of reasoning that their mind is not naturally well-suited for processing in a native format, and stick to intuitively informally asking themselves what’s good or bad behavior, without any special focus on what they think are ‘outcomes’.
If they try to be consequentialists, they’ll end up as Hollywood villains describing some grand scheme that violates a lot of ethics and deontology but sure will end up having grandiose benefits, yup, even while everybody in the audience knows perfectly well that it won’t work. You can only safely be a consequentialist if you’re genre-savvy about that class of arguments—if you’re not the blind villain on screen, but the person in the audience watching who sees why that won’t work.
...
Is capability supposed to be hard for similar reasons as alignment? Can you expand/link? The only argument I can think of relating the two (which I think is a bad one) is “machines will have to solve their own alignment problem to become capable.”
Alignment is hard because it’s a quite general technical problem. You don’t just need to make the AI aligned in case X, you also have to make it aligned in cases Y and Z. To do this you need to create very general analysis and engineering tools that generalize across these situations.
Similarly, AGI is a quite general technical problem. You don’t just need to make an AI that can do narrow task X, it has to work in cases Y and Z too, or it will fall over and fail to take over the world at some point. To do this you need to create very general analysis and engineering tools that generalize across these situations.
For an intuition pump about this, imagine that LW’s effort towards making an aligned AI over the past ~14 years had instead been directed at making AGI. We have records of certain mathematical formalisms people have come up with (e.g. UDT, logical induction), and these tools are pretty far from enhancing AI capabilities. If the goal had been to enhance capabilities, they would have done so more, but even then the total amount of intellectual work completed would be quite small compared to how much would be required to build a working agent that generalizes across situations. The AI field has been at this for decades and has produced results that are quite impressive in some domains but still fail to generalize most of the time, and even those results have required a lot of intellectual work spanning multiple academic fields and industries. (Even if the field is inefficient in some ways, that would just imply that inefficiency is common; LW also seems inefficient at solving AI-related technical problems compared to its potential.)
This would be a pretty useless Machiavellian strategy, so I’m assuming you’re saying it’s happening for other reasons? Maybe self-deception? Can you explain?
I’m not locating all the intentionality for creating these bubbles in Eliezer; there are other people in the “scene” who promote memes and gain various benefits from them (see this dialogue, ctrl-f “billions”).
There’s a common motive to try to be important by claiming that one has unique skills to solve important problems. Pursuing that motive leads to stress, because it involves creating implicit or explicit promises that are hard to fulfil (see e.g. Elizabeth Holmes); telling people “hey actually, I can’t solve this” reduces the stress level and makes it easier to live a normal life.
This just made me go “wha” at first, but my guess now is that this and the bits above it about speech recognition are pointing at some AI-winter-esque (or even tech-stagnation) beliefs? Is this right?
I think what Ben means here is that access to large amounts of capital is anti-correlated with actually trying to solve difficult intellectual problems. This is the opposite of what would be predicted by the efficient market hypothesis.
The Debtors’ Revolt argues that college (which many, many more Americans have gone to than previously) primarily functions to cause people to correlate with each other, not to teach them epistemic and instrumental rationality. E.g. college-educated people are more likely to immediately dismiss Vitamin D as a COVID health intervention (due to an impression of “expert consensus”) rather than forming an opinion based on reading some studies and doing probability calculations. One would by default expect epistemic/instrumental rationality to be well-correlated with income, for standard efficient-market-hypothesis reasons. However, if there is a massive amount of correlation among the “irrational” actors, they can reward each other, provide insurance to each other, commit violence in favor of their class (e.g. the 2008 bailouts), etc.
(On this model, a major reason large companies do the “train a single large, expensive model using standard techniques like transformers” thing is to create correlation in the form of a canonical way of spending resources to advance AI.)
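To make the “reading some studies and doing probability calculations” contrast above concrete, here is a minimal sketch of the kind of back-of-the-envelope Bayesian update an individual could do. The `posterior` function and all of the numbers are purely hypothetical placeholders for illustration, not a summary of the actual Vitamin D evidence.

```python
# Illustrative-only Bayesian update: the prior and likelihoods below are
# hypothetical placeholders, not claims about the Vitamin D literature.

def posterior(prior, p_evidence_if_true, p_evidence_if_false):
    """Bayes' rule: P(H|E) = P(E|H)P(H) / (P(E|H)P(H) + P(E|~H)P(~H))."""
    numerator = p_evidence_if_true * prior
    denominator = numerator + p_evidence_if_false * (1 - prior)
    return numerator / denominator

# Hypothetical prior that Vitamin D meaningfully helps with COVID outcomes.
p = 0.2

# Hypothetical likelihoods of each study's result if the effect is real vs. not.
studies = [
    (0.7, 0.3),  # small positive RCT: more probable if the effect is real
    (0.5, 0.5),  # null observational study: uninformative
    (0.6, 0.4),  # positive correlational study with confounders
]

for p_if_true, p_if_false in studies:
    p = posterior(p, p_if_true, p_if_false)
    print(f"updated P(helps) = {p:.2f}")
```

The point is only that the calculation itself is cheap; the contrast being drawn is between doing some version of this and deferring entirely to an impression of expert consensus.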
Similarly, AGI is a quite general technical problem. You don’t just need to make an AI that can do narrow task X, it has to work in cases Y and Z too, or it will fall over and fail to take over the world at some point. To do this you need to create very general analysis and engineering tools that generalize across these situations.
I don’t think this is a valid argument. Counter-example: you could build an AGI by uploading a human brain onto an artificial substrate, and you don’t “need to create very general analysis and engineering tools that generalize across these situations” to do this.
More realistically, it seems pretty plausible that all of the patterns/rules/heuristics/algorithms/forms of reasoning necessary for “being generally intelligent” can be found in human culture, and that ML can distill these elements of general intelligence into a (language or multimodal) model that will then be generally intelligent. This also doesn’t seem to require very general analysis and engineering tools. What do you think of this possibility?
You’re right that the uploading case wouldn’t necessarily require strong algorithmic insight. However, it’s the kind of bounded technical problem where progress is relatively easy to evaluate relative to the difficulty, e.g. based on the ability to upload smaller animal brains, so it would lead to >40-year timelines absent large shifts in the field or large drivers of progress. It would also lead to a significant degree of alignment by default.
For copying culture, I think the main issue is that culture is a protocol that runs on human brains, not on computers. Analogously, there are Internet protocols saying things like “a SYN/ACK packet must follow a SYN packet”, but these are insufficient for understanding a human’s usage of the Internet. Copying these would lead to imitations, e.g. machines that correctly send SYN/ACK packets and produce semi-grammatical text but lack certain forms of understanding, especially connection to a surrounding spatiotemporal “real world”, etc.
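As a toy illustration of the protocol point (the `ToyServer` class is a hypothetical sketch, not anything from the comment): the code below implements a SYN → SYN/ACK → ACK-style exchange as a bare state machine. It always emits the “correct” packets, yet contains no representation of what the connection is for, which is the sense in which copying the protocol layer differs from copying the understanding that uses it.

```python
from typing import Optional

class ToyServer:
    """Follows a SYN -> SYN/ACK -> ACK handshake with no model of what the
    connection is about; a stand-in for imitating only the protocol layer."""

    def __init__(self):
        self.state = "LISTEN"

    def receive(self, packet: str) -> Optional[str]:
        if self.state == "LISTEN" and packet == "SYN":
            self.state = "SYN_RECEIVED"
            return "SYN/ACK"
        if self.state == "SYN_RECEIVED" and packet == "ACK":
            self.state = "ESTABLISHED"
            return None
        return None  # anything else is silently ignored

server = ToyServer()
print(server.receive("SYN"))   # -> SYN/ACK
print(server.receive("ACK"))   # -> None
print(server.state)            # -> ESTABLISHED
```

On this analogy, a system trained only on the “packets” of culture could be a very good ToyServer while still lacking the surrounding understanding that the protocol serves.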
If you don’t have logic yourself, you can look at a lot of logical content (e.g. math papers) without understanding logic. Most machines work by already working, not by searching over machine designs that fit a dataset.
Also, in the cultural case, if it worked it would be decently aligned, since it could copy cultural reasoning about goodness. (The main reason I have for thinking cultural notions of goodness might be undesirable is that, as stated above, culture is just a protocol and most of the relevant value processing happens in the brains; see this post.)
Thanks so much for the one-paragraph summary of The Debtors’ Revolt; that was clarifying.