Mostly I’d agree with this, but I think there needs to be a bit of caution and balance around:
How do we get more streams of evidence? By making productive mistakes. By attempting to leverage weird analogies and connections, and iterating on them. We should obviously recognize that most of this will be garbage, but you’ll be surprised how many brilliant ideas in the history of science first looked like, or were, garbage.
Do we want variety? Absolutely: worlds where things work out well likely correlate strongly with finding a variety of approaches.
However, there’s some risk in Do(increase variety). The ideal is that we get many researchers thinking about the problem in a principled way, and variety happens. If we intentionally push too much for variety, we may end up with a lot of wacky approaches that abandoned too much principled thinking too early. (I think I’ve been guilty of this at times)
That said, I fully agree with the goal of finding a variety of approaches. It’s just rather less clear to me how much an individual researcher should be thinking in terms of boosting variety. (it’s very clear that there should be spaces that provide support for finding different approaches, so I’m entirely behind that; currently it’s much more straightforward to work on existing ideas than to work on genuinely new ones)
Certainly many great ideas initially looked like garbage—but I’ll wager a lot of garbage initially looked like garbage too. I’d be interested in knowing more about the hidden-greatness-garbage: did it tend to have any common recognisable qualities at the time? Did it tend to emerge from processes with common recognisable qualities? In environments with shared qualities?...
It’s also clear when reading these works and interacting with these researchers that they all get how alignment is about dealing with unbounded optimization, they understand fundamental problems and ideas related to instrumental convergence, the security mindset, the fragility of value, the orthogonality thesis…
I bet Adam will argue that this (or something similar) is the minimum we want for a research idea, because I agree with your point that we shouldn’t expect a solution to alignment to fall out of the marketing program for Oreos. We want to constrain it to at least “has a plausible story on reducing x-risk”, and maybe what’s mentioned in the quote as well.
For sure I agree that the researcher knowing these things is a good start—so getting as many potential researchers to grok these things is important.
My question is about which ideas researchers should focus on generating/elaborating given that they understand these things. We presumably don’t want to restrict thinking to ideas that may overcome all these issues—since we want to use ideas that fail in some respects, but have some aspect that turns out to be useful.
Generating a broad variety of new ideas is great, and we don’t want to be too quick in throwing out those that miss the target. The thing I’m unclear about is something like:
What target(s) do I aim for if I want to generate the set of ideas with greatest value?
I don’t think that “Aim for full alignment solution” is the right target here. I also don’t think that “Aim for wacky long-shots” is the right target—and of course I realize that Adam isn’t suggesting this. (we might find ideas that look like wacky long-shots from outside, but we shouldn’t be aiming for wacky long-shots)
But I don’t have a clear sense of what target I would aim for (or what process I’d use, what environment I’d set up, what kind of people I’d involve...), if my goal were specifically to generate promising ideas (rather than to work on them long-term, or to generate ideas that I could productively work on).
Another disanalogy with previous research/invention… is that we need to solve this particular problem. So in some sense a history of: [initially garbage-looking-idea] ---> [important research problem solved] may not be relevant.
What we need is: [initially garbage-looking-idea generated as attempt to solve x] ---> [x was solved]. It’s not good enough to find ideas that are useful for something; they need to be useful for this.
I expect the kinds of processes that work well to look different from those used where there’s no fixed problem.