Alright, I think I’ve figured out what my disagreement with this post is.
A field of research pursues the general endeavor of finding out the things there are to know about a topic. It consists of building an accurate map of the world, of how-things-work, in general.
A solution to alignment is less like a field of research and more like a single engineering project. A difficult one, for sure! But ultimately, still a single engineering project, for which it is not necessary to know all the facts about the field, but only the facts that are useful.
And small groups/individuals do put together single engineering projects all the time! Including very large engineering projects like compilers, games & game engines, etc.
And, yes, we need solving alignment to be an at least partially nonpublic affair, because some important insights about how to solve alignment will be dual use, and the whole point is to get the people trying to save the world to succeed before the people functionally trying to kill everyone, not to get them to theoretically succeed if they had as much time as they wanted.
(Also: I believe this post means "exfohazard", not "infohazard")
some important insights about how to solve alignment will be dual use
Suggestion: if you’re using the framing of alignment-as-a-major-engineering-project, you can re-frame "exfohazards" as "trade secrets". That should work to make people who’d ordinarily think that the very idea of exfohazards is preposterous[1] take you seriously.
[1] As in: "Aren't you trying to grab too much status by suggesting you're smart enough to figure out something dangerous? Know your station!"
tbh I kinda gave up on reaching people who think like this :/
My heuristic is that they have too many brainworms to be particularly helpful to the critical parts of worldsaving, and it feels like it’d be unpleasant and not-great-norms to have a part of my brain specialized in “manipulating people with biases/brainworms”.
I don’t think that reframing is manipulation? In my model, reframing between various settings is a necessary part of general intelligence: you pose a problem and switch between frameworks until you find one where the solution-search path is "smooth". The same goes for communication: you build various models of your conversation partner until you find the shortest-inference path.
I meant when interfacing with governments/other organizations/etc., and plausibly at later stages, when the project may require "normal" software engineers/specialists in distributed computing/lower-level employees or subcontractors.
I agree that people who don’t take the matter seriously aren’t going to be particularly helpful during higher-level research stages.
“manipulating people with biases/brainworms”
I don’t think this is really manipulation? You’re communicating an accurate understanding of the situation to them, in a manner they can parse. You’re optimizing for accuracy, not for their taking specific actions that they wouldn’t have taken if they understood the situation (as manipulators do).
If anything, using niche jargon would be manipulation, or willful miscommunication: inasmuch as you’d be trying to convey accurate information to them in a way you know they will misinterpret (even if you’re not actively optimizing for misinterpretation).