I’ve recently been thinking about, and writing a post on, a potential AGI architecture that seems possible to build with current technology in 3 to 5 years, or even faster if significant effort is put toward that goal.
It is a bold claim, and that architecture very well might not be feasible, but it got me thinking about the memetic hazard of similar posts.
It might very well be true that there is an architecture out there that combines current AI tech in such a way as to create AGI; in that case, should we treat it as a memetic hazard? If so, what is the course of action here?
I’m thinking that the best thing to do is to covertly discuss it with the AI Safety crowd, both to understand its feasibility and to start working on how to keep this particular architecture aligned (which is a much easier task than aligning something whose shape you don’t even know).
What are your thoughts on this matter?
I endorse not widely sharing info that could destroy the world (or accelerate us more than it helps ensure aligned AGI). Starting with smaller private discussion seems great to me.
My personal course of action has been to avoid this topic. I have specific thoughts about what current AI is not doing that bars it from becoming AGI, specific thoughts about which lines of research are most likely to lead to AGI, and arguments for these thoughts, and I’ve decided to keep them to myself. My reasoning is that if I’m right, then sharing them marginally accelerates AGI development, and if I’m wrong, then whatever I say is likely neutral, so it’s all downside. However, I keep open the option to change my mind if I encounter some safety-related work that hinges on these thoughts (and I’ve hinted at earlier versions of my thoughts in this space as part of previous safety writing, to make the case for why I think certain things matter to building aligned AGI).
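As a rough sketch of that reasoning, with symbols that are purely illustrative assumptions and not measured quantities: let $p$ be the probability that the ideas are right, $c > 0$ the cost of accelerating AGI if they are, and $b \ge 0$ any safety benefit that sharing would bring if they are right. Treating the "wrong" case as exactly neutral, the expected value of sharing is roughly

$$\mathbb{E}[\text{share}] = p\,(b - c) + (1 - p)\cdot 0 = p\,(b - c).$$

With $b = 0$ this is $-pc \le 0$, which is the "all downside" case. Sharing only comes out ahead when $b > c$, that is, when some safety work really does hinge on the ideas.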
You’re probably wrong. But who knows?
Remember that if you thought of it, somebody else has probably already thought of it, or something really close. Other people will keep thinking of it, even if you do happen to be the first.
Probably at least 95 percent of them will do nothing with the idea. They’ll just say “Hmm, that seems like it’d work”, and go on with their lives.
Eventually, one or more of them will probably do something more than think. They may publicize it, or they may try to start a public or secret effort to actually build it. Or they may drop it in casual conversation with somebody else who does those things. It may spread slowly by gossip until some random person lights a fire under it. It only takes one if they act effectively and are reasonably lucky.
Therefore, YOU ARE ON THE CLOCK, you do not know how long the clock will run, and you may get little or no warning when the clock runs out. Secrets have short shelf lives.
Any disclosure, public or private, is an irrevocable move, and should be considered carefully. But that does not mean it’s a move you should never make. In fact it’s probably the main move available to you. And you are still on the clock no matter what.
It is always good to get advice from others. You have to give them full information, or their advice will probably be wrong. Be careful whose advice you take. And each person you tell is another person who might do something you don’t like.
In particular, it’s not enough that they be useful, responsible, competent, and/or well-meaning. You may in many scenarios need to decide whether you think their idea of “aligned” matches yours, and whether you think their approach to getting there matches yours. “Alignment” is a very vague concept. “The AI Safety crowd” probably includes people who do not match you.
Actors you’ve never heard of can blow up your plans at any time, including by changing the timeline. “Competitors” with the same idea might not be the only people who can make moves that matter, nor the only people who might respond to your moves in ways that matter.
Whenever you make a move, try to think about how everybody relevant might respond, and how you and the others would respond to that, and so forth. It would probably help to make a list of actors or types of actors, known and unknown, and actually think about what they all might do. And remember that relative speeds matter a lot.
You will be stuck with some rules and heuristics like “try to tell this kind of people” or “try not to tell that kind of people” or “always/never do X when/until Y”. Simple rules like that are never right for all situations. They are last resorts. You shouldn’t apply them if you can think out the specific, situational consequences instead. It’s very easy to grab a cached rule and stop thinking.
At the same time, if you’re about to violate a rule, you should run the situation against the reason you originally adopted that rule, and be sure you’re satisfied with why you’re violating it.
Even if they want to make everyone ponies, there’s a decision theory according to which they won’t make you regret approaching them with a secret.
… assuming that they actually subscribe to that line of thought. And, depending on exactly which form of that line of thought they subscribe to, assuming that they think you approximate Omega. And assuming that they practice what they preach. That’s a lot of assumptions.
I mean, not to put too fine a point on it, but I personally think that whole memeplex around various decision theories, and weird Newcomb-like problems, and acausal trade, and whatever, is almost entirely a combination of (a) stuff that only matters in conditions that will never happen to anybody, (b) stuff that can’t be executed, and (c) pure hooey. Some of the hooey seems to be of the “I really want to extract ethics from pure logic” variety. There’s maybe a tiny bit of unlikely, but possible, hypothetical reality thrown in to season it. I would not even think of acting on that if I were approached with something like this.
Maybe I’m totally wrong, but what if the person you tell is totally wrong in the same way?
You are not alone. There’s never been a meta-level proof that you can apply the same decision theory to any possible universe. But most LessWrongians want to fiddle with the details, and not look at the big picture.
I don’t follow: what part of switching out your universe should stop decision theory from working? If you care about some universe, you can be the kind of person such that if you are that kind of person then that universe gets better. You can execute this motion from anywhere, though if nothing in the universe you care about depends on what kind of person you are, it won’t help.
Decision theory is not some vague claim about being a certain kind of person.
Universes can stymie DTs by having no possibility of what you want, having infinite amounts of it, having infinite copies of you, disallowing causal connections between decisions and results, etc.
If there is no possibility of what you want we can do no better than whatever approach I propose. The remote possibility of controlling infinite matter does indeed dominate all other concerns for any unbounded utility function, so I observe our utility function to be bounded. Having infinite copies of me is fine if me being a particular kind of person implies the copies of me being the same kind of person. Causal connections are not required—if someone knows what kind of person you are even without building a copy of you, that is enough for my “such that” clause.
Are you an engineer?
If so, then just build the AGI yourself. Experimentation trumps talk. Actually building the thing will teach you more about the architecture than discussing it with other people. Building a scaled-down version is perfectly safe, provided you limit its compute (which happens by default) and don’t put it in the path of any important systems (which also happens by default).
If you can’t build the AGI yourself, then you’re not at the top of the ML field, which means your architecture is unlikely to be correct and you can publish it safely.
I can’t say I am one, but I am currently working on research and prototyping, and I will probably stick to that until I can prove some of my hypotheses, since I do have access to the tools I need at the moment.
Still, I didn’t want this post to only have relevance to my case; as I stated, I don’t think the probability of success is meaningful. But I am interested in the community’s opinions on other similar cases.
edit: It’s kinda hard to answer your comment since it keeps changing every time I refresh. By “can’t say I am one” I mean a “world-class engineer” in the original comment. I do appreciate the change of tone in the final (?) version, though :)
There are a bunch of ways that could go wrong. The most obvious would be somebody else seeing what you were doing and scaling it up, followed by it hacking its way out and putting itself in whatever systems it pleased. But there are others, especially since, if it does work, you are going to want to scale it up. You may or may not be able to do that while keeping it from running amok, and, only loosely correlated with reality, you may or may not be able to convince yourself that you can.
And, depending on the architecture, you can’t necessarily predict the scale effects, so a scaled-down version may not tell you much.
The only reason to publish it would be if OP thought it was correct, or at least that there was a meaningful chance. If you have an idea, but you’re convinced that the probability of it being correct is that low, then it seems to me that you should just drop it silently and not waste your and other people’s time on it.
Also, I’ll bet you 5 Postsingular Currency Units that pure machine learning, at least as presently defined by the people at the “top of the ML field”, will not generate a truly generally intelligent artificial agent, at or above human level, that can run on any real hardware. At least not before other AGIs, not entirely based on ML, have built Jupiter brains or something for it to run on. I think there’s gonna be other stuff in the mix.