jbash answers Memetic hazards of AGI architecture posts

jbash 16 Oct 2021 17:19 UTC
8 points
1. You’re probably wrong. But who knows?
2. Remember that if you thought of it, somebody else has probably already thought of it, or something really close. Other people will keep thinking of it, even if you do happen to be the first.
3. Probably at least 95 percent of them will do nothing with the idea. They’ll just say “Hmm, that seems like it’d work”, and go on with their lives.
4. Eventually, one or more of them will probably do something more than think. They may publicize it, or they may try to start a public or secret effort to actually build it. Or they may drop it in casual conversation with somebody else who does those things. It may spread slowly by gossip until some random person lights a fire under it. It only takes one if they act effectively and are reasonably lucky.
5. Therefore, YOU ARE ON THE CLOCK, you do not know how long the clock will run, and you may get little or no warning when the clock runs out. Secrets have short shelf lives.
6. Any disclosure, public or private, is an irrevocable move, and should be considered carefully. But that does not mean it’s a move you should never make. In fact it’s probably the main move available to you. And you are still on the clock no matter what.
7. It is always good to get advice from others. You have to give them full information, or their advice will probably be wrong. Be careful whose advice you take. And each person you tell is another person who might do something you don’t like.
8. In particular, it’s not enough that they be useful, responsible, competent, and/or well-meaning. You may in many scenarios need to decide whether you think their idea of “aligned” matches yours, and whether you think their approach to getting there matches yours. “Alignment” is a very vague concept. “The AI Safety crowd” probably includes people who do not match you.
9. Actors you’ve never heard of can blow up your plans at any time, including by changing the timeline. “Competitors” with the same idea might not be the only people who can make moves that matter, nor the only people who might respond to your moves in ways that matter.
10. Whenever you make a move, try to think about how everybody relevant might respond, and how you and the others would respond to that, and so forth. It would probably help to make a list of actors or types of actors, known and unknown, and actually think about what they all might do. And remember that relative speeds matter a lot.
11. You will be stuck with some rules and heuristics like “try to tell this kind of people” or “try not to tell that kind of people” or “always/never do X when/until Y”. Simple rules like that are never right for all situations. They are last resorts. You shouldn’t apply them if you can think out the specific, situational consequences instead. It’s very easy to grab a cached rule and stop thinking.
12. At the same time, if you’re about to violate a rule, you should run the situation against the reason you originally adopted that rule, and be sure you’re satisfied with why you’re violating it.
- Gurkenglas 16 Oct 2021 17:36 UTC
  3 points
  Parent
  “In particular, it’s not enough that they be useful, responsible, competent, and/or well-meaning.”
  Even if they want to make everyone ponies, there’s a decision theory according to which they won’t make you regret approaching them with a secret.
  - jbash 16 Oct 2021 22:40 UTC
    6 points
    Parent
    … assuming that they actually subscribe to that line of thought. And, depending on exactly which form of that line of thought they subscribe to, assuming that they think you approximate Omega. And assuming that they practice what they preach. That’s a lot of assumptions.
    
    I mean, not to put too fine a point on it, but I personally think that whole memeplex around various decision theories, and weird Newcomb-like problems, and acausal trade, and whatever, is almost entirely a combination of (a) stuff that only matters in conditions that will never happen to anybody, (b) stuff that can’t be executed, and (c) pure hooey. Some of the hooey seems to be of the “I really want to extract ethics from pure logic” variety. There’s maybe a tiny bit of unlikely, but possible, hypothetical reality thrown in to season it. I would not even think of acting on that if I were approached with something like this.
    
    Maybe I’m totally wrong, but what if the person you tell is totally wrong in the same way?
    - TAG 16 Oct 2021 22:56 UTC
      1 point
      Parent
      You are not alone. There’s never been a meta-level proof that you can apply the same decision theory to any possible universe. But most lesswrongians want to fiddle with the details, and not look at the big picture.
      - Gurkenglas 9 Nov 2021 14:52 UTC
        2 points
        Parent
        I don’t follow—what part of switching out your universe should stop decision theory from working? If you care about some universe, you can be the kind of person such that if you are that kind of person then that universe gets better. You can execute this motion from anywhere, though if the universe you care about has nothing depend on what kind of person you are it won’t help.
        TAG 9 Nov 2021 16:35 UTC
        1 point
        Parent
        
        “If you care about some universe, you can be the kind of person such that if you are that kind of person then that universe gets better.”
        
        Decision theory is not some vague claim about being a certain kind of person.
        
        Universes can stymie DTs by having no possibility of what you want, having infinite amounts of it, having infinite copies of you, disallowing causal connections between decisions and results, etc., etc.
        Gurkenglas 9 Nov 2021 16:44 UTC
        2 points
        Parent
        If there is no possibility of what you want, we can do no better than whatever approach I propose. The remote possibility of controlling infinite matter does indeed dominate all other concerns for any unbounded utility function, so I observe our utility function to be bounded. Having infinite copies of me is fine if my being a particular kind of person implies the copies of me being the same kind of person. Causal connections are not required: if someone knows what kind of person you are even without building a copy of you, that is enough for my “such that” clause.