I didn’t read the 100 pages, but the content seems extremely intelligent and logical. I really like the illustrations, they are awesome.
A few questions.
1: In your opinion, which idea in your paper is the most important, most new (not already focused on by others), and most affordable (can work without needing huge improvements in political will for AI safety)?
2: The paper suggests preventing AI from self-iteration, or recursive self improvement. My worry is that once many countries (or companies) have access to AI which are far better and faster than humans at AI research, each one will be tempted to allow a very rapid self improvement cycle.
Each country might fear that if it doesn’t allow it, one of the other countries will, and that country’s AI will be so intelligent it can engineer self replicating nanobots which take over the world. This motivates each country to allow the recursive self improvement, even if the AI’s methods of AI development become so advanced they are inscrutable by human minds.
How can we prevent this?
Edit: sorry I didn’t read the paper. But when I skimmed it, you did have a section on “AI Safety Governance System,” and talked about an international organization to get countries to do the right thing. I guess one question is: why would an international system succeed in AI safety, when current international systems have so far failed to prevent countries from acting selfishly in ways which severely harm other countries (e.g. all wars, exploitation, etc.)?
1. I think it is “Decentralizing AI Power”. So far, most descriptions of the extreme risks of AI assume the existence of an all-powerful superintelligence. However, I believe this can be avoided. That is, we can create a large number of AI instances with independent decision-making and different specialties. Through their collaboration, they can also complete the complex tasks that a single superintelligence can accomplish. They will supervise each other to ensure that no AI will violate the rules. This is very much like human society: The power of a single individual is very weak, but through division of labor and collaboration, humans have created an unprecedentedly powerful civilization.
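As a toy illustration of the mutual-supervision idea (the Agent/Action names and the rule check below are hypothetical placeholders, not anything from the paper), an action could require approval from every peer agent before it runs:

```python
# Toy sketch of mutual supervision among specialized agents.
# The Agent/Action classes and the rule check are hypothetical placeholders.
from dataclasses import dataclass

@dataclass
class Action:
    description: str
    violates_rules: bool  # in reality, judging this is the hard part

class Agent:
    def __init__(self, name: str, specialty: str):
        self.name = name
        self.specialty = specialty

    def review(self, action: Action) -> bool:
        # Approve only actions that do not violate the shared rules.
        return not action.violates_rules

def execute_if_approved(action: Action, proposer: Agent, peers: list) -> bool:
    # An action runs only if every other agent approves it.
    return all(peer.review(action) for peer in peers if peer is not proposer)

agents = [Agent("A", "mathematics"), Agent("B", "biology"), Agent("C", "planning")]
print(execute_if_approved(Action("publish a proof", False), agents[0], agents))          # True
print(execute_if_approved(Action("acquire compute covertly", True), agents[1], agents))  # False
```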
2. I am not sure that an international governance system will definitely succeed in AI safety. This requires extremely arduous efforts. First, all countries need to reach a consensus on AI risks, but this has not happened yet. So I think risk evaluation is a very important task. If it can be proven that the future risks of AI are very high, for example, higher than those of nuclear weapons, then countries may cooperate, just as they have cooperated in controlling the proliferation of nuclear weapons in the past. Second, even if countries are willing to cooperate, they will still face great challenges. Restricting the development of AI is much more difficult than restricting the proliferation of nuclear weapons. I discussed some restriction methods in Section 14.3, but I am also not sure whether these methods can be implemented effectively.
Thank you for your response!
What do you think is your best insight about decentralizing AI power, which is most likely to help the idea succeed, or to convince others to focus on the idea?
EDIT: PS, one idea I really like is dividing one agent into many agents working together. Thinking about this further: maybe if many agents working together behave exactly like one agent, but merely use the language of many agents working together, e.g. giving the narrator different names for different parts of the text and saying “he thought X and she did Y” instead of “I thought X and I did Y,” this would massively reduce self-allegiance, by making it far more sensible for one agent to betray another agent to the human overseers than for the same agent at one moment in time to betray the same agent at a previous moment in time to the human overseers.
I made a post on this. Thank you for your ideas :)
I feel that when the stakes are incredibly high, e.g. in WWII, countries which do not like each other, e.g. the US and the USSR, do join forces to survive. The main problem is that very few people today believe the stakes are incredibly high. Not a single country has made serious sacrifices for it. AI alignment spending is less than 0.1% of AI capability spending, despite some people making some strong arguments. What is the main hope for convincing people?
1. One of my favorite ideas is Specializing AI Powers. I think it is both safer and more economical. Here, I divide AI into seven types, each engaged in different work. Among them, the most dangerous one may be the High-Intellectual-Power AI, but we only let it engage in scientific research work in a restricted environment. In fact, in most economic fields, using overly intelligent AI does not bring more returns. In the past, industrial assembly lines greatly improved the output efficiency of workers. I think the same is true for AI. AIs with different specialties collaborating in an assembly line manner will have higher efficiency than using all-powerful AIs. Therefore, it is possible that without special efforts, the market will automatically develop in this direction.
2. I think the key to convincing people may lie in demonstrating AI’s capabilities, that is, showing that AI does indeed have great destructive power. However, current AI capabilities are still relatively weak and cannot provide sufficient persuasion. Maybe it will have to wait until AGI is achieved?
That is very thoughtful.
1.
When you talk about specializing AI powers, you talk about a high intellectual power AI with limited informational power and limited mental (social) power. I think this idea is similar to what Max Tegmark said in an article:
If you’d summarize the conventional past wisdom on how to avoid an intelligence explosion in a “Don’t-do-list” for powerful AI, it might start like this:
☐ Don’t teach it to code: this facilitates recursive self-improvement
☐ Don’t connect it to the internet: let it learn only the minimum needed to help us, not how to manipulate us or gain power
☐ Don’t give it a public API: prevent nefarious actors from using it within their code
☐ Don’t start an arms race: this incentivizes everyone to prioritize development speed over safety
Industry has collectively proven itself incapable to self-regulate, by violating all of these rules.
He disagrees that “the market will automatically develop in this direction” and is strongly pushing for regulation.
Another thing Max Tegmark talks about is focusing on Tool AI instead of building a single AGI which can do everything better than humans (see 4:48 to 6:30 in his video). This slightly resembles specializing AI intelligence, but I feel his Tool AI regulation is too restrictive to be a permanent solution. He also argues for cooperation between the US and China to push for international regulation (in 12:03 to 14:28 of that video).
Of course, there are tons of ideas in your paper that he hasn’t talked about yet.
You should read about the Future of Life Institute, which is headed by Max Tegmark and is said to have a budget of $30 million.
2.
The problem with AGI is that at first it has no destructive power at all, and then it suddenly has great destructive power. By the time people see its destructive power, it’s too late. Maybe the ASI has already taken over the world, or maybe the AGI has already invented a new deadly technology which can never be “uninvented,” and bad actors can do harm far more efficiently.
1. The industry is currently not violating the rules mentioned in my paper, because all current AIs are weak AIs, so no AI’s power has yet reached the upper limit of the seven AI types I described. In the future, it is possible for an AI to break through that upper limit, but I think doing so is uneconomical. For example, an AI psychiatrist does not need superhuman intelligence to perform well. An AI mathematician may be very intelligent in mathematics, but it does not need to learn how to manipulate humans or how to design DNA sequences. Of course, having regulations is better, because there may be some careless AI developers who grant AIs too many unnecessary capabilities or permissions, even though this does not improve the AIs’ performance on actual tasks.
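As a hypothetical sketch of what “only the necessary capabilities and permissions” could look like in practice (the role names and capability strings below are purely illustrative, not the paper’s seven types):

```python
# Hypothetical per-role capability allowlist: each specialized AI gets only
# the permissions its task actually needs (role and capability names are illustrative).
ALLOWED_CAPABILITIES = {
    "ai_mathematician": {"run_proof_assistant", "read_math_corpus"},
    "ai_psychiatrist": {"chat_with_patient", "read_clinical_guidelines"},
}

def request_capability(role: str, capability: str) -> bool:
    # Grant a capability only if it is on the role's allowlist.
    return capability in ALLOWED_CAPABILITIES.get(role, set())

print(request_capability("ai_mathematician", "run_proof_assistant"))  # True
print(request_capability("ai_mathematician", "design_dna_sequence"))  # False: not needed for the task
```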
The difference between my view and Max Tegmark’s is that he seems to assume that there will only be one type of superintelligent AI in the world, while I think there will be many different types of AIs. Different types of AIs should be subject to different rules, rather than the same rule. Can you imagine a person who is a Nobel Prize-winning scientist, the president, the richest man, and an Olympic champion all at the same time? That would be very strange, right? Our society doesn’t need such an all-round person. Similarly, we don’t need such an all-round AI either.
The development of a technology usually has two stages: first, achieving capabilities, and second, reducing costs. AI technology is currently in the first stage. When AI develops to the second stage, specialization will occur.
2. Agree.
I feel your points are very intelligent. I also agree that specializing AI is a worthwhile direction.
It’s very uncertain if it works, but all approaches are very uncertain, so humanity’s best chance is to work on many uncertain approaches.
Unfortunately, I disagree that it will happen automatically. Gemini 1.5 (and probably Gemini 2.0 and GPT-4) is a Mixture of Experts model. I’m no expert, but I think that means that for each token of text, a gating (routing) function decides which of the sub-models (“experts”) should process that token, or how much weight to give each one’s output.
So maybe there is an AI psychiatrist, an AI mathematician, and an AI biologist inside Gemini and o1. Which one is doing the talking depends on what question is asked, or which part of the question the overall model is answering.
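To make the routing picture concrete, here is a toy sketch of top-k expert routing (purely illustrative; the real Gemini/GPT-4 internals are not public, and the “experts” here are just random matrices):

```python
# Toy sketch of Mixture-of-Experts routing: a gating function scores each
# expert for the current token, and only the top-k experts' outputs are mixed.
# Purely illustrative; real MoE models route per layer with learned gates.
import numpy as np

rng = np.random.default_rng(0)
d_model, n_experts, top_k = 8, 4, 2

experts = [rng.normal(size=(d_model, d_model)) for _ in range(n_experts)]  # toy "experts"
gate = rng.normal(size=(d_model, n_experts))                               # toy gating weights

def moe_forward(token_vec):
    scores = token_vec @ gate                       # one score per expert
    top = np.argsort(scores)[-top_k:]               # indices of the top-k experts
    weights = np.exp(scores[top]) / np.exp(scores[top]).sum()  # softmax over the chosen experts
    # Weighted mix of the chosen experts' outputs; the other experts stay idle.
    return sum(w * (token_vec @ experts[i]) for w, i in zip(weights, top))

print(moe_forward(rng.normal(size=d_model)).shape)  # (8,)
```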
The problem is that they all output words to the same stream of consciousness, and refer to past sentences with the words “I said this,” rather than “the biologist said this.” They think that they are one agent, and so they behave like one agent.
My idea, which I only thought of thanks to your paper, is to do the opposite. The experts within the Mixture of Experts model, or even the same AI on different days, would not refer to themselves as “I” but as “he,” so they behave like many agents.
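A minimal sketch of that relabeling, with hypothetical expert names, where each turn is rendered in the third person instead of a shared “I”:

```python
# Minimal sketch of the relabeling idea: every turn is attributed to a named
# expert in the third person, rather than all turns sharing a single "I".
transcript = [
    ("the biologist", "The protein likely folds this way."),
    ("the mathematician", "That estimate needs a tighter error bound."),
]

def render(transcript):
    # "The biologist said: ..." instead of "I said: ..."
    return "\n".join(f'{speaker.capitalize()} said: "{text}"' for speaker, text in transcript)

print(render(transcript))
```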
:) thank you for your work!
I’m not disagreeing with your work, I’m just a little less optimistic than you and don’t think things will go well unless effort is made. You wrote the 100 page paper so you probably understand effort more than me :)
Happy holidays!
You mentioned Mixture of Experts. That’s interesting. I’m not an expert in this area. I speculate that in an architecture similar to MoE, when one expert is working, the others are idle. In this way, we don’t need to run all the experts simultaneously, which indeed saves computation, but it doesn’t save memory. However, if an expert is shared among different tasks, when it’s not needed for one task, it can handle other tasks, so it can stay busy all the time.
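A rough back-of-envelope sketch of that compute-versus-memory point, with made-up parameter counts:

```python
# Back-of-envelope sketch with made-up numbers: top-k routing means only a few
# experts run per token (saving compute), but every expert must stay loaded
# in memory (no memory saving).
n_experts = 16
params_per_expert = 10e9   # hypothetical: 10B parameters per expert
shared_params = 20e9       # hypothetical: layers shared by all tokens
top_k = 2

active_per_token = shared_params + top_k * params_per_expert        # drives compute
resident_in_memory = shared_params + n_experts * params_per_expert  # drives memory

print(f"active per token: {active_per_token / 1e9:.0f}B parameters")      # 40B
print(f"resident in memory: {resident_in_memory / 1e9:.0f}B parameters")  # 180B
```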
The key point here is the independence of the experts, including what you mentioned, that each expert has an independent self-cognition. A possible bad scenario is that although there are many experts, they all passively follow the commands of a Leader AI. In this case, the AI team is essentially no different from a single superintelligence. Extra efforts are indeed needed to achieve this independence. Thank you for pointing this out!
Happy holidays, too!
I agree, it takes extra effort to make the AI behave like a team of experts.
Thank you :)
Good luck sharing your ideas. If things aren’t working out, try changing strategies. Maybe instead of giving people a 100-page paper, tell them the idea you think is “the best,” and focus on that one idea. Add a little note at the end: “By the way, if you want to see many other ideas from me, I have a 100-page paper here.”
Maybe even think of different ideas.
I cannot tell you which way is better; just keep trying different things. I don’t know what is right because I’m also having trouble sharing my ideas.