Weibing Wang

Karma: 90

Rethinking AI Safety Approach in the Era of Open-Source AI

Weibing WangFeb 11, 2025, 2:01 PM

3 points

0 comments6 min readLW link

Weibing Wang Dec 29, 2024, 3:13 AM
2 points
0
in reply to: Dakara’s comment on: A Solution for AGI/ASI Safety
I think this plan is not sufficient to completely solve problems #1, #2, #3 and #5. I can’t come up with a better one for the time being. I think more discussions are needed.

Weibing Wang Dec 28, 2024, 6:33 AM
4 points
1
in reply to: Dakara’s comment on: A Solution for AGI/ASI Safety
I agree with your view about organizational problems. Your discussion gave me an idea: Is it possible to shift employees dedicated to capability improvement to work on safety improvement? Set safety goals for these employees within the organization. This way, they will have a new direction and won’t be idle, worried about being fired or resigning to go to other companies. Besides, it’s necessary to make employees understand that improving safety is a highly meaningful job. This may not rely solely on the organization itself, but also require external pressure, such as from the government, peers, or the public. If the safety cannot be ensured, your product may face a lot of criticism and even be restricted from market access. And there will be some third-party organizations conducting safety evaluations of your product, so you need to do a solid job in safety rather than just going through the motions.

Weibing Wang Dec 27, 2024, 1:53 AM
4 points
3
in reply to: Knight Lee’s comment on: A Solution for AGI/ASI Safety
Thank you for your advice!

Weibing Wang Dec 26, 2024, 3:03 AM
4 points
1
in reply to: Knight Lee’s comment on: A Solution for AGI/ASI Safety
You mentioned Mixture of Experts. That’s interesting. I’m not an expert in this area. I speculate that in an architecture similar to MoE, when one expert is working, the others are idle. In this way, we don’t need to run all the experts simultaneously, which indeed saves computation, but it doesn’t save memory. However, if an expert is shared among different tasks, when it’s not needed for one task, it can handle other tasks, so it can stay busy all the time.
The key point here is the independence of the experts, including what you mentioned, that each expert has an independent self-cognition. A possible bad scenario is that although there are many experts, they all passively follow the commands of a Leader AI. In this case, the AI team is essentially no different from a single superintelligence. Extra efforts are indeed needed to achieve this independence. Thank you for pointing this out!
Happy holidays, too!

Weibing Wang Dec 25, 2024, 3:15 AM
4 points
1
in reply to: Knight Lee’s comment on: A Solution for AGI/ASI Safety
1. The industry is currently not violating the rules mentioned in my paper, because all current AIs are weak AIs, so none of the AIs’ power has reached the upper limit of the 7 types of AIs I described. In the future, it is possible for an AI to break through the upper limit, but I think it is uneconomical. For example, an AI psychiatrist does not need to have superhuman intelligence to perform well. An AI mathematician may be very intelligent in mathematics, but it does not need to learn how to manipulate humans or how to design DNA sequences. Of course, having regulations is better, because there may be some careless AI developers who will grant AIs too many unnecessary capabilities or permissions, although this does not improve the performance of AIs in actual tasks.
The difference between my view and Max Tegmark’s is that he seems to assume that there will only be one type of super intelligent AI in the world, while I think there will be many different types of AIs. Different types of AIs should be subject to different rules, rather than the same rule. Can you imagine a person who is both a Nobel Prize-winning scientist, the president, the richest man, and an Olympic champion at the same time? This is very strange, right? Our society doesn’t need such an all-round person. Similarly, we don’t need such an all-round AI either.
The development of a technology usually has two stages: first, achieving capabilities, and second, reducing costs. The AI technology is currently in the first stage. When AI develops to the second stage, specialization will occur.

2. Agree.

Weibing Wang Dec 24, 2024, 7:55 AM
4 points
1
in reply to: Knight Lee’s comment on: A Solution for AGI/ASI Safety
1. One of my favorite ideas is Specializing AI Powers. I think it is both safer and more economical. Here, I divide AI into seven types, each engaged in different work. Among them, the most dangerous one may be the High-Intellectual-Power AI, but we only let it engage in scientific research work in a restricted environment. In fact, in most economic fields, using overly intelligent AI does not bring more returns. In the past, industrial assembly lines greatly improved the output efficiency of workers. I think the same is true for AI. AIs with different specialties collaborating in an assembly line manner will have higher efficiency than using all-powerful AIs. Therefore, it is possible that without special efforts, the market will automatically develop in this direction.
2. I think the key for convincing people may lie in the demonstration of AI’s capabilities, that is, showing that AI does indeed have great destructive power. However, the current AI capabilities are still relatively weak and cannot provide sufficient persuasion. Maybe it will have to wait until AGI is achieved?

Weibing Wang Dec 24, 2024, 6:24 AM
4 points
1
in reply to: plex’s comment on: A Solution for AGI/ASI Safety
For the first issue, I agree that “Carefully Bootstrapped Alignment” is organizationally hard, but I don’t think improving the organizational culture is an effective solution. It is too slow and humans often make mistakes. I think technical solutions are needed. For example, let an AI be responsible for safety assessment. When a researcher submits a job to the AI training cluster, this AI assesses the safety of the job. If this job may produce a dangerous AI, the job will be rejected. In addition, external supervision is also needed. For example, the government could stipulate that before an AI organization releases a new model, it needs to be evaluated by a third-party safety organization, and all organizations with computing resources exceeding a certain threshold need be supervised. There is more discussion on this in the section Restricting AI Development.
For the second issue, you mentioned free variables. I think this is a key point. In the case where we are not fully confident in the safety of AI, we should reduce free variables as much as possible. This is why I proposed a series of AI Controllability Rules. The priority of these rules is higher than the goals. AI should be trained to achieve the goals under the premise of complying with the rules. In addition, I think we should not place all our hopes on alignment. We should have more measures to deal with the situation where AI alignment fails, such as AI Monitoring and Decentralizing AI Power.

Weibing Wang Dec 23, 2024, 5:35 AM
4 points
1
in reply to: Knight Lee’s comment on: A Solution for AGI/ASI Safety
1. I think it is “Decentralizing AI Power”. So far, most descriptions of the extreme risks of AI assume the existence of an all-powerful superintelligence. However, I believe this can be avoided. That is, we can create a large number of AI instances with independent decision-making and different specialties. Through their collaboration, they can also complete the complex tasks that a single superintelligence can accomplish. They will supervise each other to ensure that no AI will violate the rules. This is very much like human society: The power of a single individual is very weak, but through division of labor and collaboration, humans have created an unprecedentedly powerful civilization.
2. I am not sure that an international governance system will definitely succeed in AI safety. This requires extremely arduous efforts. First, all countries need to reach a consensus on AI risks, but this has not happened yet. So I think risk evaluation is a very important task. If it can be proven that the risks of AI in the future are very high, for example, higher than that of nuclear weapons, then countries may cooperate, just as they have cooperated in controlling the proliferation of nuclear weapons in the past. Second, even if countries are willing to cooperate, they will also face great challenges. Restricting the development of AI is much more difficult than restricting the proliferation of nuclear weapons. I discussed some restriction methods in Section 14.3, but I am also not sure whether these methods can be effectively implemented.

Weibing Wang Dec 22, 2024, 3:34 AM
5 points
1
in reply to: plex’s comment on: A Solution for AGI/ASI Safety
The core idea about alignment is described here: https://wwbmmm.github.io/asi-safety-solution/en/main.html#aligning-ai-systems
If you only focus on alignment, you can only read Sections 6.1-6.3, and the length of this part will not be too long.

Weibing Wang Dec 20, 2024, 2:35 AM
5 points
1
in reply to: bhauth’s comment on: A Solution for AGI/ASI Safety
Thank you for your comment! I think your concern is right. Many safety measures may slow down the development of AI’s capabilities. Developers who ignore safety may develop more powerful AI more quickly. I think this is a governance issue. I have discussed some solutions in Sections 13.2 and 16. If you are interested, you can take a look.

Weibing Wang Dec 20, 2024, 2:22 AM
2 points
1
in reply to: Charlie Steiner’s comment on: A Solution for AGI/ASI Safety
Thank you for your comment! I think my solution is applicable to arbitrary intelligent AI for the following reasons:
1. During the development stage, AI will align with the developers’ goals. If the developers are benevolent, they will specify a goal that is beneficial to humans. Since the developers’ goals have a higher priority than the users’ goals, if a user specifies an inappropriate goal, the AI can refuse.
2. Guiding the AI to “do the right thing” through the developers’ goals and constraining the AI to “not do the wrong thing” through the rules may seem a bit redundant. If the AI has learned to do the right thing, it should not do the wrong thing. However, the significance of the rules is that they can serve as a standard for AI monitoring, making it clear to the monitors under what circumstances the AI’s actions should be stopped.
3. If the monitor is an equally intelligent AI, it should have able to identify those behaviors that attempt to bypass the loopholes in the rules.

Weibing Wang Dec 19, 2024, 3:51 AM
4 points
1
in reply to: Nathan Helm-Burger’s comment on: A Solution for AGI/ASI Safety
Thank you for your feedback! I’ll read the resources you’ve shared. I also look forward to your specific suggestions for my paper.

Weibing Wang Dec 19, 2024, 3:46 AM
4 points
2
in reply to: jacquesthibs’s comment on: A Solution for AGI/ASI Safety
Thank you for your suggestions! I have read the CAIS stuff you provided and I generally agree with these views. I think the solution in my paper is also applicable to CAIS.

Weibing Wang Dec 19, 2024, 2:31 AM
4 points
2
in reply to: Seth Herd’s comment on: A Solution for AGI/ASI Safety
Thank you for your suggestions! I will read the materials you recommended and try to cite more related works.
For o1, I think o1 is the right direction. The developers of o1 should be able to see the hidden chain of thoughts of o1, which is explainable for them.
I think that alignment or interpretability is not a “yes” or “no” property, but a gradually changing property. o1 has done a good job in terms of interpretability, but there is still room for improvement. Similarly, the first AGI to come out in the future may be partially aligned and partially interpretable, and then the approaches in this paper can be used to improve its alignment and interpretability.

A Solution for AGI/ASI Safety

Weibing WangDec 18, 2024, 7:44 PM

50 points

29 comments1 min readLW link

Weibing Wang

Re­think­ing AI Safety Ap­proach in the Era of Open-Source AI

A Solu­tion for AGI/​ASI Safety

Rethinking AI Safety Approach in the Era of Open-Source AI

A Solution for AGI/ASI Safety