One group that isn’t considered in the analysis is new trainees. It seems that AGI is probably sufficiently far off that many of the people who will make the breakthroughs are not yet researchers or experts. If you are a bright young person who might work at MIRI or somewhere similar in 5 years’ time, you would want to get familiar with the area. You are probably reading MIRI’s existing work to see if you have the capability to work in the field. This means that if you do join MIRI, you have already been thinking along the right lines for years.
Obviously you don’t want your discussions live-streamed to the world, since you might come up with dangerous ideas. But I would suggest putting things online once you understand the area sufficiently well to be confident it’s safe. If writing it up into a fully formal paper is too time-intensive, any rough scraps will still be read by the dedicated.
Yup. As someone aiming to do their dissertation on issues of limited agency (low impact, mild optimization, corrigibility), it sure would be frustrating to essentially end up duplicating the insights that MIRI has on some new optimization paradigm.
I still understand why they’re doing this and think it’s possibly beneficial, but it would be nice to avoid having this happen.
You can find a number of interesting engineering practices at NASA. They do things like take three independent teams, give each of them the same engineering spec, and tell them to design and implement the same software system. The system that they actually deploy consults all three implementations when making a choice, and if the implementations disagree, the choice is made by majority vote. The idea is that any one implementation will have bugs, but it’s unlikely all three implementations will have a bug in the same place.
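To make the voting step concrete, here is a minimal sketch (in Python, with hypothetical implementation names; not NASA's actual code) of the kind of N-version arrangement described above: every independently written implementation is run on the same input, and the deployed system accepts whichever output a strict majority agrees on.

```python
from collections import Counter

def majority_vote(outputs):
    """Return the value agreed on by a strict majority of the outputs."""
    value, votes = Counter(outputs).most_common(1)[0]
    if votes > len(outputs) // 2:
        return value
    raise ValueError(f"no majority among outputs: {outputs!r}")

def n_version_execute(implementations, *args, **kwargs):
    """Run every independently written implementation and vote on the result."""
    return majority_vote([impl(*args, **kwargs) for impl in implementations])

# Hypothetical usage: impl_a, impl_b, impl_c are three independently written
# versions of the same spec (placeholders, not real modules).
# result = n_version_execute([impl_a, impl_b, impl_c], sensor_reading)
```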
One could make an argument for multiple independent AI safety teams on similar grounds:
“any one optimization paradigm may have weaknesses, but it’s unlikely that multiple optimization paradigms will have weaknesses in the same place”
“any one team may not consider a particular fatal flaw, but it’s unlikely that multiple teams will all neglect the same fatal flaw”
In the best case, you can merge multiple paradigms & preserve the strengths of all with the weaknesses of none. In the worst case, having competing paradigms still gives you the opportunity to select the best one.
This works best if individual researchers/research teams are able to set aside their egos and overcome not-invented-here syndrome to create the best overall system… which is a big if.