What guarantees that, in case you happen to be the first to build an interpretable aligned AGI, Conjecture, as an organization wielding a newly acquired immense power, stays aligned with the best interests of humanity?
For the record, having any person or organization in this position would be a tremendous win. Interpretable aligned AGI?! We are talking about a top 0.1% scenario here! Like, the difference between egoistical Connor and altruistic Connor with an aligned AGI in his hands is much, much smaller than the difference between Connor with an aligned AGI and anyone, any organization, or any scenario with a misaligned AGI.
But let’s assume this.
Unfortunately, there is no actual functioning reliable mechanism by which humans can guarantee their alignment to each other. If there were something I could do that would irreversibly bind me to my commitment to the best interests of mankind in a publicly verifiable way, I would do it in a heartbeat. But there isn’t, and most attempts at such are security theater.
What I can do is point to my history of acting in ways that, I hope, show my consistent commitment to doing what is best for the longterm future (even if of course some people with different models of what is “best for the longterm future” will have legitimate disagreements with my choices of past actions), and pledge to remain in control of Conjecture and shape its goals and actions appropriately.
On a meta-level, I think the best guarantee I can give is simply that not acting in humanity’s best interest is, in my model, Stupid. And my personal guiding philosophy in life is “Don’t Be Stupid”. Human values are complex and fragile, and while many humans disagree about many details of how they think the world should be, there are many core values that we all share, and not fighting with everything we’ve got to protect these values (or dying with dignity in the process) is Stupid.
I have very high confidence that the *current* Connor Leahy will act towards the best interests of humanity. However, given the extraordinary amount of power an AGI can provide, my confidence that this behavior will stay the same for decades or centuries to come (directing some of the AGI’s resources towards radical human life extension seems logical) is much lower.
Another question in case you have time—considering the same hypothetical situation of Conjecture being first to develop an aligned AGI, do you think that immediately applying its powers to ensure no other AGIs can be constructed is the correct behavior to maximize humanity’s chances of survival?
What I can do is point to my history of acting in ways that, I hope, show my consistent commitment to doing what is best for the longterm future (even if of course some people with different models of what is “best for the longterm future” will have legitimate disagreements with my choices of past actions), and pledge to remain in control of Conjecture and shape its goals and actions appropriately.
Sorry, do you mean that you are actually pledging to “remain in control of Conjecture”? Can some other founder(s) make that pledge too if it’s necessary for maintaining >50% voting power?
Will you have the ability to transfer full control over the company to another individual of your choice in case it’s necessary? (Larry Page and Sergey Brin, for example, are seemingly limited in their ability to transfer their 10x-voting-power Alphabet shares to others).
Thank you for your answer.
There are no guarantees in the affairs of sentient beings, I’m afraid.
This may usually be true, but that’s all.