I hopelessly anchored on almost everything in the report, so my estimate is far from independent. I roughly followed Nate’s approach (though that happened without anchoring, afaik), and my final probability is ~50% (+/- 5% when I play around with the factors). But it might’ve turned out differently if I’d had one espresso more or less in the morning.
50% is lower than I expected – another AI winter would cause a delay and might invalidate the scaling hypothesis, so the cumulative probability should probably rise more slowly after 2040–50, but I would still have expected something like 65–85%.
My biggest crux seems to be the necessity of “deployment.” The report seems to assume that for a system to become dangerous, someone has to decide to deploy it. I don’t know what the technology will look like decades from now, but today I test software systems hundreds of times on my laptop and on various test systems before I deploy them. I’m the only user of my laptop, so privilege escalation is intentionally trivial. As it happens, it’s also more powerful than the production server. Besides, my SSH public key is in the authorized_keys files on various servers.
So before I test a piece of software, I likely won’t know whether it’s aligned, and once I test it, it’s too late. Add to that that even if many people do end up using carefully sandboxed systems for testing AGIs within a few decades, it still only takes one person to relax their security a bit.
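To make that concrete, here’s a minimal sketch of what I mean, assuming a purely hypothetical hostname and setup: a process started during an ordinary test run already inherits my user’s SSH reach, so there is no separate “deployment” step standing between it and the production machines.

```python
# Minimal sketch (hypothetical hostname and setup): a process running as my
# user during a "test" can use my existing SSH key/agent non-interactively.
import subprocess

PROD_HOST = "prod.example.com"  # hypothetical; stands in for any server whose
                                # authorized_keys file contains my public key

# BatchMode=yes fails instead of prompting for credentials, i.e. this only
# succeeds if my key already grants access -- which is exactly the point.
subprocess.run(["ssh", "-o", "BatchMode=yes", PROD_HOST, "hostname"], check=False)
```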
So that’s why I merged the last three points and collectively assigned ~95% to them (not anchored on Nate, afaik). (Or, to be precise, I mostly ignored the last point because it seems like a huge intellectual can of worms spanning ethics, game theory, decision theory, physics, etc., so beyond the purview of the report.)
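For what it’s worth, here’s a minimal sketch of the multiply-the-factors shape of the estimate. The factor names and values are placeholders I’m inventing for illustration – only the ~95% on the merged last three premises reflects what I actually assigned above – but it shows how the product lands around 50% and why playing with individual factors moves it by a few percentage points.

```python
# Minimal sketch of a multiply-the-factors estimate. All factor values except
# the merged ~95% are hypothetical placeholders, not the report's numbers.
factors = {
    "AGI is developed in time": 0.80,                  # placeholder
    "strong incentives to build and scale it": 0.85,   # placeholder
    "alignment turns out hard enough to fail": 0.80,   # placeholder
    "merged last three premises": 0.95,                # the ~95% I assigned above
}

p = 1.0
for name, prob in factors.items():
    p *= prob

print(f"cumulative probability: {p:.2f}")  # ~0.52 with these placeholders
# Nudging any single factor by 0.05 moves the product by roughly 3 percentage
# points, which is the kind of spread I meant by "+/- 5%".
```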
That’s a great report and exercise! Thank you!