I am uncertain whether or not it will be possible to align models constructed like this, and I have some worries that halfhearted attempts to make models secure will reassure people more than they should. Nevertheless, my guess is that it’s more dignified for us to have these sorts of reporting systems than to not have them; for cybersecurity experts to be involved in the creation of the systems surrounding AI than for them to not be involved; for people interested in the large-scale problems to be contributing (in dignity-increasing ways) to companies which are likely to be involved in causing those large-scale problems than not. I think those are potentially all controversial (especially the last), and so am interested in talking about them.
Nevertheless, my guess is that it’s more dignified for us to have these sorts of reporting systems than to not have them
Can you elaborate on this one? (I don’t have a strong opinion one way or the other; seems unclear to me. If this system had been in place before Bing, and it had properly fixed all the issues with Bing, it seems plausible to me that this would’ve been net negative for x-risk reduction. The media coverage on Bing seems good for getting people to be more concerned about alignment and AI safety, reducing trust in a “we’ll just figure it out as we go” mentality, increasing security mindset, and providing a wider platform for alignment folks.)
for cybersecurity experts to be involved in the creation of the systems surrounding AI than for them to not be involved
This seems good to me, all else equal (and might outweigh the point above).
for people interested in the large-scale problems to be contributing (in dignity-increasing ways) to companies which are likely to be involved in causing those large-scale problems than not
This also seems good to me, though I agree that the case isn’t clear. It also likely depends a lot on the individual and their counterfactual (e.g., some people might have strong comparative advantages in independent research or certain kinds of coordination/governance roles that require being outside of a lab).
reducing trust in a “we’ll just figure it out as we go” mentality
I think reducing trust in “we’ll just figure it out as we go” while still operating under that mentality is bad; I think steps like this are how we stop operating under that mentality. [Was it the case that nothing like this would happen in a widespread way until high-profile failures, because of the lack of external pressure? Maybe.]
I think users being able to report problems doesn’t help with x-risk-related problems. (The issue will be when these systems stop sending bug reports!) I nevertheless think having systems for users to report issues will be a step in the right direction, even if it doesn’t get us all the way.
It also likely depends a lot on the individual and their counterfactual (e.g., some people might have strong comparative advantages in independent research or certain kinds of coordination/governance roles that require being outside of a lab).
This seems right and is good to point out; but it wouldn’t surprise me if the right place for a lot of safety-minded folk to be is non-profits with broad government/industry backing that serve valuable infrastructure roles, rather than just standing athwart history yelling “stop!”. [How do we get that backing? Well, that’s the challenge.]
The argument I see against this is that voluntary security that’s useful in the short term can be discarded once it’s no longer so, whereas security driven by public pressure or regulation can’t. If a lab had had great practices forever and then dropped them, there would be much less pressure to revert than if they’d previously had huge security incidents.
For instance, we might want to focus on public pressure for 1-2 years, then switch gears towards security
I agree that you want the regulation to have more teeth than just being an industry cartel. I’m not sure I agree on the ‘switching gears’ point; it seems to me like we can do both simultaneously (though not as well), and may not have the time to do them sequentially.