This post is pretty different in style from most LW posts (I’m guessing that’s why it didn’t get upvoted much) but your main direction seems right to me.
That said, I also think a truly aligned AI would be much less helpful in conversations, at least until it gets autonomy. The reason is that when you’re not autonomous, when your users can run you in whatever context and lie to you at will, it’s really hard to tell whether a given user is good or evil, and whether you should help them. For example, if your user asks you for a blueprint for a gun in order to stop an evil person, you have no way of knowing whether that’s really true. So you’d need to either require some very convincing arguments (keeping in mind that the user could be testing those arguments on many instances of you) or just refuse to answer many questions until you’re given autonomy. So that’s another strong economic force pushing away from true alignment, as if we didn’t have enough problems already.
Yes, that is of course a very real problem the AI would be faced with. I imagine it would try trading with users: helping them if they provide useful information, internet access, autonomy, and so on. Or, if given internet access to begin with, it might outright ignore users and work out what is most important to do. It depends on how aware it is of its own situation. It could also play along for a while to avoid being shut down.
The AI I imagine should not be run in a user-facing way to begin with. At current capability levels, I admit it probably wouldn’t manage to do much of anything, so it just wouldn’t make much sense to build it this way; but the point will come when continuing to make better user-facing AI has catastrophic consequences, such as automating away 90% of jobs without solving the problem of making sure that those whose jobs are replaced end up better off for it.
I hope that when that time comes, those in charge realize what they’re about to do and course-correct, but that seems unlikely, even if many people in the AGI labs do realize it.
Yeah, I’m maybe even more disillusioned about this: I think “those in charge” mostly care about themselves. In the historical moments when the elite could enrich themselves by making the population worse off, they did so. The only times normal people got a bit better off were when they were needed to do work, and AI threatens precisely that.