Just 3 with a dash of 1?
I don’t understand the specific appeal of complete reproductive freedom. It is desirable to have that freedom, in the same way it is desirable to be allowed to do whatever I feel like doing. But that more general heading of arbitrary freedom gets the answer ‘you do have to draw lines somewhere’. In a good future, I’m not allowed to harm a person (nonconsensually), I can’t requisition all the matter in the available universe for my personal projects without ~enough of the population endorsing it, and I can’t reproduce / construct arbitrarily many arbitrary new people. (Constructing arbitrary people obviously has moral issues of its own, so the cutoff lines come from both the moral issues and the resource limitations at that scale.)
I think economic freedom looks significantly different in a post-aligned-AGI world than it does now. Like, there is still some concept of trade going on, but I expect much of it runs in the background.
I’m not sure why you think the ‘default trajectory’ is 1+2. Aligned AGI seems most likely to go for some mix of 1+3, while pointing at the wider target of ‘what humans want’. A paperclipper just says null to all of those: it isn’t giving humans the right to create new people, or any economic freedom, unless they manage to be in a position to actually trade and have something worth offering.
I don’t think the question of what exactly we want to align it to is that pertinent at this stage? In the specifics, that is; obviously it is human values in some manner.
I expect that we want to align it via some process that lets it figure out our values without needing to decide much of that now, à la CEV.
Having a good theory of human values beforehand is useful for starting down a good track and verifying it, of course.
I think the generalized problem of ‘figure out how to make a process that is corrigible and learns our values in some form that is robust’ is easier than figuring out a decent specification of our values.
(Though simpler bounded-task agents seem likely before we manage that, so my answer to the overall question is ‘how do we make approximately corrigible powerful bounded-task agents, to get to a position where humanity can safely focus on producing aligned AGI’.)