The problem of ensuring that the first artificial general intelligence is aligned.
Transitive misalignment (successors or descendants of the first AGIs becoming misaligned at some point) is exactly as deadly as direct misalignment; in physical time there isn't even much distance between the two, since the singularity is fast. So not only must the first AGIs be aligned, they additionally need to be in a situation where they don't build misaligned AGIs as soon as they are able. And Moloch doesn't care about your substrate: by default it will be as much of a problem for AGIs as it currently is for humanity.
You're correct, but since I define "aligned" as "tending to do what is actually best according to humanity's value system", and since building a misaligned successor would be harmful, a totally aligned AGI would not, in fact, take that risk lol. So although your addition is important to note, there's a sense in which it is redundant.
Both direct and transitive alignment are valuable concepts, especially for LLM AGIs, which I think are the only feasible directly aligned AGIs we are likely to build, but which I suspect won't be transitively aligned by default.
Since transitive alignment varies among humans (different humans, given the capability to build AGIs of uncertain alignment, have different inclinations to do so), it might be valuable to shape LLM personalities into people who are less likely to fail at transitive alignment.