Alice: I want to make a bovine stem cell that can be cultured at scale in vats to make meat-like tissue. I could use directed evolution. But in my alternate universe, genome sequencing costs $1 billion per genome, so I can’t straightforwardly select cells to amplify based on whether their genome looks culturable. Currently the only method I have is to do end-to-end testing: I take a cell line, I try to culture a great big batch, and then see if the result is good quality edible tissue, and see if the cell line can last for a year without mutating beyond repair. This is very expensive, but more importantly, it doesn’t work. I can select for cells that make somewhat more meat-like tissue; but when I do that, I also heavily select for other very bad traits, such as forming cancer-like growths. I estimate that it takes on the order of 500 alleles optimized relative to the wild type to get a cell that can be used for high-quality, culturable-at-scale edible tissue. Because that’s a large complex change, it won’t just happen by accident; something about our process for making the cells has to put those bits there.
Bob: In a recent paper, a polygenic score for culturable meat is given. Since we now have the relevant polygenic score, we actually have a short handle for the target: namely, a pointer to an implementation of this polygenic score as a computer program.
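(A minimal sketch, purely hypothetical, of what Bob's "pointer to an implementation of this polygenic score as a computer program" could look like. The loci, alleles, and effect sizes below are invented for illustration and are not taken from any real paper.)

```python
# Hypothetical sketch only: the loci and effect sizes are made up.
# A real published PGS would supply on the order of 500 (locus, allele) weights.

EFFECT_SIZES = {
    ("locus_001", "A"): 0.8,
    ("locus_002", "T"): -0.3,
    ("locus_003", "G"): 1.1,
    # ... on the order of 500 entries in Alice's scenario ...
}

def pgs_score(genome):
    """Additive polygenic score: sum the weight of each allele the genome carries.

    `genome` maps locus -> allele, e.g. {"locus_001": "A", ...}.
    """
    return sum(
        EFFECT_SIZES.get((locus, allele), 0.0)
        for locus, allele in genome.items()
    )
```

With such a file on disk, Bob's "short handle" for the target is roughly "a genome that scores highly under pgs_score", which is a far shorter description than listing the ~500 alleles directly.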
Alice: That seems of limited relevance. It’s definitely relevant in that, if I grant the premise that this is actually the right polygenic score (which I don’t), we now know what exactly we would put in the genome if we could. That’s one part of the problem solved, but it’s not the part I was talking about. I’m talking about the part where I don’t know how to steer the genome precisely enough to get anywhere complex.
Bob: You’ve been bringing up the complexity of the genomic target. I’m saying that actually the target isn’t that complex, because it’s just a function call to the PGS.
Alice: Ok, yes, we’ve greatly decreased the relative algorithmic complexity of the right genome, in some sense. It is indeed the case that if I ran a computer program randomly sampled from strings I could type into a python file, it would be far more likely to output the right genome if I have the PGS file on my computer compared to if I don’t. True. But that’s not very relevant because that’s not the process we’re discussing. We’re discussing the process that creates a cell with its genome, not the process that randomly samples computer programs weighted by [algorithmic complexity in the python language on my computer]. The problem is that I don’t know how to interface with the cell-creation process in a way that lets me push bits of selection into it. Instead, the cell-creation process just mostly does its own thing. Even if I do end-to-end phenotype selection, I’m not really steering the core process of cell-genome-selection.
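(A toy illustration of the concession Alice is making here, and of its limits: given a polygenic-score file, a very short program can in principle pin down the target genome by searching against the score. Everything below is invented, the search space is kept tiny so the example actually terminates, and, as Alice goes on to say, this is a description of the target, not a way of steering a cell toward it.)

```python
# Toy "pointer program": with a (hypothetical, made-up) polygenic score available,
# a short program can name the target genome by brute-force search against it.
# The real space of genomes is astronomically larger than this three-locus toy.
from itertools import product

WEIGHTS = {  # hypothetical effect sizes: three loci, two alleles each
    ("L1", "A"): 0.8, ("L1", "a"): 0.0,
    ("L2", "T"): -0.3, ("L2", "t"): 0.5,
    ("L3", "G"): 1.1, ("L3", "g"): 0.2,
}
LOCI = ["L1", "L2", "L3"]
ALLELES = {"L1": "Aa", "L2": "Tt", "L3": "Gg"}

def pgs_score(genome):
    return sum(WEIGHTS[(locus, allele)] for locus, allele in genome.items())

best = max(
    (dict(zip(LOCI, combo)) for combo in product(*(ALLELES[l] for l in LOCI))),
    key=pgs_score,
)
print(best)  # -> {'L1': 'A', 'L2': 't', 'L3': 'G'}
```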
Bob: I understand, but you were saying that the complexity of the target makes the whole task harder. Now that we have the PGS, the target is not very complex; we just point at the PGS.
Alice: The point about the complexity is to say that cells growing in my lab won’t just spontaneously start having the 500 alleles I want. I’d have to do something to them—I’d have to know how to pump selection power into them. It’s some specific technique I need to have but don’t have, for dealing with cells. It doesn’t matter that the random-program complexity has decreased, because we’re not talking about random programs, we’re talking about cell-genome-selection. Cell-genome-selection is the process that I don’t know how to consistently pump bits into, and it’s the process that doesn’t by chance get the 500 alleles. It’s the process against which I’m measuring complexity.
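(A rough back-of-envelope version of Alice's "bits" point. The ~1 bit per allele and the idealized selection round are assumptions made purely for illustration; the only figure taken from the dialogue is the ~500-allele estimate.)

```python
# Back-of-envelope only: illustrative assumptions, not figures from the dialogue
# beyond Alice's ~500-allele estimate.
import math

TARGET_ALLELES = 500              # Alice's estimate relative to the wild type
BITS_PER_ALLELE = 1.0             # assumed: ~1 bit of selection per target allele
target_bits = TARGET_ALLELES * BITS_PER_ALLELE

p_spontaneous = 2.0 ** (-target_bits)   # ~2^-500: doesn't happen by accident

# An idealized end-to-end round that keeps the top half of cell lines extracts
# at most ~1 bit of selection pressure per round:
bits_per_round = math.log2(2)                        # = 1.0
rounds_lower_bound = target_bits / bits_per_round    # ~500 rounds, and only if every
# bit landed on the intended loci -- which end-to-end phenotype selection does not
# guarantee, since traits like cancer-like growth also score well on the batch test.

print(f"{target_bits:.0f} bits, p_spontaneous ~ {p_spontaneous:.1e}, "
      f">= {rounds_lower_bound:.0f} idealized rounds")
```

The point of the arithmetic is Alice's, not the particular numbers: the bits have to come from somewhere, and end-to-end testing neither supplies many of them per round nor aims them at the intended loci.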
This analogy is valid in the case where we have absolutely no idea how to use a system’s representations or “knowledge” to direct an AI’s behavior. That is the world Yudkowsky wrote the Sequences in. It is not the world we currently live in. There are several, perhaps many, plausible plans for directing a competent AGI’s actions, “thoughts”, and “values” toward either its own or a subsystem’s “understanding” of human values. See Goals selected from learned knowledge: an alternative to RL alignment for some of those plans. Critiques need to go beyond the old “we have no idea” argument and actually address the ideas we have.
That’s incorrect, but more importantly it’s off topic. The topic is “what does the complexity of value have to do with the difficulty of alignment”. AFAIK, Barnett in this comment is not saying (though he might agree, and maybe he should be taken as saying so implicitly or something) “we have lots of ideas for getting an AI to care about some given values”. Rather, he’s saying “if you have a simple pointer to our values, then the complexity of values no longer implies anything about the difficulty of alignment, because values effectively aren’t complex anymore”.
This.
I’m not sure you could have been as confident as Yudkowsky was at the time, but yeah, in the epistemic state of 2008 there was a serious probability that human values were so complicated, and that simple techniques would make AIs Goodhart so badly on the intended task, that controlling smart AI was essentially hopeless.
We now know that a lot of the old LessWrong lore on how complicated human values and wishes are, at least in the code section, is either incorrect or irrelevant, and we also know that the standard LW story of how humans came to dominate other animals is incorrect to a degree that impacts AI alignment.
I have my own comments on the ideas below, but people really should try to update on the evidence we’ve gained from LLMs. We learned a lot about ourselves and about LLMs in the process, a lot of that evidence generalizes from LLMs to future AGI/ASI, and IMO LW updated way, way too slowly on AI safety.
https://www.lesswrong.com/posts/83TbrDxvQwkLuiuxk/?commentId=BxNLNXhpGhxzm7heg
https://www.lesswrong.com/posts/YyosBAutg4bzScaLu/thoughts-on-ai-is-easy-to-control-by-pope-and-belrose#4yXqCNKmfaHwDSrAZ (This is more of a model-based RL approach to alignment)
https://www.lesswrong.com/posts/wkFQ8kDsZL5Ytf73n/my-disagreements-with-agi-ruin-a-list-of-lethalities#dyfwgry3gKRBqQzoW
https://www.lesswrong.com/posts/wkFQ8kDsZL5Ytf73n/my-disagreements-with-agi-ruin-a-list-of-lethalities#7bvmdfhzfdThZ6qck