Just wanted to say this question pinged my “huh, this is a neat question” detector. (I consider myself pretty confused about the broader topic so I don’t have good answers, but found your question neat because it tackles a problem at a level where it still personally feels meaty to me.)
Double-checking my understanding of your “implied question” – it sounds like what you want is a reward function that is simple, but somehow analogous to the complexity of human value? And it sounds like maybe the underspecified bit is “you, as a human, have some vague notion that some sorts of value-generation are ‘cheating’”, and your true goal is “the most interesting outcome that doesn’t feel like Somehow Cheating to me?”
Some thoughts I have at the moment:
I think the actual direct comparison between Game of Life and Real Life is “one cell == an atom” (or some small physical particular), rather than one cell representing a bit of sentience, or even a single biological cell. I’d expect truly analogous “value” in Game of Life to look less like ‘stuff happening’ and more like “particular types of patterns are more common, without being too repetitive.” (i.e. in real life, I don’t optimize for “atoms moving around”, I optimize for something more like “larger patterns of atoms doing particular things”)
Assuming “one cell = minimum viable bit of value-weight”, there are still some questions I’d struggle with here that seem analogous to my philosophical confusions about human value. How ‘good’ is it to have a repeating loop of, say, a billion flourishing human lives? Is it better than a billion human lives that happens exactly once and ends?
To some degree, I think “moral value” (or, “value”) in real life is about the process of solving “what is valuable and how do I get it?”, and gaining more value depends somewhat on that question being “unsolved”. I’m not sure that, if I knew exactly what value was (say, with infinite compute), there would be as much point to actually having it.
If I’m taking your current (implied?) assumptions of “one cell == minimum viable value-weight”, and “the goal is to have as simple a function as you can that sort of ‘feels like it’s getting at something analogous to human value’”, I think the answer is something like “maximize the number of unique states that happen before things start looping” (maybe with a finite board, so that glider guns can’t “game” the system by generating technically infinite ‘variety’).
In this case it might mean that the system optimizes either for true continuous novelty, or the longest possible loop?
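To make that a bit more concrete, here’s a minimal sketch of what that metric could look like. This is purely illustrative – the small toroidal board, the NumPy representation, and the step cap are my own assumptions, not anything you specified:

```python
# Illustrative sketch: "count unique states before the board starts looping",
# on a finite Game of Life board with wrap-around edges.
import numpy as np

def step(grid: np.ndarray) -> np.ndarray:
    """One Game of Life step on a toroidal (wrap-around) grid of 0s and 1s."""
    neighbours = sum(
        np.roll(np.roll(grid, dy, axis=0), dx, axis=1)
        for dy in (-1, 0, 1) for dx in (-1, 0, 1)
        if (dy, dx) != (0, 0)
    )
    # A cell is alive next step if it has exactly 3 live neighbours,
    # or if it is alive now and has exactly 2 live neighbours.
    return ((neighbours == 3) | ((grid == 1) & (neighbours == 2))).astype(np.uint8)

def unique_states_before_loop(grid: np.ndarray, max_steps: int = 10_000) -> int:
    """Candidate reward: number of distinct board states seen before a repeat."""
    seen = {grid.tobytes()}
    for _ in range(max_steps):
        grid = step(grid)
        key = grid.tobytes()
        if key in seen:  # entered a cycle (a still life counts as a period-1 loop)
            break
        seen.add(key)
    return len(seen)
```

On a finite board the state space is finite, so this always terminates; whether you reward the transient, the loop length, or both is exactly the ambiguity above.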
I do suspect that figuring out which of your assumptions are “valid” is an important part of the question here.
Thanks for the detailed response. Meta: it feels good to receive a signal that this was a ‘neat question’, or in general a positive-seeming contribution to LW. I have several unexpressed thoughts, held back out of fear of not actually creating value for the community.
it sounds like what you want is a reward function that is simple, but somehow analogous to the complexity of human value? And it sounds like maybe the underspecified bit is “you, as a human, have some vague notion that some sorts of value-generation are ‘cheating’”, and your true goal is “the most interesting outcome that doesn’t feel like Somehow Cheating to me?”
This is about correct. A secondary reason for simplicity is computational efficiency (for the environment that generates the reward).
“one cell == an atom”
I can see that being the case, but, again, computational tractability is a concern. Actually interesting structures in GoL can be incredibly massive – for example, this Tetris Processor (2,940,928 x 10,295,296 cells). Maybe there’s some middle ground between truly fascinating GoL patterns made from atoms and my cell-as-a-planet level abstraction, as suggested by Daniel Kokotajlo in another comment.
How ‘good’ is it to have a repeating loop of, say, a billion flourishing human lives? Is it better than a billion human lives that happens exactly once and ends?
Wouldn’t most argue that, in general, more life is better than less life? (But I see some of my hidden assumptions here, such as “the ‘lives’ we’re talking about here are qualitatively similar, e.g. the repeating life doesn’t feel trapped/irrelevant/futile because it is aware that it is repeating”.)
I think “moral value” (or, “value”) in real life is about the process of solving “what is valuable and how do I get it?”
I don’t disagree, but I also think this is sort of outside the scope of finite-space cellular automata.
In this case it might mean that the system optimizes either for true continuous novelty, or the longest possible loop?
Given the constraints of CA, I’m mostly in agreement with this suggestion. Thanks.
I do suspect that figuring out which of your assumptions are “valid” is an important part of the question here.
Yes, I agree. Concretely, to me it looks like asking ‘if I saw X happening in GoL, and I imagined being a sentient being (at some scale, TBD) in that world (well, with my human values), would I want to live in it?’, and then translating that into some rules that promote or disincentivise X.
I do think taking this approach is broadly difficult, though. Perhaps it’s worth getting a v0.1 out with reward tied to instantiations of novel states to begin with, and then seeing whether to build on that or try a new approach.
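Something like this, perhaps – the class name and interface here are placeholders of my own, assuming the board state is exposed as a NumPy-style array:

```python
class NoveltyReward:
    """v0.1 reward sketch: 1.0 the first time a board state is seen, 0.0 afterwards."""

    def __init__(self) -> None:
        self.seen: set[bytes] = set()

    def __call__(self, grid) -> float:
        key = grid.tobytes()  # any stable, hashable encoding of the board would do
        if key in self.seen:
            return 0.0
        self.seen.add(key)
        return 1.0
```

The agent would then be rewarded for steering the board through as many never-before-seen states as possible, which is the ‘novel states’ framing above in its simplest form.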
Ah, if computational tractability is a key constraint, that makes lots of sense.