It might not matter in the grand scheme of things, but my comment above has been on my mind for the last few days. I didn’t do a good job of demonstrating the thing I set out to argue for: that effect X is negligible and can be ignored. Arguing that some effects are negligible is the first step in any physics problem, since there are infinitely many effects that could be considered but only enough time to compute a few of them in detail.
The first respondent made the mistake of using the challenger’s intentions as data: she knew the puzzle was expected to be solvable in a reasonable amount of time, so she disregarded defects that would be too difficult to calculate. That can be a useful criterion in video games (“how well does the game explain itself?”); it can be exploited on academic tests, though doing so defeats the purpose; and it’s useless in real-world problems. Nature doesn’t care how easy or hard a problem is.
I didn’t do a good job demonstrating that X is negligible compared to Y because I didn’t resolve enough variables to put them into the same units. If I had shown that X’ and Y’ are both in units of energy and that X’ scales linearly with a parameter that is much smaller than its equivalent in Y’, while everything else is order 1, that would have been a good demonstration.
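To make the shape of that argument concrete (with placeholder symbols of my own, not the actual quantities from the puzzle), the demonstration would have looked something like:

```latex
% Schematic only: a and b are the characteristic parameters of the two effects,
% f and g are dimensionless factors of order 1.
X' = a\,f, \qquad Y' = b\,g
\qquad\Longrightarrow\qquad
\frac{X'}{Y'} \;=\; \frac{a}{b}\cdot\frac{f}{g} \;\ll\; 1
\quad\text{when } a \ll b .
```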
If I were just trying to solve the problem and not prove my solution, I wouldn’t have bothered, because I knew that X is negligible compared to Y without even a scaling argument. Why? The answer physicists give in this situation is “physics intuition,” which may sound like an evasion. But in other contexts, you find physicists talking about “training their intuition,” which is not something that birds or clairvoyants do with their instincts or intuitions. Physicists intentionally use the neural networks in their heads to get familiarity with how big certain quantities are relative to each other. When I thought about effects X and Y in the blacked-out comment above, I was using familiarity with the few-foot drop the track represented, the size and weight of a ball you can hold in your hand, etc. I was implicitly bringing prior experience into this problem, so it wasn’t really “getting it right on the first try.” It wasn’t the first try.
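To give a sense of the scale that intuition was anchored to (these are round illustrative numbers, not the actual quantities from the puzzle):

```python
# Back-of-the-envelope scale for a hand-sized ball and a few-foot drop.
# Illustrative round numbers only, not the puzzle's actual values.
g = 9.8   # m/s^2, gravitational acceleration
m = 0.1   # kg, roughly the mass of a ball you can hold in your hand
h = 1.0   # m, a few-foot drop

potential_energy = m * g * h  # joules
print(f"energy scale of the problem: ~{potential_energy:.1f} J")
# Any effect several orders of magnitude below this scale can safely be ignored.
```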
It might be that any problem has some overlap with previous problems; I’m not sure a problem could even be posed in an intelligible way if it were truly novel. This article was supposed to be a metaphor for getting AI to understand human values. Okay, we’ve never done that before. But AI systems have some incomplete overlap with how “System 1” intelligence works in human brains, some overlap with a behaviorist conditioned response, and some overlap with conventional curve-fitting (regression). Also, we somehow communicate values to other humans, defining the culture in which we live. We can tell how much of that is instinctive versus learned by how similar or different isolated cultures are.
I think this comment would get too long if I continued down this line of thought, but don’t we equalize our values by trying to please each other? We (humans) are a bit dog-like in our social interactions. More than trying to form a logically consistent ethic, we continually keep tabs on what other people think of us and try to stay “good” in their eyes, even if that means inconsistency. Maybe AI needs to be optimized on sentiment analysis, so when it starts trying to kill all the humans to end cancer, it notices that it’s making us unhappy, or whimpers in response to a firm “BAD DOG” and a tap on the nose...
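Here is a toy sketch of what I mean by “optimized on sentiment analysis.” The names and the scoring are entirely made up for illustration; this is nothing like a real training setup:

```python
# Toy sketch: shape an agent's reward with human sentiment feedback.
# All names and numbers here are hypothetical illustrations.

def sentiment_score(human_feedback: str) -> float:
    """Crude stand-in for a sentiment model: +1 approving, -1 disapproving."""
    disapproving = {"bad", "no", "stop", "bad dog"}
    return -1.0 if human_feedback.strip().lower() in disapproving else 1.0

def shaped_reward(task_reward: float, human_feedback: str, weight: float = 10.0) -> float:
    """Task reward, heavily discounted whenever the humans are unhappy."""
    return task_reward + weight * sentiment_score(human_feedback)

# "Ending cancer by ending the humans" should score terribly once
# the disapproval term dominates the task term.
print(shaped_reward(task_reward=5.0, human_feedback="BAD DOG"))   # -5.0
print(shaped_reward(task_reward=5.0, human_feedback="good job"))  # 15.0
```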
Sorry—I addressed one bout of undisciplined thinking (in physics) and then tacked on a whole lot more undisciplined thinking in a different subject (AI alignment, which I haven’t thought about nearly as much as people here have).
I could delete the last two paragraphs, but I want to think about it more and maybe bring it up in a place that’s dedicated to the subject.