I appreciate this sentiment. I do think there's a danger of a bad reduction of values to valence grounded in the operation of the brain, one that ignores much of what we care about, but I also think that extra stuff we care about is itself expressed as valence grounded in the operation of the brain. All the concerns you bring up must be computed somewhere; that somewhere is human brains, and if what those brains do is "minimize prediction error", then those concerns are also expressions of prediction error minimization. This is what's exciting to me about a grounding like the one I'm considering: it's embedded in the world in a way that means we don't leave anything out (unless there's some kind of "spooky" physics happening that we can't observe, which I consider unlikely), so we naturally capture all the complexity you're concerned about, though it may take quite a bit to compute it all.
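To make the "minimize prediction error" framing a bit more concrete, here's a toy sketch (just an illustration I'm assuming for the example, not anything specific from the predictive processing literature): an internal estimate gets nudged toward each observation so that squared prediction error shrinks over time.

```python
# Toy sketch of "prediction error minimization": an internal estimate is
# repeatedly nudged toward incoming observations so that squared prediction
# error shrinks. The learning rate and observation stream are made up.

def minimize_prediction_error(observations, learning_rate=0.1):
    estimate = 0.0
    for obs in observations:
        error = obs - estimate             # prediction error on this observation
        estimate += learning_rate * error  # gradient step on 0.5 * error**2
    return estimate

# The estimate converges toward the value the observations cluster around.
print(minimize_prediction_error([1.0, 1.2, 0.9, 1.1] * 50))  # roughly 1.05
```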
The difficulty is that we want to take human values and put them into an AI that doesn’t do prediction error minimization in the human sense, but instead does superhumanly competent search and planning. But if you have a specific scheme in mind that could outperform humans without leaving anything out, I’d be super interested.
As of yet, no, although this brings up an interesting point: I'm looking at this stuff to find a precise grounding because I don't think we can develop a plan that will work to our satisfaction without one. I realize lots of people disagree with me here, thinking that we need the method first and that the value grounding will be worked out instrumentally by the method. I dislike this because it leaves no way to verify the method other than by observing what an AI produced by that method does, and that's a dangerous way to verify things given the risk of a "treacherous" turn, one that isn't so much treacherous as it is exactly what we could have predicted if we had bothered to build a solid theory of what the method we were using really implied in terms of the thing we care about, and if we had bothered to know what that thing fundamentally was.
Also, I suspect we will be able to think of our desired AI in terms of control systems and set points, because I think we can do this for everything that's "alive", although it may not be the most natural abstraction to use for its architecture.
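To gesture at what I mean by "control systems and set points", here's a minimal sketch with made-up numbers (a thermostat-style negative feedback loop): the system acts in proportion to the gap between where it is and its set point, and settles near that set point despite ongoing disturbance.

```python
# Minimal sketch of a control loop with a set point (hypothetical numbers):
# a proportional controller pushes a state variable toward the set point
# while the environment drifts it away a little each step.

def run_control_loop(set_point, state, gain=0.5, drift=-0.2, steps=20):
    for _ in range(steps):
        error = set_point - state   # how far we are from the set point
        action = gain * error       # act in proportion to the error
        state += action + drift     # our action plus an environmental disturbance
    return state

# The loop settles near the set point, offset slightly by the constant drift.
print(run_control_loop(set_point=21.0, state=15.0))  # roughly 20.6
```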