Some possible paths to creating aligned AGI involve designing systems with certain cognitive properties, like corrigibility or myopia. We currently don’t know how to create sufficiently advanced minds with those particular properties. Do we know how to choose any cognitive properties at all, or do known techniques unavoidably converge on “utility maximizer that has properties implied by near-optimality plus other idiosyncratic properties we can’t choose” in the limit of capability? Is there a list of properties we do know how to manipulate?
Some example cognitive properties:
having a utility function of a certain type
being human-level or below at certain tasks even after a sharp left turn
some degree of incoherence, e.g. time-inconsistency
an architecture separated into well-defined planning and world-modeling modules
Not a very helpful answer, but: If you don’t also require computational efficiency, we can do some of those. Like, you can make AIXI variants. Is the question “Can we do this with deep learning?”, or “Can we do this with deep learning or something competitive with it?”
I think I mean “within a factor of 100 in competitiveness”; that seems like the point at which things become relevant for engineering at all, in ways other than trivial bounds.