After reading the article, I thought I understood it, but from reading the comments, this appears to be an illusion. Yet I think I should be able to understand it, since it doesn’t seem to require any special math or radically new concepts. My understanding is below; could someone check it and tell me where I’m wrong?
The proposal is to define a utility function U(), which takes as input some kind of description of the universe and returns an evaluation of that description as a number between 0 and 1.
The function U is defined in terms of two other functions, H and T, which represent a mathematical description of a specific human brain and an infinitely powerful computing environment, respectively.
Although the U-maximizing AGI will not be able to actually calculate U, it will be able to reason about it (that is, prove theorems about it), which should allow it to perform at least some actions that would therefore be provably friendly.
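To check that I'm parsing the structure correctly, here is a minimal sketch of how I picture the pieces fitting together. Everything in it is my own illustration, not the article's actual construction: the names, the signatures of H and T, and the query string are all hypothetical placeholders, and of course none of this is computable in practice.

```python
def H(query: str) -> str:
    """Stand-in for the mathematical model of a specific human brain:
    given an input, return that brain's response.
    Hypothetical placeholder, not computable in practice."""
    raise NotImplementedError("illustrative stub only")


def T(program: str, argument: str) -> str:
    """Stand-in for the idealized, unboundedly powerful computing
    environment the brain model can draw on. Hypothetical placeholder."""
    raise NotImplementedError("illustrative stub only")


def U(universe_description: str) -> float:
    """Utility function as I understand it: take some description of the
    universe and return an evaluation in [0, 1]. It is *defined* purely
    in terms of H and T, so an agent could prove theorems about U even
    though it can never actually evaluate it."""
    verdict = H("Evaluate this universe description: " + universe_description)
    # The brain model may delegate arbitrarily heavy computation to T.
    refined = T(verdict, universe_description)
    return min(1.0, max(0.0, float(refined)))  # clamp the result to [0, 1]
```

The point of writing it this way is only to check the shape of the claim: U's definition bottoms out in H and T, its output is bounded in [0, 1], and nothing in the definition requires that anyone ever be able to evaluate it.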