Designing a better version of itself will increase an AI’s reward function
An AI doesn’t have to have a reward function, or one that implies self-improvement. Reward functions often apply only at the training stage.
How would an AI be directed without using a reward function? Are there some examples I can read?
Current AIs are mostly not explicit expected-utility-maximizers. I think this is illustrated by RLHF (https://huggingface.co/blog/rlhf).
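To make the point concrete, here’s a toy sketch of the RLHF idea (this is nothing like the real pipeline, and all the names — REPLIES, train_step, generate — are made up for illustration): a reward model fitted from human preference labels shapes the policy during training, but once the policy is deployed it just generates, and no reward is computed anywhere.

```python
import random

# Toy illustration of the RLHF structure (not the real implementation):
# candidate replies the "policy" can produce.
REPLIES = ["rude answer", "helpful answer", "evasive answer"]

# Hypothetical human preference data: (preferred, rejected) pairs.
human_preferences = [("helpful answer", "rude answer"),
                     ("helpful answer", "evasive answer")]

# "Train" a toy reward model: score each reply by how often humans preferred it.
reward_model = {r: 0.0 for r in REPLIES}
for preferred, rejected in human_preferences:
    reward_model[preferred] += 1.0
    reward_model[rejected] -= 1.0

# The policy is just a probability over replies, updated with reward-model scores.
policy = {r: 1.0 / len(REPLIES) for r in REPLIES}

def train_step(lr=0.1):
    """Training stage: nudge the policy toward replies the reward model scores highly."""
    for r in REPLIES:
        policy[r] = max(policy[r] + lr * reward_model[r] * policy[r], 1e-6)
    total = sum(policy.values())
    for r in REPLIES:
        policy[r] /= total  # renormalise to a probability distribution

for _ in range(50):
    train_step()

def generate():
    """Deployment stage: sample from the frozen policy.
    No reward function or reward model is evaluated here; the reward
    only ever shaped the weights during training."""
    return random.choices(REPLIES, weights=[policy[r] for r in REPLIES])[0]

print(generate())
```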
But isn’t that also using a reward function? The AI is trying to maximise the reward it receives from the Reward Model, which was itself trained using Human Feedback.