The system might develop several search parts, some of which would be epistemic (for instance, "Where is my friend Bob? Volunteering at a camp? Eating out at a cafe? Watching a movie?"), and attempting to retarget one of these to select an option based on the alignment target instead of truth would make the AI underperform or act on an invalid world model.
Are there better ways to fix this issue than retargeting just the last search (the one nearest to the output)?
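To make the worry concrete, here is a minimal toy sketch of the distinction I have in mind, assuming one epistemic search and one action-selecting search (all names, options, and numbers below are made up for illustration, not anyone's actual architecture):

```python
def epistemic_search(hypotheses, observations, likelihood):
    """Pick the hypothesis that best fits the observations (truth-seeking)."""
    return max(hypotheses, key=lambda h: likelihood(h, observations))

def action_search(actions, belief, objective):
    """Pick the action that scores best under the given objective."""
    return max(actions, key=lambda a: objective(a, belief))

# Epistemic step: infer where Bob is from evidence.
hypotheses = ["camp", "cafe", "cinema"]
observations = {"texted from": "cafe"}
likelihood = lambda h, obs: 1.0 if obs.get("texted from") == h else 0.1
belief = epistemic_search(hypotheses, observations, likelihood)  # -> "cafe"

# Final step: choose an action given that belief.
actions = ["visit Bob", "call Bob", "do nothing"]
aligned_objective = lambda a, b: {"visit Bob": 0.2, "call Bob": 0.9, "do nothing": 0.5}[a]

# "Retargeting the last search": keep epistemic_search truth-seeking and
# swap in a new objective only for the action-selecting search.
action = action_search(actions, belief, aligned_objective)

# Retargeting an *epistemic* search instead would mean choosing `belief` by
# aligned_objective rather than by likelihood; the belief would then stop
# tracking where Bob actually is, so every downstream action would rest on
# an invalid world model.
```

The point of the sketch is only to show where the retargeting happens: swapping the objective of the final search leaves the world model intact, whereas swapping the objective of an epistemic search corrupts the beliefs that the final search depends on.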