Even if it’s a useful abstraction, it’s only an abstraction. You can’t make an AI safe by changing its UF unless its UF is a distinct component at the engineering level, not just an abstraction.
It’s not really an abstraction at all in this case: it literally has a utility function. What rates highest on its utility function is returning whatever token is ‘most likely’ given its training data.
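To make the “most likely token” framing concrete, here is a toy sketch of greedy decoding: a language model assigns logits to candidate next tokens, a softmax turns them into probabilities, and the “highest-rated” action is simply emitting the argmax token. The vocabulary and logit values below are made up for illustration; a real model has tens of thousands of tokens and produces logits from a neural network.

```python
import math

def softmax(logits):
    # Subtract the max logit for numerical stability before exponentiating
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical logits a model might assign to candidate next tokens
vocab = ["cat", "dog", "the", "ran"]
logits = [1.2, 0.3, 2.8, -0.5]

probs = softmax(logits)

# Greedy decoding: the "utility-maximizing" output is just the most
# probable token under the learned distribution
best_index = max(range(len(vocab)), key=lambda i: probs[i])
best_token = vocab[best_index]
print(best_token)  # "the", since it has the largest logit
```

Under this view, the “utility function” is nothing more exotic than the model’s learned next-token probability distribution; changing it means changing the weights that produce the logits.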
And you can’t determine whether it’s safe by examining or understanding its utility function if the abstraction is so loose as to not be alignable.