As far as I understand, MIRI did not assume that we’re just able to give the AI a utility function directly.
I’m a bit unsure about how to interpret you here.
In my original comment, I used terms such as positive/optimistic assumptions and simplifying assumptions. By those I meant simplifying assumptions made so as to abstract away some parts of the problem.
The Risks from Learned Optimization paper was written mainly by people from MIRI!
Good point (I should have written my comment in such a way that pointing this out didn’t feel necessary).
Other things like Ontological Crises and Low Impact sort of assume you can get some info into the values of an agent.
I guess this is more central to what I was trying to communicate than whether it is expressed in terms of a utility function per se.
In this tweet, Eliezer writes:
“The idea with agent foundations, which I guess hasn’t successfully been communicated to this day, was finding a coherent target to try to get into the system by any means (potentially including DL ones).”
Based on e.g. this talk from 2016, I get the sense that when he says “coherent target” he means targets that relate to the non-digital world. But perhaps that’s not the case (or perhaps it’s sort of the case, but more nuanced).
Maybe I’m making this out to have been a bigger part of their work than what actually was the case.
Yeah, I find it difficult to figure out how to look at this. A lot of MIRI discussion focused on their decision theory work, but I think that’s just not that important.
Tiling agents, for example, was more about constructing or theorizing about agents that may have access to their own values, in a highly idealized, logic-based setting.