Ok, let me finally get around to answering this.
FAI has definite subproblems. It is not a matter of scratching away at a chalkboard hoping to make some breakthrough in “philosophy” or some other proto-sensical field that will Elucidate Everything and make the problem solvable at all. FAI, right now, is a matter of setting researchers to work on one subproblem after another until they are all solved.
In fact, when I do literature searches for FAI/AGI material, I often find that the narrow AI or machine-learning literature contains a round dozen papers nobody working explicitly on FAI has ever cited, or even appears to know about. This is my view: there is low-hanging fruit in applying existing academic knowledge to FAI problems. Where such low-hanging fruit does not exist, the major open problems can largely be addressed by recourse to higher-hanging fruit within mathematics, or even to empirical science.
Since you believe it’s all so wide-open, I’d like to know what you think of as “the FAI problem”.
If you have an Oracle AI you can trust, you can use it to solve FAI problems for you. This is a fine approach. We don’t have time to be dicking around doing basic research on whiteboards.
Luckily, we don’t need to dick around.
That’s a large portion of the FAI problem right there.
EDIT: To clarify, by this I don’t mean to imply that FAI is easy, but that (trustworthy) Oracle AI is hard.
In context, what was meant by “Oracle AI” is a very general learning algorithm with some debug output, but no actual decision-theory or utility function whatsoever built in. That would be safe, since it has no capability or desire to do anything.
You have to give it a set of directed goals and a utility function which favors achieving those goals, in order for the oracle AI to be of any use.
Why? How are you structuring your Oracle AI? This sounds like philosophical speculation, not algorithmic knowledge.
Ok, but a system like you’ve described isn’t likely to think about what you want it to think about or produce output that’s actually useful to you either.
Well yes. That’s sort of the problem with building one. Utility functions are certainly useful for specifying where logical uncertainty should be reduced.
Well, ok, but if you agree with this then I don’t see how you can claim that such a system would be particularly useful for solving FAI problems.
Well, I don’t know about the precise construction that would be used. Certainly I could see a human being deliberately focusing the system on some things rather than others.
All existing learning algorithms I know of, and I dare say all that exist, have at least a utility function, and also something that could be interpreted as a decision theory. Consider for example support vector machines, which explicitly try to maximize a margin (that would be the utility function), and any algorithm for computing SVMs can be interpreted as a decision theory. Similar considerations hold for neural networks, genetic algorithms, and even the minimax algorithm.
Thus, I strongly doubt that the notion of a learning algorithm with no utility function makes any sense.
Those are optimization criteria, but they are not decision algorithms in the sense that we usually talk about them in AI. A support vector machine is just finding the extrema of a cost function via its derivative, not planning a sequence of actions.
The most popular algorithm for SVMs does plan a sequence of actions, complete with heuristics as to which action to take. True, the “actions” are internal: they are changes to some data structure within the computer’s memory, rather than changes to the external world. But that is not so different from e.g. a chess AI, which assigns some heuristic score to chess positions and attempts to maximize it using a decision algorithm (to decide which move to make), even though the chessboard is just a data structure within the computer memory.
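To make the point being argued here concrete, here is a minimal sketch (in Python, with invented toy data) of the reading defended above: SVM training as a loop of internal “actions”, each a parameter update chosen to improve an explicit objective, none of which touches anything outside the program’s memory. It uses plain subgradient descent on the hinge loss rather than the SMO algorithm alluded to, so it illustrates the general claim, not that specific algorithm.

```python
# Minimal sketch (not SMO): a linear SVM trained by subgradient descent on the
# hinge-loss objective. The "utility function" reading: each step picks an
# internal "action" (a weight update) expected to improve the objective. The
# only thing acted on is an in-memory weight vector, never the outside world.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = np.where(X[:, 0] + X[:, 1] > 0, 1.0, -1.0)      # toy, roughly separable labels

w, b, lam, lr = np.zeros(2), 0.0, 0.01, 0.1

def objective(w, b):
    margins = y * (X @ w + b)
    return np.mean(np.maximum(0.0, 1.0 - margins)) + lam * w @ w

for step in range(500):
    margins = y * (X @ w + b)
    violators = margins < 1.0                        # points inside or beyond the margin
    if violators.any():
        grad_w = -(y[violators][:, None] * X[violators]).mean(axis=0) + 2 * lam * w
        grad_b = -y[violators].mean()
    else:
        grad_w, grad_b = 2 * lam * w, 0.0
    w, b = w - lr * grad_w, b - lr * grad_b          # the internal "action": update memory

print("objective:", round(float(objective(w, b)), 4))
print("training accuracy:", np.mean(np.sign(X @ w + b) == y))
```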
“Internal” to the “agent” is very different from having an external output to a computational system outside the “agent”. “Actions” drawn from an extremely limited, non-Turing-complete “vocabulary” (really, a programming language or computational calculus; the two are identical) are also categorically different from actions drawn from a Turing-complete calculus of possible actions.
The same distinction applies to the hypothesis class the learner can learn: if it is not Turing-complete (or some approximation thereof, like a total calculus with coinductive types and corecursive programs), then it is categorically not general learning or general decision-making.
This is why we all employ primitive classifiers every day without danger, and you need something like Solomonoff’s algorithmic probability in order to build AGI.
I agree, of course, that none of the examples I gave (“primitive classifiers”) are dangerous. Indeed, the “plans” they are capable of considering are too simple to pose any threat (they are, as you say, not Turing complete).
But that doesn’t seem relevant to the argument at all. You claimed:

a very general learning algorithm with some debug output, but no actual decision-theory or utility function whatsoever built in. That would be safe, since it has no capability or desire to do anything.
You claimed that a general learning algorithm without decision-theory or utility function is possible. I pointed out that all (harmless) practical learning algorithms we know of do in fact have decision theories and utility functions. What would “a learning algorithm without decision-theory or utility function, something that has no desire to do anything” even look like? Does the concept even make sense? Eliezer writes here:

A string of zeroes down an output line to a motorized arm is just as much an output as any other output; there is no privileged null, there is no such thing as ‘no action’ among all possible outputs. To ‘do nothing’ is just another string of English words, that would be interpreted the same as any other English words, with latitude.
/facepalm
There is in fact such a thing as a null output. There is in fact such a thing as a learner with a sub-Turing hypothesis class. Such a learner with such a primitive output as “in the class” or “not in the class” does not engage in world optimization, that is: its actions do not, to its own knowledge, skew any probability distribution over future states of any portion of the world outside itself.
It does not narrow the future.
Now, what we’ve been proposing as an Oracle is even less capable. It would truly have no outputs whatsoever, only input and a debug view. It would, by definition, be incapable of narrowing the future of anything, even its own internal states.
Perhaps I have misused terminology, but that is what I was referring to: inability to narrow the outer world’s future.
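For contrast, here is a minimal sketch of the kind of “primitive classifier” with a sub-Turing hypothesis class described above: the hypotheses are a small grid of axis-aligned threshold rules (nowhere near Turing-complete), and the learner’s entire output vocabulary is one bit, “in the class” or “not in the class”. The data and rule grid are invented purely for illustration.

```python
# Sketch of a "primitive classifier": the hypothesis class is a finite set of
# axis-aligned threshold rules, and the only output ever produced is one bit.
import numpy as np

rng = np.random.default_rng(1)
X = rng.uniform(0, 1, size=(300, 3))
y = (X[:, 2] > 0.6).astype(int)                      # toy ground truth

def fit_stump(X, y):
    """Pick the (feature, threshold, polarity) rule with the fewest training errors."""
    best = None
    for f in range(X.shape[1]):
        for t in np.linspace(0, 1, 21):
            for pol in (1, -1):
                pred = (pol * (X[:, f] - t) > 0).astype(int)
                err = np.mean(pred != y)
                if best is None or err < best[0]:
                    best = (err, f, t, pol)
    return best[1:]

f, t, pol = fit_stump(X, y)

def classify(x):
    return int(pol * (x[f] - t) > 0)                 # the entire output vocabulary: {0, 1}

print("learned rule: feature", f, "threshold", round(float(t), 2), "polarity", pol)
print("classify([0.1, 0.9, 0.7]) ->", classify(np.array([0.1, 0.9, 0.7])))
```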
This thing you are proposing, an “oracle” that is incapable of modeling itself and incapable of modeling its environment (either would require Turing-complete hypotheses), what could it possibly be useful for? What could it do that today’s narrow AI can’t?
A) It wasn’t my proposal.
B) The proposed software could model the outer environment, but not act on it.
Physics is Turing-complete, so no, a learner that did not consider Turing-complete hypotheses could not model the outer environment.
You seem to have lost the thread of the conversation. The proposal was to build a learner that can model the environment using Turing-complete models, but which has no power to make decisions or take actions. This would be a Solomonoff Inducer approximation, not an AIXI approximation.
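A toy way to see the distinction being drawn here, with a deliberately tiny hypothesis class of biased-coin models standing in for Turing-complete programs (which this sketch does not attempt): the inducer only maintains a posterior and emits predictions, while the AIXI-style agent adds exactly one extra step, an argmax over actions evaluated against that same posterior.

```python
# Toy contrast between a Solomonoff-Inducer-style learner and an AIXI-style agent.
# The "programs" here are just Bernoulli(theta) models -- a stand-in for a real
# Turing-complete hypothesis class.
import numpy as np

thetas = np.linspace(0.05, 0.95, 19)             # toy hypothesis class
prior = np.full(len(thetas), 1.0 / len(thetas))  # stand-in for a 2^-length prior

def posterior(bits):
    """Pure induction: weight each hypothesis by how well it predicts the data."""
    ones = sum(bits)
    like = np.array([th ** ones * (1 - th) ** (len(bits) - ones) for th in thetas])
    w = prior * like
    return w / w.sum()

def predict_next(bits):
    """The inducer's entire job: a probability for the next bit. No actions."""
    return float(posterior(bits) @ thetas)

def aixi_style_action(bits, actions=(0, 1), reward=lambda a, b: float(a == b)):
    """The extra step that makes it an agent: argmax of expected reward under the posterior."""
    p1 = predict_next(bits)
    expected = {a: p1 * reward(a, 1) + (1 - p1) * reward(a, 0) for a in actions}
    return max(expected, key=expected.get)

history = [1, 1, 0, 1, 1, 1]
print("inducer: P(next bit = 1) =", round(predict_next(history), 3))
print("agent:   chosen action   =", aixi_style_action(history))
```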
You said:

Now, what we’ve been proposing as an Oracle is even less capable.

which led me to think you were talking about an oracle even less capable than a learner with a sub-Turing hypothesis class.
If the hypotheses it considers are Turing-complete, then, given enough information (and someone would give it enough information, otherwise they couldn’t do anything useful with it), it could model itself, its environment, the relation between its internal states and what shows up on the debug view, and the reactions of its operators to the information they learn from that debug view. Its (internal) actions very much would, to its own knowledge, skew the probability distribution over future states of the outer world.
I often find that the narrow AI or machine-learning literature contains a round dozen papers nobody working explicitly on FAI has ever cited, or even appears to know about.

Name three. FAI contains a number of counterintuitive difficulties, and it’s unlikely for someone to do FAI work successfully by accident. On the other hand, someone with a fuzzier model believing that a paper they found sure sounds relevant (why isn’t MIRI citing it?) is far more probable from my perspective and prior.
I wouldn’t say that there’s someone out there directly solving FAI problems without having explicitly intended to do so. I would say there’s a lot we can build on.
Keep in mind, I’ve seen enough of a sample of Eld Science being stupid to understand how you can have a very low prior on Eld Science figuring out anything relevant. But lacking more problem guides from you on the delta between plain AI problems and FAI problems, we go on what we can.
One paper on utility learning that relies on a supervised-learning methodology (pairwise comparison data) rather than a de facto reinforcement-learning methodology (which can and will go wrong in well-known ways when put into an AGI). One paper on progress towards induction algorithms that operate at multiple levels of abstraction, which could be useful for naturalized induction if someone put more thought and expertise into it.
That’s only two, but I’m a comparative beginner at this stuff and Eld Science isn’t very good at focusing on our problems, so I expect that there’s actually more to discover and I’m just limited by lack of time and knowledge to do the literature searches.
By the way, I’m already trying to follow the semi-official MIRI curriculum, but if you could actually write out some material on the specific deltas where FAI work departs from the preexisting knowledge-base of academic science, that would be really helpful.
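For readers unfamiliar with the first of the two papers described above, here is a rough sketch of what supervised utility learning from pairwise comparison data can look like: a Bradley-Terry-style model fit by gradient ascent, with a linear utility over invented features. This is only a generic illustration of the methodology, not a reconstruction of that particular paper.

```python
# Sketch of utility learning from pairwise comparisons (Bradley-Terry style).
# Hypothetical setup: each option is a feature vector; a label says which of two
# options a human preferred; we fit a linear utility u(x) = w.x so that
# P(a preferred to b) = sigmoid(u(a) - u(b)).
import numpy as np

rng = np.random.default_rng(2)
true_w = np.array([2.0, -1.0, 0.5])                  # hidden "true" utility weights

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Simulated comparison data: pairs (a, b) with label 1 if a was preferred.
A = rng.normal(size=(500, 3))
B = rng.normal(size=(500, 3))
labels = (sigmoid((A - B) @ true_w) > rng.uniform(size=500)).astype(float)

w, lr = np.zeros(3), 0.5
for _ in range(2000):                                # gradient ascent on the log-likelihood
    p = sigmoid((A - B) @ w)
    grad = (A - B).T @ (labels - p) / len(labels)
    w += lr * grad

print("recovered utility weights:", np.round(w, 2))
print("true utility weights:     ", true_w)
```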
Define doing FAI work successfully....
1) Designing a program capable of arbitrary self-modification, yet maintaining guarantees of “correct” behavior according to a goal set that is by necessity included in the modifications as well.
2) Designing such a high-level set of goals that ensures “friendliness”.
Designing, not evolving?
That seems a circular argument. How do you use a self-modifying evolutionary search to find a program whose properties remain stable under self-modifying evolutionary search? Unless you started with the right answer, the search AI would quickly rewrite or reinterpret its own driving goals in a non-friendly way, and who knows what you’d end up with.
I don’t see why the search algorithm would need to be self modifying.
I don’t see why you would be searching for stability as opposed to friendliness. Human testers can judge friendliness directly.
It’s how you draw your system box. Evolutionary search is equivalent to a self-modifying program, if you think of the whole search process as the program. The same issues apply.
I think the sequences do a good job at demolishing the idea that human testers can possibly judge friendliness directly, so long as the AI operates as a black box. If you have a debug view into the operation of the AI that is a different story, but then you don’t need friendliness anyway.
If I draw a box around the selection algorithm and find there is nothing self-modifying inside… where’s the circularity?
(1) is naturalized induction, logical uncertainty, and getting around the Loebian Obstacle.
(2) is the cognitive science of evaluative judgements.
Great, you’ve got names for answers you are looking for. That doesn’t mean the answers are any easier to find. You’ve attached a label to the declarative statement which specifies the requirements a solution must meet, but that doesn’t make the search for a solution suddenly have a fixed timeline. It’s uncertain research: it might take 5 years, 10 years, or 50 years, and throwing more people at the problem won’t necessarily make the project go any faster.
And how is trying to build a safe Oracle AI that can solve FAI problems for us not basic research? Or, to make a better statement: how is trying to build an Unfriendly superintelligent paperclip maximizer not basic research, at today’s research frontier?
Logical uncertainty, for example, is a plain, old-fashioned AI problem. We need it for FAI, we’re pretty sure, but it’s turning out UFAI might need it, too.
“Basic research is performed without thought of practical ends.”
“Applied research is systematic study to gain knowledge or understanding necessary to determine the means by which a recognized and specific need may be met.”
-National Science Foundation.
We need to be doing applied research, not basic research. What MIRI should do is construct a complete roadmap to FAI, or better, a study that exhaustively lists strategies for achieving a positive singularity and tactics for achieving friendly or unfriendly AGI, and concludes with a small set of most-likely scenarios. MIRI should then identify the risk factors that affect either the friendliness of the AGI in each scenario or the capability of a UFAI to do damage (in boxing setups). These risk factors should be prioritized by how much learning more about each is expected to shift the outcome in a positive direction, and these problems should be the topics of MIRI workshops.
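As a toy illustration of that kind of prioritization (ranking questions by the expected value of resolving them before committing to a plan), with all probabilities invented as placeholders:

```python
# Toy expected-value-of-information ranking for research questions, in the spirit
# of the prioritization described above. Each question gates a choice between a
# plan that depends on the answer and a fallback plan that does not.
questions = {
    # name: (P(answer is "yes"), P(win | gated plan, yes), P(win | gated plan, no), P(win | fallback))
    "does boxing hold?":           (0.4, 0.60, 0.10, 0.30),
    "is value loading tractable?": (0.5, 0.75, 0.20, 0.35),
    "is takeoff slow?":            (0.6, 0.50, 0.30, 0.40),
}

def value_of_answering(p, win_yes, win_no, win_fallback):
    act_now = max(p * win_yes + (1 - p) * win_no, win_fallback)   # best plan chosen blind
    act_informed = (p * max(win_yes, win_fallback)                # best plan in each branch
                    + (1 - p) * max(win_no, win_fallback))
    return act_informed - act_now

for name, params in sorted(questions.items(),
                           key=lambda kv: -value_of_answering(*kv[1])):
    print(f"{name:30s} value of answering first = {value_of_answering(*params):.3f}")
```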
Instead MIRI is performing basic research. It’s basic research not because it is useless, but because we are not certain at this point what relative utility it will have. And if we don’t have a grasp on expected utility, how can we prioritize? There are a hundred avenues of research that are important, to varying degrees, to the FAI project.

I worked for a number of years at NASA-Ames Research Center, and in the same building as me was the Space Biosciences Division. Great people, don’t get me wrong, and for decades they have funded really cool research on the effects of microgravity and radiation on living organisms, with the justification that such effects and countermeasures need to be known for long-duration space voyages, e.g. a two-year mission to Mars. Never mind that the microgravity issue is trivially solved with a few-thousand-dollar steel tether connecting the upper stage to the spacecraft as they spin to create artificial gravity, and the radiation exposure is mitigated by having a storm shelter in the craft and throwing a couple of Martian sandbags on the roof once you get there. It’s spending millions of dollars to develop the pressurized-ink “Space Pen” when the humble pencil would have done just fine.
Sadly I think MIRI is doing the same thing, and it is represented in one part of your post I take huge issue with:

Logical uncertainty, for example, is a plain, old-fashioned AI problem. We need it for FAI, we’re pretty sure...
If we’re only “pretty sure” it’s needed for FAI, and we can’t quantify exactly what its contribution will be or how important that contribution is relative to other possible things to be working on… then we have some meta-level planning to do first. Unfortunately I don’t see MIRI doing any planning like this (or if they are, it’s not public).
Are you on the “Open Problems in Friendly AI” Facebook group? Because much of the planning is on there.
Logical uncertainty lets us put probabilities on sentences of a logic. This, supposedly, can help get us around the Loebian Obstacle to proving self-referencing statements, and thus enable stable self-improvement in an agent. Logical uncertainty also allows techniques like Updateless Decision Theory to be turned into real algorithms, and this too is an AI problem: turning planning into inference.
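As a very rough illustration of the “probabilities on sentences” part, and only that part: a propositional toy where probabilities come from a uniform prior over the truth assignments consistent with what has already been accepted. Real logical uncertainty concerns sentences, such as arithmetic claims, whose truth values are determined but infeasible to compute; nothing here touches the Loebian issues.

```python
# Toy propositional analogue of logical uncertainty: assign probabilities to
# sentences by enumerating the truth assignments consistent with accepted facts.
from itertools import product

atoms = ["A", "B", "C"]

def worlds():
    for values in product([False, True], repeat=len(atoms)):
        yield dict(zip(atoms, values))

# Sentences are just predicates over a world; these two are "already accepted".
known = [lambda w: w["A"] or w["B"],          # A v B
         lambda w: not (w["A"] and w["C"])]   # not (A & C)

def prob(sentence):
    admissible = [w for w in worlds() if all(k(w) for k in known)]
    return sum(sentence(w) for w in admissible) / len(admissible)

print("P(A)      =", round(prob(lambda w: w["A"]), 3))
print("P(A -> C) =", round(prob(lambda w: (not w["A"]) or w["C"]), 3))
print("P(B & C)  =", round(prob(lambda w: w["B"] and w["C"]), 3))
```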
The cognitive stuff about human preferences is the Big Scary Hard Problem of FAI, but utility learning (as Stuart Armstrong has been posting about lately) is a way around that.
If you can create a stably self-improving agent that will learn its utility function from human data, equipped with a decision theory capable of handling both causative games and Timeless situations correctly… then congratulations, you’ve got a working plan for a Friendly AI and you can start considering the expected utility of actually building it (at least, to my limited knowledge).
Around here you should usually clarify whether your uncertainty is logical or indexical ;-).
Or… you could use a boxed Oracle AI to develop singularity technologies for human augmentation, or other mechanisms to keep moral humans in the loop through the whole process, and sidestep the whole issue of FAI and value loading in the first place.
Which approach do you think can be completed earlier with similar probabilities of success? What data did you use to evaluate that, and how certain are you of its accuracy and completeness?
I actually really do think that de novo AI is easier than human intelligence augmentation. We have good cognitive theories for how an agent is supposed to work (including “ideal learner” models of human cognitive algorithms). We do not have very good theories of in-vitro neuroengineering.
Yes, but those details would be handled by the post-”FOOM” boxed AI. You get to greatly discount their difficulty.
This assumes that you have usable, safe Oracle AI which then takes up your chosen line of FAI or neuroengineering problems for you. You are conditioning the hard part on solving the hard part.
You don’t need to solve philosophy to solve FAI, but philosophy is relevant to figuring out, in broad terms, the relative likelihoods of various problems and solutions.