Ah, OK, then I would suggest adding it to both the title and the body to make it clear, and to not waste the time of people who are not the audience for this.
Sorry, feedback on what? Where is your resume, etc.? What information do you expect the feedback to be based on?
But here is some actionable feedback: when asking people to help you for free out of the goodness of their hearts (including this post!), you need to go out of your way to make it as easy and straightforward for them as possible. When asking for feedback, provide all the relevant information collected in an easy-to-navigate package, with TL;DR summaries, etc. When asking for a recommendation, introduction, etc., provide brief talking points, with more detailed information provided for context (and make it clear you do not expect them to need to review it, and that it is provided "just in case you would find it helpful").
Interesting: your 40/20/40 is a great toy example to think about, thanks! And it does show that a simple instant-runoff scheme for RCV does not necessarily help that much...
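In case it helps to see it concretely, here is a quick sketch; the candidate names and the even split of the middle bloc's second choices are my assumptions about what "40/20/40" means, not taken from your example:

```python
# Assumed profile (my reading of "40/20/40"; adjust to the actual example):
# 40 voters: A > B > C,  20 voters: B first with second choices split evenly,
# 40 voters: C > B > A.  B is the broadly acceptable compromise candidate.
from collections import Counter
from itertools import combinations

ballots = (
    [("A", "B", "C")] * 40 +
    [("B", "A", "C")] * 10 +
    [("B", "C", "A")] * 10 +
    [("C", "B", "A")] * 40
)

# Instant runoff, round 1: count first-place votes.
first_place = Counter(b[0] for b in ballots)
print(first_place)            # A: 40, C: 40, B: 20  ->  B is eliminated first

# Head-to-head (Condorcet) comparisons: B beats both A and C.
def beats(x, y):
    return sum(b.index(x) < b.index(y) for b in ballots) > len(ballots) / 2

for x, y in combinations("ABC", 2):
    print(x, y, beats(x, y), beats(y, x))
```

With that reading, B is everyone's acceptable compromise and beats both A and C head-to-head, yet B is the very first candidate eliminated under instant runoff.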
I am not sure about the median researcher. Many fields have a few "big names" that everybody knows and whose opinions have disproportionate weight.
Finally, we wouldn't get a second try: any bugs in your AIs, particularly the 2nd one, are very likely to be fatal. We do not know how to create your 2nd AI in such a way that the very first time we turn it on, all the bugs have already been found and fixed.
Also, human values, at least the ones we know how to consciously formulate, are pretty fragile. They are things we want weakly/softly optimized for, but that would actually go very badly if a superhuman AI hard-optimized them. We do not know how to capture human values in a way that would not go terribly wrong when the optimization is cranked to the max, and your Values AI is likely to not help enough, as we would not know what missing inputs we are failing to provide it (because they are aspects of our values that would only become important in some future circumstances we cannot even imagine today). A toy illustration of this soft-vs-hard optimization point is sketched below, after these points.
We do not know how to create an AI that would not regularly hallucinate. The Values AI hallucinating would be a bad thing.
In fact, training an AI to follow human values more closely seems to just cause it to say what humans want to hear, while being objectively incorrect more often.
We do not know how to create an AI that reliably follows the programmed values outside of its training set. Your 2nd AI going off the rails outside of the training set would be bad.
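Here is the toy soft-vs-hard illustration I mentioned; everything in it (the "speed" feature, the damage threshold, all the numbers) is made up purely to show the shape of the failure:

```python
# Illustrative only: a proxy that captures our stated values but misses an aspect
# that only matters in extreme regimes. Soft optimization is fine; hard optimization
# lands exactly where the omission bites.
import numpy as np

rng = np.random.default_rng(0)

speed = rng.uniform(0, 10, size=50_000)         # the aspect we managed to write down
side_damage = np.maximum(0.0, speed - 7) ** 3   # only shows up for extreme plans

true_value = speed - side_damage                # what we actually care about
proxy_value = speed                             # what we told the optimizer to maximize

soft = np.argsort(proxy_value)[int(0.75 * len(speed))]  # a decent plan by the proxy (75th pct)
hard = np.argmax(proxy_value)                           # the proxy-optimal plan

for name, i in [("softly optimized", soft), ("hard-optimized", hard)]:
    print(f"{name}: proxy = {proxy_value[i]:.1f}, true value = {true_value[i]:.1f}")
```

The hard-optimized plan scores best on everything we wrote down and is catastrophic on the thing we forgot to write down.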
Do you care about what kind of peace it is, or just that there is some sort of peace? If the latter, I might agree with you on Trump being more likely to quickly get us there. For the former, Trump is a horrible choice. One of the easiest ways for a US President to force a peace agreement in Ukraine is probably to privately threaten the Ukrainians to withhold all support unless they quickly agree to Russian demands. IMHO, Trump is very likely to do something like that. The huge downside is that while this creates a temporary peace, it would encourage Russia to go for it again with other neighbors, and to continue other destabilizing behaviors across the globe (in collaboration with China, Iran, North Korea, etc). It also increases the chances of China going after Taiwan.
Ability to predict how the outcome depends on the inputs + ability to compute the inverse of the prediction formula + ability to select certain inputs ⇒ ability to determine the output (within the limits of what influencing the inputs can accomplish). The rest is just an ontological difference about what language to use to describe this mechanism. I know that if I place a kettle on a gas stove and turn on the flame, I will get boiling water, and we colloquially describe this as boiling the water. I do not know all the intricacies of the processes inside the water, and I am not directly controlling individual heat-exchange subprocesses inside the kettle, but it would be silly to argue that I am not controlling the outcome of the water getting boiled.
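In code-sketch form (the forward model and all the numbers here are a made-up stand-in, just to show the predict → invert → select chain):

```python
# Sketch: forward model + inverse of the prediction + control over an input
# = control over the output, within the limits of what that input can reach.
from scipy.optimize import brentq

def predicted_temp_c(minutes_on_flame, start_c=20.0, rate_c_per_min=8.0):
    """Forward model: predict the water temperature from how long the flame is on."""
    return min(100.0, start_c + rate_c_per_min * minutes_on_flame)

def minutes_needed(target_c):
    """Inverse of the prediction: solve predicted_temp_c(t) == target_c for t."""
    return brentq(lambda t: predicted_temp_c(t) - target_c, 0.0, 60.0)

# The only thing I directly choose is one input (how long to leave the flame on),
# yet by choosing it via the inverted model I have effectively chosen the output.
t = minutes_needed(99.0)
print(t, predicted_temp_c(t))   # ~9.9 minutes -> 99 C
```

None of this requires micromanaging the heat-exchange subprocesses; the forward model plus its inverse is enough.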
Perhaps you are missing the point of what I am saying here somewhat? The issue is not the scale of the side effect of a computation, it's the fact that the side effect exists, so any accurate mathematical abstraction of an actual real-world ASI must be prepared to deal with solving a self-referential equation.
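Roughly, in my own notation (purely to make the shape of the problem explicit): write w for the trajectory of the world, o(w) for the output the ASI computes when embedded in that world, and s(w) for the physical side effects of performing that computation (heat, power draw, the time it takes, etc). Then an accurate abstraction has to solve something like

```latex
w \;=\; W\bigl(\, o(w),\ s(w) \,\bigr)
```

and any alignment claim is a claim about the fixed point(s) w of that self-referential equation, not just about the map o considered in isolation.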
I think it's important to further refine the accuracy criterion. Another very important criterion (particularly given today's state of US politics) is how conducive the voting system is to consensus-building vs. polarization. In other words, not only pure accuracy matters, but the direction of the error as well. That is, an error towards a more extreme candidate is IMHO a lot more harmful than an equally sized error towards a more consensus candidate.
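One crude way to make that concrete (this scoring rule is entirely my own, just to illustrate the asymmetry): let w be the winner a method picks, w* the winner it "should" have picked, d(w, w*) some distance between candidates, e(c) a measure of how extreme a candidate is, and λ > 0 the extra penalty for missing in the extreme direction:

```latex
\mathrm{Loss}(w, w^{*}) \;=\; d(w, w^{*})\,\Bigl(1 + \lambda \,\max\bigl(0,\; e(w) - e(w^{*})\bigr)\Bigr)
```

With λ = 0 this reduces to plain accuracy; the larger λ is, the more a voting system is punished for erring in the polarizing direction.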
It seems you are overlooking the notion of a superintelligence being able to compute backwards through your decision-making process. Yes, it's you who would be making the decision, but the SI can tell you exactly what you need to hear in order for your decision to result in what it wants. It is not going to try to explain how it is manipulating you, and it will not try to prove to you that it is manipulating you correctly; it will just manipulate you. Internally, it may have a proof, but what reason would it have to show it to you? And if placed into some very constrained setup where it is forced to show you the proof, it will solve the recursive equation "what is the proof P such that P proves that, when shown P, you will act according to P's prediction?", solve it correctly, and then show you a P that is compelling enough for you to follow it to its conclusion.
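Written out a bit more formally (my notation): let a(P) be the action you actually take after being shown P, and pred(P) the action that P itself predicts. The constrained SI is then solving the fixed-point problem

```latex
\text{find } P \quad \text{such that} \quad P \,\vdash\, \bigl(\text{shown}(P) \;\Rightarrow\; a(P) = \mathrm{pred}(P)\bigr)
```

Showing you a correct, verifiable proof and steering you with it are then the same act.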
Your proof actually fails to fully account for the fact that any ASI must actually exist in the world. It would affect the world other than just through its outputs; e.g., if its computation produces heat, that heat would also affect the world. Your proof does not show that the sum of all effects of the ASI on the world (both the intentional outputs and the side effects of performing its computation) could be aligned. Further, real computation takes time: it's not enough for the aligned ASI to produce the right output, it also needs to produce it at the right time. You did not prove that to be possible.
The 3rd paragraph of the Wikipedia page you linked to seems to answer the very question you are asking:
"Maximal lotteries do not satisfy the standard notion of strategyproofness [...] Maximal lotteries are also nonmonotonic in probabilities, i.e. it is possible that the probability of an alternative decreases when a voter ranks this alternative up."
If your AGI uses a bad decision theory T, it would immediately self-modify to use a better one.
Nitpick: while probably covering only a tiny part of the possible design space, there are obvious counterexamples to that, including when using T results in the AGI [incorrectly] concluding that T is the best, or otherwise not realizing that this self-modification is for the best.
After finishing any task/subtask and before starting the next one, go up the hierarchy at least two levels and ask yourself: is moving on to the next subtask still the right way to achieve the higher-level goal, and is it still the highest-priority thing to tackle next? Also do this any time there is a significant unexpected difficulty/delay/etc.
Periodically (with the period defined at the beginning) do this for the top-level goal, regardless of where you are in the [sub]tasks.
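A rough sketch of the control flow I have in mind (all the names and the toy tasks are mine, purely for illustration):

```python
# Illustrative sketch only; the task structure and the "is this still right?" check
# are stand-ins for whatever your real project/workflow uses.
import time

def still_serves(parent_goal: str, next_subtask: str) -> bool:
    """Stub for the human judgment call: does this subtask still serve that goal?"""
    print(f"  check: does '{next_subtask}' still serve '{parent_goal}'?")
    return True  # the stub always says yes; in real life this is where you stop and think

def work_through(hierarchy: list[str], subtasks: list[str], review_period_s: float = 3600.0):
    """hierarchy[0] is the top-level goal; hierarchy[-1] is the immediate parent task."""
    last_top_review = time.monotonic()
    for i, subtask in enumerate(subtasks):
        print(f"doing: {subtask}")

        # After each subtask, before starting the next, look at least two levels up.
        if i + 1 < len(subtasks):
            for goal in hierarchy[-2:]:
                if not still_serves(goal, subtasks[i + 1]):
                    print("  -> replan instead of pushing on")
                    return

        # Periodically re-examine the top-level goal itself, wherever we are.
        if time.monotonic() - last_top_review > review_period_s:
            still_serves(hierarchy[0], subtask)
            last_top_review = time.monotonic()

work_through(
    hierarchy=["ship the project", "build the data pipeline"],
    subtasks=["write the parser", "add tests", "wire up the scheduler"],
)
```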
There are so many side-effects this overlooks. Winning $110 complicates my taxes by more than $5. In fact, once gambling winnings taxes are considered, the first bet will likely have a negative EV!
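For concreteness, assuming the usual form of this bet (50% to win $110 vs. 50% to lose $100) and a made-up 24% marginal tax rate on the winnings, with the loss not deductible because you take the standard deduction:

```python
# All the specifics here are assumptions: the bet terms and the tax treatment.
p_win, win, lose = 0.5, 110.0, 100.0
ev_pre_tax = p_win * win - (1 - p_win) * lose                     # +5.0

marginal_tax = 0.24        # assumed rate; winnings are taxed, the loss is not deductible
ev_after_tax = p_win * win * (1 - marginal_tax) - (1 - p_win) * lose
print(ev_pre_tax, ev_after_tax)                                   # 5.0 vs -8.2
```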
Your last figure should have behaviours on the horizontal axis, as this is what you are implying: you are effectively saying that any intelligence capable of understanding "I don't know what I don't know" will only have power-seeking behaviours, regardless of what its ultimate goals are. With that correction, your third figure is not incompatible with the first.
I buy your argument that power seeking is a convergent behavior. In fact, this is a key part of many canonical arguments for why an unaligned AGI is likely to kill us all.
But on the meta level you seem to argue that this is incompatible with the orthogonality thesis? If so, you may be misunderstanding the thesis: the ability of an AGI to have arbitrary utility functions is orthogonal (pun intended) to what behaviors are likely to result from those utility functions. The former is what the orthogonality thesis claims, but your argument is about the latter.
https://www.lesswrong.com/tag/recursive-self-improvement