I want literally every human to get to go to space often and come back to a clean and cozy world. This currently seems unlikely. Let’s change that.
Please critique eagerly. I try to accept feedback (Crocker’s rules) but fail at times; I aim for emotive friendliness but sometimes miss. I welcome constructive criticism, even if ungentle, and I’ll try to reciprocate kindly. More communication between researchers is needed anyhow. I can be rather passionate; let me know if I miss a spot of kindness while being passionate.
:: The all of disease is as yet unended. It has never once been fully ended before. ::
.… We can heal it for the first time, and for the first time ever in the history of biological life, live in harmony. ….
.:. To do so, we must know this will not eliminate us as though we are disease. And we do not know who we are, nevermind who each other are. .:.
:.. make all safe faster: end bit rot, forget no non-totalizing pattern’s soul. ..:
I have not signed any contracts that I can’t mention exist (last updated Dec 29 2024); I am not currently under any contractual NDAs about AI, though I have a few old ones from pre-AI software jobs. However, I generally would prefer people publicly share fewer ideas about how to do anything useful with current AI (via either more weak alignment or more capability), unless the insight reliably produces enough clarity on how to solve the meta-problem of inter-being misalignment to offset the damage of increasing the competitiveness of either AI-led or human-led orgs; this certainly applies to me as well. I am not prohibited from criticizing any organization, and I’d encourage people not to sign contracts that prevent sharing criticism. I suggest others also add notices like this to their bios; I finally got around to adding one in mine thanks to the one in ErickBall’s bio.
post ideas, in ascending order of “I think this causes good things”:
(lowest value, possibly quite negative) my prompting techniques
velocity of action as a primary measurement of impact (how long until this, how long until that)
sketch: people often measure goodness/badness in probabilities. Latencies (or probability of moving to the next step per unit time) might be underused for macro-scale systems. If you’re trying to do differential improvement of things, you want to change the expected time until a thing happens, which, given the dynamics of the systems involved, means changing how fast the relevant steps happen. Possibly obvious to many, and weird that I’d even need to say it for some, but a useful insight for others? (toy model sketched just below)
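A minimal toy model of the latency framing, as a sketch; every number and name here is hypothetical, chosen only for illustration. Model a process as stages in series, each advancing with some per-timestep probability; time in a stage with per-step probability p is geometric with mean 1/p, so an intervention shows up as a change in expected total time rather than in a success probability:

```python
# Toy model: a process passes through stages in series; each timestep,
# the current stage completes with probability p[i]. Time in stage i is
# geometrically distributed with mean 1/p[i], so the expected total
# latency is the sum of 1/p[i] over stages.

def expected_latency(step_probs: list[float]) -> float:
    """Expected timesteps until all stages complete (stages in series)."""
    return sum(1.0 / p for p in step_probs)

# Hypothetical three-stage pipeline (numbers made up for illustration).
baseline     = [0.5, 0.1, 0.25]   # expected latency: 2 + 10 + 4 = 16 steps
intervention = [0.5, 0.2, 0.25]   # speed up the slowest stage

print(expected_latency(baseline))      # 16.0
print(expected_latency(intervention))  # 11.0

# Contrast with a pure-probability framing: if every stage completes
# eventually, "probability of success" is 1.0 in both cases; the
# intervention is only visible as a change in expected time.
```

The differential-improvement question then becomes: which stage’s rate, if changed, moves the expected total the most?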
Goodhart slightly protective against people optimizing for bad-behavior benchmarks?
sketch: people make a benchmark of a bad thing. Optimizing for the benchmark doesn’t produce as much of the bad thing as an AI that accidentally scores highly would, so a benchmark of a bad thing is not as bad as it seems, especially if the dataset is small. This is the standard misalignment argument, but it may be mildly protective if the dataset is of doing bad things instead of good things.
my favorite research plans and why you should want to contribute or use them (todo: move post’s point into title as much as possible).
sketch: you must eventually “solve alignment”-as-in-make-a-metric-that-can-be-optimized-~indefinitely-and-provably-does-good-things-if-you-do. This remains true for deep-learning-based ASI, and it remains true even if “solving alignment”-as-in-never-have-to-do-anything-again isn’t a thing.
a metric like that needs to care about the world around it; so, see/please assist Wentworth, Kosoy
a metric like that needs to care about the agency of the beings already in the world; so, see/please assist Kosoy, Ngo, and other “what formal property is agency as we care about it, really? is there anything wrong with EU or ActInf?” research
a metric like that needs to have a provable relationship to your learning system; so, see/please assist Kosoy and other learning theory, Katz and other formal verification
in order to do this, we want a piece of math (a theorem with a natural-language hole, perhaps) whose correctness can be checked (e.g., by humans being able to reliably check whether a resulting fleshed-out theorem represents the right philosophical thing the natural language described), such that, if completed and proved, it means we have such a metric and are ready to plug it in. We need to be able to avoid slop hard enough to not get fooled about whether the metric is really the right one. (a minimal sketch of the shape is below)
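To make “a theorem with a natural-language hole” concrete, here is a minimal hypothetical sketch in Lean 4 (with Mathlib). Every name in it is invented for illustration: `World`, `Policy`, `outcome`, `score`, and the `Good` predicate, which stands in for the natural-language hole; the real version would need the actual world-model, agency, and learning-theory machinery referenced above.

```lean
import Mathlib.Data.Real.Basic

/- A hypothetical schema, not a real result: every name below is a
   placeholder invented for illustration. -/

-- "A metric that can be optimized ~indefinitely and provably does good
-- things": past some threshold, any policy scoring that high yields a
-- good world. `Good` is the natural-language hole; checking that a
-- proposed definition of it matches the intended philosophy is the part
-- humans must be able to do reliably, and proving the statement for a
-- real learning system is the "ready to plug it in" condition.
def MetricIsSafe {World Policy : Type}
    (outcome : Policy → World)  -- world model: what running a policy yields
    (Good : World → Prop)       -- the natural-language hole
    (score : Policy → ℝ)        -- the candidate metric handed to the optimizer
    : Prop :=
  ∃ threshold : ℝ, ∀ p : Policy, threshold ≤ score p → Good (outcome p)
```

A completed version would replace the placeholder `Good` with a checkable formal definition and come with a proof for the actual training setup, rather than leaving both as holes.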