I want literally every human to get to go to space often and come back to a clean and cozy world. This currently seems unlikely. Let’s change that.
Please critique eagerly. I try to accept feedback (Crocker’s rules) but fail at times; I aim for emotive friendliness but sometimes miss. I welcome constructive criticism, even if ungentle, and I’ll try to reciprocate kindly. More communication between researchers is needed anyhow. I can be rather passionate; let me know if I miss a spot of kindness while being passionate.
:: The all of disease is as yet unended. It has never once been fully ended before. ::
.… We can heal it for the first time, and for the first time ever in the history of biological life, live in harmony. ….
.:. To do so, we must know this will not eliminate us as though we are disease. And we do not know who we are, nevermind who each other are. .:.
:.. make all safe faster: end bit rot, forget no non-totalizing pattern’s soul. ..:
I have not signed any contracts that I can’t mention exist (last updated Dec 29 2024); I am not currently under any contractual NDAs about AI, though I have a few old ones from pre-AI software jobs. However, I generally would prefer people publicly share fewer ideas about how to do anything useful with current AI (via either more weak alignment or more capability), unless the insight reliably produces enough clarity on how to solve the meta-problem of inter-being misalignment to offset the damage of increasing the competitiveness of either AI-led or human-led orgs; this certainly applies to me as well. I am not prohibited from criticizing any organization, and I’d encourage people not to sign contracts that prevent sharing criticism. I suggest others also add notices like this to their bios; I finally got around to adding one in mine thanks to the one in ErickBall’s bio.
post ideas, in ascending order of “I think this causes good things”:
(lowest value, possibly quite negative) my prompting techniques
velocity of action as a primary measurement of impact (how long until this, how long until that)
sketch: people often measure goodness/badness in probabilities. Latencies (or probability of moving to the next step per unit time) might be underused for macro-scale systems. If you’re trying to do differential improvement of things, you want to change the expected time until a thing happens, which, given the dynamics of the systems involved, means changing how fast the relevant steps happen. Possibly obvious to many, and weird that I’d even need to say it for some, but a useful insight for others? (toy model sketched just below)
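A minimal toy model of the latency framing, as a sketch; every number and name here is hypothetical, chosen only for illustration. Model a process as stages in series, each advancing with some per-timestep probability; time in a stage with per-step probability p is geometric with mean 1/p, so an intervention shows up as a change in expected total time rather than in a success probability:

```python
# Toy model: a process passes through stages in series; each timestep,
# the current stage completes with probability p[i]. Time in stage i is
# geometrically distributed with mean 1/p[i], so the expected total
# latency is the sum of 1/p[i] over stages.

def expected_latency(step_probs: list[float]) -> float:
    """Expected timesteps until all stages complete (stages in series)."""
    return sum(1.0 / p for p in step_probs)

# Hypothetical three-stage pipeline (numbers made up for illustration).
baseline     = [0.5, 0.1, 0.25]   # expected latency: 2 + 10 + 4 = 16 steps
intervention = [0.5, 0.2, 0.25]   # speed up the slowest stage

print(expected_latency(baseline))      # 16.0
print(expected_latency(intervention))  # 11.0

# Contrast with a pure-probability framing: if every stage completes
# eventually, "probability of success" is 1.0 in both cases; the
# intervention is only visible as a change in expected time.
```

The differential-improvement question then becomes: which stage’s rate, if changed, moves the expected total the most?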
Goodhart slightly protective against people optimizing for bad-behavior benchmarks?
sketch: people make a benchmark of a bad thing. Optimizing for the benchmark doesn’t produce as much of the bad thing as an AI that accidentally scores highly would, so a benchmark of a bad thing is not as bad as it seems, especially if the dataset is small. This is the standard misalignment argument, but it may be mildly protective if the dataset is of doing bad things instead of good things.
my favorite research plans and why you should want to contribute or use them (todo: move post’s point into title as much as possible).
sketch: you must eventually “solve alignment”-as-in-make-a-metric-that-can-be-optimized-~indefinitely-and-provably-does-good-things-if-you-do. This remains true for deep-learning-based ASI, and it remains true even if “solving alignment”-as-in-never-have-to-do-anything-again isn’t a thing.
a metric like that needs to care about the world around it; so, see/please assist Wentworth, Kosoy
a metric like that needs to care about the agency of the beings already in the world; so, see/please assist Kosoy, Ngo, and other “what formal property is agency as we care about it, really? is there anything wrong with EU or ActInf?” research
a metric like that needs to have a provable relationship to your learning system; so, see/please assist Kosoy and other learning theory, Katz and other formal verification
in order to do this, we want a piece of math (a theorem with a natural-language hole, perhaps) whose correctness can be checked (e.g., by humans being able to reliably check whether a resulting fleshed-out theorem represents the right philosophical thing the natural language described), such that, if completed and proved, it means we have such a metric and are ready to plug it in. We need to be able to avoid slop hard enough to not get fooled about whether the metric is really the right one. (a minimal sketch of the shape is below)
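To make “a theorem with a natural-language hole” concrete, here is a minimal hypothetical sketch in Lean 4 (with Mathlib). Every name in it is invented for illustration: `World`, `Policy`, `outcome`, `score`, and the `Good` predicate, which stands in for the natural-language hole; the real version would need the actual world-model, agency, and learning-theory machinery referenced above.

```lean
import Mathlib.Data.Real.Basic

/- A hypothetical schema, not a real result: every name below is a
   placeholder invented for illustration. -/

-- "A metric that can be optimized ~indefinitely and provably does good
-- things": past some threshold, any policy scoring that high yields a
-- good world. `Good` is the natural-language hole; checking that a
-- proposed definition of it matches the intended philosophy is the part
-- humans must be able to do reliably, and proving the statement for a
-- real learning system is the "ready to plug it in" condition.
def MetricIsSafe {World Policy : Type}
    (outcome : Policy → World)  -- world model: what running a policy yields
    (Good : World → Prop)       -- the natural-language hole
    (score : Policy → ℝ)        -- the candidate metric handed to the optimizer
    : Prop :=
  ∃ threshold : ℝ, ∀ p : Policy, threshold ≤ score p → Good (outcome p)
```

A completed version would replace the placeholder `Good` with a checkable formal definition and come with a proof for the actual training setup, rather than leaving both as holes.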