Agent foundations, AI macrostrategy, human enhancement.
I endorse and operate by Crocker’s rules.
I have not signed any agreements whose existence I cannot mention.
I know that you didn’t mean it as a serious comment, but I’m nevertheless curious about what you meant by “the universe is a teleology”.
I would appreciate it if you put probabilities on at least some of these propositions.
At this point something even stranger happens.
What this “something even stranger” is seems rather critical.
I think that if you want to compute logical correlations between programs, you need to look at their, well, logic. E.g., if you have some way of extracting a natural abstraction-based representation of their logic, build something like a causal graph from it, and then design a similarity metric for comparing these representations.
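To make that a bit more concrete, here's a minimal sketch of what the comparison step could look like, assuming (hypothetically) that the abstraction-extraction step already exists. `extract_logic_graph` is a made-up stand-in for that step, and the similarity metric is just graph edit distance via networkx; this is an illustration, not a worked-out proposal.

```python
# Minimal sketch: compare two programs' "logic" via graph similarity.
# Assumes a hypothetical extraction step (extract_logic_graph) that turns a
# program into a causal-graph-like representation; here it is only stubbed out.
import networkx as nx


def extract_logic_graph(program_source: str) -> nx.DiGraph:
    """Hypothetical placeholder for a natural-abstraction-based extraction step.
    A real version would return a graph over the program's abstract variables;
    this stub just returns a fixed toy graph."""
    g = nx.DiGraph()
    g.add_edges_from([("observation", "decision"), ("decision", "action")])
    return g


def logical_similarity(p1: str, p2: str) -> float:
    """Crude similarity metric: graph edit distance between the two logic
    graphs, rescaled so that 1.0 means structurally identical."""
    g1, g2 = extract_logic_graph(p1), extract_logic_graph(p2)
    dist = nx.graph_edit_distance(g1, g2)  # exact GED; expensive on large graphs
    size = max(g1.number_of_nodes() + g1.number_of_edges(),
               g2.number_of_nodes() + g2.number_of_edges())
    return 1.0 - dist / size if size else 1.0
```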
I have a suspicion, though, that this is not the right approach for handling ECL because ECL (I think?) involves the agent(s) looking at (some abstraction over) their “source code(s)” and then making a decision based on that. I expect that this ~reflection needs to be modeled explicitly.
My forefrontest thought as I was finishing this essay was "Applying this concept to AI risk is left as an exercise for the reader."
Then I thought that AI risk, if anything, is characterized by kinda the opposite dynamic: lots of groups with different risk models, not infrequently explicitly criticizing each other's strategies/approaches as net-negative or implicitly complicit with the baddies, finding it hard to cooperate despite what locally seem like convergent subgoals. (To be clear: I'm not claiming that everybody's take/perspective on this is valid, or that everybody in this field should cooperate with everybody else, or whatever.)
Then I thought that, actually, even within what seems like somewhat coherent factions, we would probably see some tails coming apart once their goal (AI moratorium, PoC aligned AGI, existential security, exiting the acute risk period) is achieved.
And then I thought, well...
GDM, OpenAI, Anthropic, …
Epoch and Mechanize
… there are probably more examples in the past
And then there were conversations where people I viewed as ~allied turned out to bite bullets that I considered (and still consider) equivalent to moral atrocities.
I may want to think more about this, but ATM it seems to me like AI risk as a field (or loose cluster of groups) is failing both at cooperating to achieve locally cooperation-worthy convergent subgoals and at seeing past the moral homophones.
(When I say "failing", I'm inclined to ask myself what standard I should apply, but reality doesn't grade on a curve and the stakes are huge.)
---
Anyway, thanks for the post and the concept!
Which doesn’t make the OP wrong.
I feel like we perhaps need to reach some “escape velocity” to get something like that going, but for ~rationality / deliberately figuring out how to think and act better.
Proposition 8: If and is continuous, then
I’m pretty sure this should be because otherwise the types don’t match.
Reading this made me think that the framing “Everything is alignment-constrained, nothing is capabilities-constrained.” is a rathering and that a more natural/joint-carving framing is:
To the extent that you can get capabilities by your own means (rather than hoping for reality to give you access to a new pool of some resource or whatever), you get them by getting various things to align so that they produce those capabilities.
Generating new frontier knowledge: as in, given a LW post, generating interesting comments that add to the conversation, or, given some notes on a research topic, generating experiment ideas, etc.
Have you tested it on sites/forums other than LW?
Pearson Correlation ⇒ Actual info ⇒ Shannon mutual info.
Shouldn’t it be the other way around?
Pearson Correlation ≤ Actual info ≤ Shannon mutual info.
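For what it's worth, here is a toy numeric illustration of why that ordering is the intuitive one (with the caveat that Pearson correlation and mutual information aren't literally in the same units, so the chain is heuristic): a pair of variables can have ~zero Pearson correlation while having clearly positive mutual information. `mutual_info_histogram` is my own crude plug-in estimator, not a standard API.

```python
# Toy check: zero linear correlation does not imply zero mutual information.
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, 100_000)
y = x ** 2  # fully determined by x, but the dependence is symmetric, not linear


def mutual_info_histogram(a, b, bins=30):
    """Crude plug-in MI estimate (in nats) from a 2D histogram."""
    joint, _, _ = np.histogram2d(a, b, bins=bins)
    pxy = joint / joint.sum()
    px = pxy.sum(axis=1, keepdims=True)   # marginal of a
    py = pxy.sum(axis=0, keepdims=True)   # marginal of b
    nz = pxy > 0
    return float(np.sum(pxy[nz] * np.log(pxy[nz] / (px @ py)[nz])))


print("Pearson r:  ", np.corrcoef(x, y)[0, 1])      # ~ 0
print("MI estimate:", mutual_info_histogram(x, y))  # clearly > 0
```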
Can you give some reasons why you think all of that, or at least some of it?
The short answer to “How is it different from corrigibility?” is something like: here we’re thinking about systems that are not sufficiently powerful for us to need them to be fully corrigible.
This sounds to me like you're imagining that nobody building more powerful AIs is an option once we've already gotten a lot of value from them (where I don't really know what level of capability you imagine concretely)? If the world were so reasonable, we wouldn't rush ahead with our abysmal understanding of AI anyway, because obviously the risks outweigh the benefits? Also, you don't just need to convince the leading labs, because progress will continue and soon enough many, many actors will be able to create unaligned powerful AI, and someone will.
The (revealed) perception of risks and benefits depends on many things, including what kind of AI is available/widespread/adopted. Perhaps we can tweak those parameters. (Not claiming that it’s going to be easy.)
I think the right framing of the bounded/corrigible agent agenda is aiming toward a pivotal act.
Something in this direction, yes.
Perhaps you misread the OP as saying “small molecules” rather than “small set of molecules”.
“Maker Power”?
Do you think that it would be worth it to try to partially sort this out in a LW dialogue?
I was very skeptical from the beginning, for reasons largely similar to the ones I expressed in my posts. But at first I told myself that I should stay a little longer.
IME, in the majority of cases, when I strongly felt like quitting but was also inclined to justify “staying just a little bit longer because XYZ”, and listened to my justifications, staying turned out to be the wrong decision.
I think: To the extent that one even ascribes “is” claims / beliefs to techne and gnosis, those “is”es are grounded in “ought”s (though it’s possible I’m now applying this way of thinking more broadly than the post).
There I think we can make a distinction between pure "is" that exists prior to conceptualization and "is from ought" arising only after such experiences are reified.
I don’t think there’s any pure “is” that exists prior to conceptualization.
What are the X’s over which you quantified in the rightmost biconditional? Bots in chains that started with Bot?
How come?