I didn’t read most of the post, but it seems like you left out a little-known but potentially important way to know whether research is good, which is something we could call “having reasons for thinking that your research will help with AGI alignment, then arguing about those reasons and seeing which reasons make sense”.
Not left out! That’s what “we agree”/”eigen-evaluation” consists of.
My point is that’s crucially different from when we’re able to directly observe the research being useful for our goal.
It’s almost orthogonal to eigen-evaluation. You can arrive at consensus in lots of ways.
Can you give an example you think is a different way? My guess is I will consider it to fall under eigen-evaluation too.
A different way of arriving at consensus? I’m kind of annoyed that there’s apparently a practice of not proactively thinking of examples, but ok:
- If ~everyone is deferring, then they’ll converge on some combination of whoever isn’t deferring and whatever belief-like objects emerge from the depths in that context.
- If ~everyone just wishes to be paid and the payers pay for X, then ~everyone will apparently believe X.
- If someone is going around threatening people to believe X, then people will believe X.
Deferring is straightforwardly a thing you can do with your “vote” in eigen-evaluation. As I wrote: “While in the absence of direct feedback this system makes sense, I think it works better when everyone’s contributing their own judgments, and starts to degrade when it becomes overwhelmingly about popularity and who defers to whom.”

Perhaps the word “evaluation” in there is what’s misleading.
Being paid or threatened feels so degenerate (and seems likely to result in mere professions of belief) that I hadn’t really considered it. Still, suppose different people are paying or voting in different directions; I think how those net out into what’s regarded as “good” will be via an eigen-evaluation process.
On second thought, I do think payment/coercion might be what “people believe X because it’s advantageous” amounts to. For example, people end up favoring views/research X because that gets them more access to resources (researchers in X are better funded, etc.).
Meta: I think it’s good to proactively think of examples if you can, and good to provide them too.
My position is approximately: “whenever there’s group aggregate belief, it arises from an eigen- process”. (True even when you’ve got direct-evaluation, though so quantitatively different as to be qualitatively different.)
Predicting that whatever you say will also be eigen-evaluation according to me admittedly makes it hard to figure out what you think isn’t.
ETA: This perhaps inspires me to write a post arguing for this larger point. Like it’s the same mechanism with “status”, fashion, and humor too.
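The thread never formalizes “eigen-evaluation”, but if it works like eigenvector centrality (my guess at a formalization, not anything the commenters specify), the deferring case above can be sketched as power iteration over a deference matrix:

```python
# Hypothetical sketch (my formalization, not the thread's): treating
# "eigen-evaluation" as eigenvector centrality over a deference matrix.
# Agent 0 defers to no one; agents 1-3 spread their trust around.
# Power iteration shows whose judgment ends up dominating the aggregate.

trust = [
    [1.0, 0.0, 0.0, 0.0],  # agent 0: non-deferrer, weights only itself
    [0.5, 0.1, 0.2, 0.2],  # agents 1-3 mostly defer to others
    [0.6, 0.2, 0.1, 0.1],
    [0.4, 0.3, 0.2, 0.1],
]

influence = [0.25] * 4  # start everyone with equal influence
for _ in range(200):
    # one step of power iteration: influence flows along trust edges
    influence = [sum(influence[i] * trust[i][j] for i in range(4))
                 for j in range(4)]
    total = sum(influence)
    influence = [w / total for w in influence]

print([round(w, 3) for w in influence])
# influence concentrates on the non-deferring agent, matching the claim that
# they'll "converge on ... whoever isn't deferring"
```

The numbers in the trust matrix are invented for illustration; the qualitative result (all influence flows to whoever doesn’t defer) holds for any row-stochastic matrix where one agent is absorbing.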
If everyone calculates 67*23 in their head, they’ll reach a partial consensus. People who disagree with the consensus can ask for an argument, and they’ll get a convincing argument which will convince them of the correct answer; and if the argument is unconvincing, and they present a convincing argument for a different answer, that answer will become the consensus. We thus arrive at consensus with no eigening. If this isn’t how things play out, it’s because there’s something wrong with the consensus / with the people’s epistemics.
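As a toy illustration of this no-eigening story (my own sketch, not from the thread): each agent computes independently, and dissenters are convinced by re-deriving the answer from the argument rather than by anyone’s reputation.

```python
# Toy sketch (hypothetical, my own): consensus on 67*23 with no eigening.
# Agents compute independently; whoever disagrees with the majority checks
# the *argument* (a step-by-step derivation), not anyone's reputation.

def convincing_argument():
    # The argument that settles disagreement: 67*23 = 67*20 + 67*3
    return 67 * 20 + 67 * 3

answers = [1541] * 16 + [1531, 1551, 1641, 1241]  # a few slips in mental math
majority = max(set(answers), key=answers.count)

# Dissenters re-derive instead of deferring; agreers had it already.
consensus = [a if a == majority else convincing_argument() for a in answers]

assert all(c == 67 * 23 for c in consensus)  # everyone lands on 1541
```

No agent’s answer is weighted by who they trust; the convincing argument does all the work, which is what distinguishes this case from the deference examples above.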
Hmm, okay, I think I’ve made an update (not necessarily to agree with you entirely, but still an update on my picture, so thanks).
I was thinking that if a group of people all agree on particular axioms or rules of inference, then that is where the eigening will be occurring, even if, given sufficiently straightforward axioms, the group members achieve consensus without further eigening. But possibly you can get consensus on the axioms just via selection, and via individuals using their inside view to adopt them or not. That’s still a degree of “we agreed”, but not eigening.
Huh. Yeah, that’s an interesting case which plausibly doesn’t require any eigening. I think the plausibility comes from it being a case someone can handle entirely from their personal inside view (both the immediate calculation and their belief in how the underlying mathematical operations ought to work).
I don’t think it scales to anything interesting (definitely not alignment research), but it is conceptually interesting for how I’ve been thinking about this.
No, it’s the central example for what would work in alignment. You have to think about the actual problem. The difficulty of the problem and illegibility of intermediate results means eigening becomes dominant, but that’s a failure mode.
Interesting to consider it a failure mode. Maybe it is, or at least somewhat is.
I’ve got another post on eigening in the works; I think it might provide clearer terminology for talking about this, if you have time to read it.
I agree that eigening isn’t the key concept for alignment or other scientific processes. Sure, you could describe any consensus that way, but the result could be either very good or just awful depending on how much valid analysis went into each step of that eigening. In a really good situation, progress toward consensus is only superficially describable as eigening; the real progress happens through careful thinking and communicating, and the eigening happens not by reputation but by quality of work. In a bad field, eigening is doing most of the work.
Referring to them both as eigening seems to obscure the difference between good and bad science/theory creation.
But yeah, if you mean “I don’t think it scales to successfully staking out territory around a grift”, that seems right.
I’d guess “make sense” means something higher-bar, harder-to-achieve to you than it does to most people. Are there other ways to say “make sense” that pin down the necessary level of making sense you mean to refer to?
This is a reasonable question, but seems hard to answer satisfyingly. Maybe something with a similar spirit to “stands up to multiple rounds of cross-examination and hidden-assumption-explicitization”.
Personally, I’d say that at least part of what I’d categorize under ‘make sense’ is objective: namely, that your scientific proposal contains a mechanism of action and consequence which is logically coherent, i.e., that the abstract symbolic logic of the proposal holds up under evaluation. Meeting that should be a sort of ‘minimum bar’ for considering a scientific proposal worth discussing.
That said, there will always be complications in the world which can’t be simplified down to logical assertions, so this is only the start of the journey. Still, you should feel free to reject proposals whose arguments contradict themselves from the outset.