You can publish it, including the output of a standard hash function applied to the secret password. “Any real note will contain a preimage of this hash.”
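A minimal sketch of that commitment scheme in Python (the password string and the `verify_note` helper are placeholders I'm making up for illustration; any standard cryptographic hash such as SHA-256 works):

```python
import hashlib

# Minimal sketch of the scheme described above (all strings are placeholders).
# Step 1: keep a secret password and publish only its hash.
secret_password = "correct horse battery staple"  # hypothetical secret, never published
commitment = hashlib.sha256(secret_password.encode()).hexdigest()
print("Published commitment:", commitment)

# Step 2: any real note later includes the preimage (the password itself);
# anyone can check it against the published hash.
def verify_note(claimed_preimage: str, published_hash: str) -> bool:
    return hashlib.sha256(claimed_preimage.encode()).hexdigest() == published_hash

assert verify_note(secret_password, commitment)             # a real note passes
assert not verify_note("some forged password", commitment)  # a forgery fails
```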
Let me re-ask a subset of the question that doesn’t use the word “lie”. When he convinced you to not mention Olivia, if you had known that he had also been trying to keep information about Olivia’s involvement in related events siloed away (from whoever), would that have raised a red flag for you like “hey, maybe something group-epistemically anti-truth-seeking is happening here”? Such that e.g. that might have tilted you to make a different decision. I ask because it seems like relevant debugging info.
I admit this was a biased omission, though I don’t think it was a lie
Would you acknowledge that if JDP did this a couple times, then this is a lie-by-proxy, i.e. JDP lied through you?
That’s a big question, like asking a doctor “how do you make people healthy”, except I’m not a doctor and there’s basically no medical science, metaphorically. My literal answer is “make smarter babies” https://www.lesswrong.com/posts/jTiSWHKAtnyA723LE/overview-of-strong-human-intelligence-amplification-methods , but I assume you mean augmenting adults using computer software. For the latter: the only thing I think I know is that you’d have to do all of the following steps, in order:
0. Become really good at watching your own thinking processes, including/especially the murky / inexplicit / difficult / pretheoretic / learning-based parts.
1. Become really really good at thinking. Like, publish technical research that many people acknowledge is high quality, or something like that (maybe without the acknowledgement, but good luck self-grading). Apply 0.
2. Figure out what key processes from 1. could have been accelerated with software.
Yes, but this also happens within one person over time, and the habit (of either investing, or not, in long-term costly high-quality efforts) can gain steam in the one person.
If you keep updating such that you always “think AGI is <10 years away” then you will never work on things that take longer than 15 years to help. This is absolutely a mistake, and it should at least be corrected after the first round of “let’s not work on things that take too long because AGI is coming in the next 10 years”. I will definitely be collecting my Bayes points https://www.lesswrong.com/posts/sTDfraZab47KiRMmT/views-on-when-agi-comes-and-on-strategy-to-reduce
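A toy calculation of that point, with made-up numbers (the function and project lengths below are hypothetical, just to make the arithmetic explicit): a forecaster who re-updates every year to “AGI is <10 years away” and only starts projects that would finish before their current AGI estimate never starts the long ones.

```python
# Sketch: rolling "AGI within `forecast_window` years" belief vs. project start decisions.
def projects_ever_started(horizon_years, project_lengths, forecast_window=10):
    started = set()
    for year in range(horizon_years):
        predicted_agi = year + forecast_window  # belief always: "within 10 years from now"
        for length in project_lengths:
            if year + length <= predicted_agi:  # only start projects that finish pre-AGI
                started.add(length)
    return started

print(projects_ever_started(50, [3, 8, 15]))  # {3, 8}: the 15-year project is never started
```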
I have been very critical of cover-ups on LessWrong. I’m not going to name names, and maybe you don’t trust me. But I have observed this all directly.
Can you give whatever more information you can, e.g. to help people know whether you’re referring to the same or different events that they already know about? E.g., are you talking about things that have already been mentioned on the public internet? What time period(s) did the events you’re talking about happen in?
In theory, possibly, but it’s not clear how to save the world given such restricted access. See e.g. https://www.lesswrong.com/posts/NojipcrFFMzNx6Grc/sudo-s-shortform?commentId=onKfTrunn2Q2Gc4Pw
In practice no, because you can’t deal with a superintelligence safely. E.g.
You can’t build a computer system that’s robust to auto-exfiltration. I mean, maybe you can, but you’re taking on a whole bunch more cost, and also hoping you didn’t screw up.
You can’t develop this tech without other people stealing it and running it unsafely.
You can’t develop this tech safely at all, because in order to develop it you have to do a lot more than just get a few outputs, you have to, like, debug your code and stuff.
And so forth. Mainly and so forth.
Less concerned about PR risks than most funders
Just so it’s said somewhere, LTFF is probably still too concerned with PR. (I don’t necessarily mean that people working at LTFF are doing something wrong / making a mistake. I don’t have enough information to make a guess like that. E.g., they may be constrained by other people, etc. Also, I don’t claim there’s another major grantmaker that’s less constrained like this.) What I mean is, there are probably projects that are feasibly-knowably good but that LTFF can’t/won’t fund because of PR. So for funders with especially high tolerance for PR risk and/or the ability/interest to investigate PR risks that seem bad from far away, I would recommend against LTFF, in favor of making more specific use of that special status, unless you truly don’t have the bandwidth to do so, even by delegating.
(Off topic, but I like your critique here and want to point you at https://www.lesswrong.com/posts/7RFC74otGcZifXpec/the-possible-shared-craft-of-deliberate-lexicogenesis just in case you’re interested.)
I totally agree, you should apply to PhD programs. (In stem cell biology.)
The former doesn’t necessarily imply the latter in general, because even if we are systematically underestimating the realistic upper bound for our skill level in these areas, we would still have to deal with diminishing marginal returns to investing in any particular one.
On the other hand, even if what you say is true, skill headroom may still imply that it’s worth building shared arts around such skills. Shareability and build-on-ability change the marginal returns a lot.
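A toy formalization of that, assuming concave individual returns (my illustration, with assumed functional forms): suppose an individual’s return to effort $x$ on a skill is $u(x) = \log(1+x)$, so the marginal return $u'(x) = 1/(1+x)$ diminishes. If the skill is packaged as a shared art that roughly $n$ people can pick up and build on, the aggregate marginal return is closer to $n/(1+x)$, and investment stays worthwhile against a per-unit cost $c$ up to

$$\frac{n}{1+x} \ge c \quad\Longleftrightarrow\quad x \le \frac{n}{c} - 1,$$

so the point of diminishing returns moves out roughly linearly in how shareable the art is.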
Philology is philosophy, because it lets you escape the trap of the language you were born with. Much like mathematics, humanity’s most ambitious such escape attempt, still very much in its infancy.
True...
If you really want to express the truth about what you feel and see, you need to be inventing new languages. And if you want to preserve a culture, you must not lose its language.
I think this is a mistake, made by many. It’s a retreat and an abdication. We are in our native language, so we should work from there.
My conjecture (though beware mind fallacy) is that it’s because you emphasize “naive deference” to others, which looks obviously wrong to me and obviously not what most people I know who suffer from this tend to do (but might be representative of the people you actually met).
Instead, the mental move that I know intimately is what I call “instrumentalization” (or to be more memey, “tyranny of whys”). It’s a move that doesn’t require another or a social context (though it often includes internalized social judgements from others, aka superego); it only requires caring deeply about a goal (the goal doesn’t actually matter that much), and being invested in it, somewhat neurotically.
I’m kinda confused by this. Glancing back at the dialogue, it looks like most of the dialogue emphasizes general “Urgent fake thinking”, related to backchaining and slaving everything to a goal; it mentions social context in passing; and then emphasizes deference in the paragraph starting “I don’t know.”.
But anyway, I strongly encourage you to write something that would communicate to past-Adam the thing that now seems valuable to you. :)
That’s my guess at the level of engagement required to understand something. Maybe just because when I’ve tried to use or modify some research that I thought I understood, I always realise I didn’t understand it deeply enough. I’m probably anchoring too hard on my own experience here; other people often learn faster than me.
Hm. A couple things:
Existing AF research is rooted in core questions about alignment.
Existing AF research, pound for pound / word for word, and even idea for idea, is much more unnecessary stuff than necessary stuff. (Which is to be expected.)
Existing AF research is among the best sources of compute-traces of trying to figure some of this stuff out (next to perhaps some philosophy and some other math).
Empirically, most people who set out to study existing AF fail to get many of the deep lessons.
There’s a key dimension of: how much are you always asking for the context? E.g.: Why did this feel like a mainline question to investigate? If we understood this, what could we then do / understand? If we don’t understand this, are we doomed / how are we doomed? Are there ways around that? What’s the argument, more clearly?
It’s more important whether people are doing that, than whether / how exactly they engage with existing AF research.
If people are doing that, they’ll usually migrate away from playing with / extending existing AF, towards the more core (more difficult) problems.
I was thinking “should grantmakers let the money flow to unknown young people who want a chance to prove themselves.”
Ah ok you’re right that that was the original claim. I mentally autosteelmanned.
I’m curious how satisfied people seemed to be with the explanations/descriptions of consciousness that you elicited from them. E.g., on a scale from
“Oh! I figured it out; what I mean when I talk about myself being conscious, and others being conscious or not, I’m referring to affective states / proprioception / etc.; I feel good about restricting away other potential meanings.”
to
“I still have no idea, maybe it has something to do with X, that seems relevant, but I feel there’s a lot I’m not understanding.”
where did they tend to land, and what was the variance?
We agree this is a crucial lever, and we agree that the bar for funding has to be in some way “high”. I’m arguing for a bar that’s differently shaped. The set of “people established enough in AGI alignment that they get 5 [fund a person for 2 years and maybe more depending how things go in low-bandwidth mentorship, no questions asked] tokens” would hopefully include many people who understand that understanding constraints is key and that past research understood some constraints.
build on past agent foundations research
I don’t really agree with this. Why do you say this?
a lot of wasted effort if you asked for out-of-paradigm ideas.
I agree with this in isolation. I think some programs do state something about OOP ideas, and I agree that the statement itself does not come close to solving the problem.
(Also I’m confused about the discourse in this thread (which is fine), because I thought we were discussing “how / how much should grantmakers let the money flow”.)
upskilling or career transition grants, especially from LTFF, in the last couple of years
Interesting; I’m less aware of these.
How are they falling short?
I’ll answer as though I know what’s going on in various private processes, but I don’t, and therefore could easily be wrong. I assume some of these are sort of done somewhere, but not enough and not together enough.
Favor insightful critiques and orientations as much as constructive ideas. If you have a large search space and little traction, a half-plane of rejects is as valuable as, or more valuable than, a guessed point that you knew how to even generate.
Explicitly allow acceptance by trajectory of thinking, assessed by at least a year of low-bandwidth mentorship; deemphasize agenda-ish-ness.
For initial exploration periods, give longer commitments with fewer required outputs; something like at least 2 years. Explicitly allow continuation of support by trajectory.
Give a path forward for financial support for out-of-paradigm things. (The Vitalik fellowship, for example, probably does not qualify, as the professors, when I glanced at the list, seem unlikely to support this sort of work; but I could be wrong.)
Generally emphasize judgement of experienced AGI alignment researchers, and deemphasize judgement of grantmakers.
Explicitly ask for out-of-paradigm things.
Do a better job of connecting people. (This one is vague but important.)
(TBC, from my full perspective this is mostly a waste because AGI alignment is too hard; you want to instead put resources toward delaying AGI, trying to talk AGI-makers down, and strongly amplifying human intelligence + wisdom.)
grantmakers have tried pulling that lever a bunch of times
What do you mean by this? I can think of lots of things that seem in some broad class of pulling some lever that kinda looks like this, but most of the ones I’m aware of fall greatly short of being an appropriate attempt to leverage smart young creative motivated would-be AGI alignment insight-havers. So the update should be much smaller (or there’s a bunch of stuff I’m not aware of).
FWIW I agree that personality traits are important. A clear case is that you’d want to avoid combining very low conscientiousness with very high disagreeableness, because that’s something like antisocial personality disorder. But you don’t want to just select against those traits, because weaker forms might be associated with creative achievement. However, IQ, and more broadly cognitive capacity / problem-solving ability, will not become much less valuable soon.