james.lucassen
Whatever you end up doing, I strongly recommend taking a learning-by-writing style approach (or anything else that will keep you in critical assessment mode rather than classroom mode). These ideas are nowhere near solidified enough to merit a classroom-style approach, and even if they were infallible, that’s probably not the fastest way to learn them and contribute original stuff.
The most common failure mode I expect for rapid introductions to alignment is just trying to absorb, rather than constantly poking and prodding to get a real working understanding. This happened to me, and wasted a lot of time.
This is the exact problem StackExchange tries to solve, right? How do we get (and kickstart the use of) an Alignment StackExchange domain?
Agree it’s hard to prove a negative, but personally I find the following argument pretty suggestive:
“Other AGI labs have some plans—these are the plans we think are bad, and a pivotal act will have to disrupt them. But if we, ourselves, are an AGI lab with some plan, we should expect our pivotal agent to also be able to disrupt our plans. This does not directly lead to the end of the world, but it definitely includes root access to the datacenter.”
Optimization and Adequacy in Five Bullets
Proposed toy examples for G:
G is “the door opens”, a- is “push door”, a+ is “some weird complicated doorknob with a lock”. Pretty much any b can open a-, but only a very specific key+manipulator combo opens a+. a+ is much more informative about successful b than a- is.
G is “I make a million dollars”, a- is “straightforward boring investing”, a+ is “buy a lottery ticket”. A wide variety of different world-histories b can satisfy a-, as long as the markets are favorable—but a very narrow slice can satisfy a+. a+ is a more fragile strategy (relative to noise in b) than a- is.
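One way to make the pattern precise (my own notation, not from the post, and it assumes a roughly uniform prior over world-histories just for illustration): write $B(a)$ for the set of world-histories $b$ compatible with $a$ succeeding. Then

$$
|B(a^+)| \;\ll\; |B(a^-)|,
\qquad
\text{bits gained from observing that } a \text{ succeeds} \;\approx\; \log_2 \frac{|B|}{|B(a)|},
$$

so learning that $a^+$ succeeded pins down $b$ much more tightly than learning that $a^-$ succeeded, and $a^+$ is correspondingly more fragile to perturbations of $b$.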
It doesn’t work if your goal is to find the optimal answer, but we hardly ever want to know the optimal answer; we just want to know a good-enough answer.
Also not an expert, but I think this is correct
Paragraph:
When a bounded agent attempts a task, we observe some degree of success. But the degree of success depends on many factors that are not “part of” the agent—outside the Cartesian boundary that we (the observers) choose to draw for modeling purposes. These factors include things like power, luck, task difficulty, assistance, etc. If we are concerned with the agent as a learner and don’t consider knowledge as part of the agent, factors like knowledge, skills, beliefs, etc. are also externalized. Applied rationality is the result of attempting to distill this big complicated mapping from (agent, power, luck, task, knowledge, skills, beliefs, etc.) → success down to just agent → success. This lets us assign each agent a one-dimensional score: “how well do you achieve goals overall?” Note that for no-free-lunch reasons, this already-fuzzy thing is further fuzzified by weighting tasks according to whatever distribution of tasks the observer happens to care about.
Sentence:
Applied rationality is a property of a bounded agent, which attempts to describe how successful that agent tends to be when you throw tasks at it, while controlling for both “environmental” factors such as luck and “epistemic” factors such as beliefs.
Follow-up:
In this framing, it’s pretty easy to define epistemic rationality analogously: compressing from everything → prediction loss down to just agent → prediction loss.
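To make the compression picture concrete, a rough formalization (my own notation, which sweeps the hard parts into the expectation):

$$
\text{AppliedRationality}(A) \;\approx\; \mathbb{E}_{t \sim D_{\text{obs}},\, e \sim E}\big[\operatorname{success}(A, t, e)\big],
\qquad
\text{EpistemicRationality}(A) \;\approx\; -\,\mathbb{E}_{t \sim D_{\text{obs}},\, e \sim E}\big[\operatorname{loss}(A, t, e)\big],
$$

where $e$ bundles everything outside the Cartesian boundary (power, luck, knowledge, skills, beliefs, ...) and $D_{\text{obs}}$ is the task distribution the observer cares about, which is where the no-free-lunch fuzziness lives.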
However, in retrospect I think the definition I gave here is pretty much identical to how I would have defined “intelligence”, just without reference to the “mapping a broad start distribution to a narrow outcome distribution” idea (optimization power) that I usually associate with that term. If anyone could clarify the specific difference between applied rationality and intelligence, I would be interested.
Maybe you also have to control for “computational factors” like raw processing power, or something? But then what’s left inside the Cartesian boundary? Just the algorithm? That seems like it has potential, but still feels messy.
This leans a bit toward the pedantic side, but the title is also a bit strange when taken literally. Three useful types (of akrasia categories)? Types of akrasia, right, not types of categories?
That said, I do really like this classification! Introspectively, it seems like the three could have quite distinct causes, so understanding which category you struggle with could be important for efforts to fix it.
Props for first post!
Trying to figure out what’s being said here. My best guess is two major points:
Meta doesn’t work. Do the thing, stop trying to figure out systematic ways to do the thing better, they’re a waste of time. The first thing any proper meta-thinking should notice is that nobody doing meta-thinking seems to be doing object level thinking any better.
A lot of nerds want to be recognized as Deep Thinkers. This makes meta-thinking stuff really appealing for them to read, in hopes of becoming a DT. This in turn makes it appealing for them to write, since it’s what other nerds will read, which is how they get recognized as a DT. All this is despite the fact that it’s useless.
Ah, gotcha. I think the post is fine, I just failed to read.
If I now correctly understand, the proposal is to ask an LLM to simulate human approval, and use that as the training signal for your Big Scary AGI. I think this still has some problems:
Using an LLM to simulate human approval sounds like reward modeling, which seems useful. But LLMs aren’t trained to simulate humans, they’re trained to predict text. So, for example, an LLM will regurgitate the dominant theory of human values, even if it has learned (in a Latent Knowledge sense) that humans really value something else.
Even if the simulation is perfect, using human approval isn’t a solution to outer alignment, for reasons like deception and wireheading (rough toy sketch below).
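To be concrete about what I mean, here's a minimal toy sketch of the setup as I understand it (the names and the scoring rule are my own illustrative stand-ins, not anyone's actual system): the training signal is simulated approval of a report about the world, so an agent can score well by making things *sound* approved rather than by doing what humans actually value.

```python
# Toy sketch (hypothetical, illustrative only): simulated approval as a reward signal.

def simulated_approval(report: str) -> float:
    """Stand-in for 'ask an LLM whether a human would approve of this'.
    Like an LLM, it scores the text of the report, not the underlying world."""
    approved_words = {"helpful", "safe", "honest"}
    return sum(word in report.lower() for word in approved_words) / len(approved_words)

def reward(world_state: dict) -> float:
    # The agent is rewarded on its own report of the world, not on the world itself,
    # which is where the deception/wireheading worry comes in.
    return simulated_approval(world_state["report"])

honest_run    = {"actually_good": True,  "report": "helpful and honest, but messy"}
deceptive_run = {"actually_good": False, "report": "helpful, safe, and honest"}

print(reward(honest_run), reward(deceptive_run))  # the deceptive report scores higher
```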
I worry that I still might not understand your question, because I don’t see how fragility of value and orthogonality come into this?
The key thing here seems to be the difference between understanding a value and having that value. Nothing about the fragile value claim or the Orthogonality thesis says that the main blocker is AI systems failing to understand human values. A superintelligent paperclip maximizer could know what I value and just not do it, the same way I can understand what the paperclipper values and choose to pursue my own values instead.
Your argument is for LLMs understanding human values, but that doesn’t necessarily have anything to do with the values that they actually have. It seems likely that their actual values are something like “predict text accurately”, and this requires understanding human values but not adopting them.
now this is how you win the first-ever “most meetings” prize
Agree that this is definitely a plausible strategy, and that it doesn’t get anywhere near as much attention as it seemingly deserves, for reasons unknown to me. Strong upvote for the post, I want to see some serious discussion on this. Some preliminary thoughts:
How did we get here?
If I had to guess, the lack of discussion on this seems likely due to a founder effect. The people pulling the alarm in the early days of AGI safety concerns were disproportionately on the technical/philosophical side rather than the policy/outreach/activism side.
In the early days, focusing on the technical problem makes sense. When you are the only person in the world working on AGI, all the delay in the world won’t help unless the alignment problem gets solved. But we are working at very different margins nowadays.
There’s also an obvious trap which makes motivated reasoning really easy. Often, the first thing that comes to mind when thinking about slowing down AGI development is sabotage—maybe because it feels urgent and drastic? It’s an obviously bad idea, and maybe that leads us to motivated stopping.
Maybe the “technical/policy” dichotomy is keeping us from thinking of obvious ways we could be making the future much safer? The outreach org you propose doesn’t really fit neatly into either category. Would be interested in brainstorming other major ways to affect the world, but not gonna do that in this comment.
HEY! FTX! OVER HERE!!
You should submit this to the Future Fund’s ideas competition, even though it’s technically closed. I’m really tempted to do it myself just to make sure it gets done, and very well might submit something in this vein once I’ve done a more detailed brainstorm.
I don’t think I understand how the scorecard works. From:
[the scorecard] takes all that horrific complexity and distills it into a nice standardized scorecard—exactly the kind of thing that genetically-hardcoded circuits in the Steering Subsystem can easily process.
And this makes sense. But when I picture how it could actually work, I bump into an issue. Is the scorecard learned, or hard-coded?
If the scorecard is learned, then it needs a training signal from Steering. But if it’s useless at the start, it can’t provide a training signal. On the other hand, since the “ontology” of the Learning subsystem is learned-from-scratch, it seems difficult for a hard-coded scorecard to do this translation task.
This is great, thanks!
What do you think about the effectiveness of the particular method of digital decluttering recommended by Digital Minimalism? What modifications would you recommend? Ideal duration?
One reason I have yet to do a month-long declutter is that I remember thinking something like “this process sounds like something Cal Newport just kinda made up and didn’t particularly test; my own methods that I think of for me will probably be better than Cal’s method that he thought of for him”.
So far my own methods have not worked.
Memetic evolution dominates biological evolution for the same reason.
Faster mutation rate doesn’t just produce faster evolution—it also reduces the steady-state fitness. Complex machinery can’t reliably be evolved if pieces of it are breaking all the time. I’m mostly relying on No Evolutions for Corporations or Nanodevices plus one undergrad course in evolutionary bio here.
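For intuition on the steady-state claim, here's a toy mutation-selection balance simulation (my own illustration with arbitrary parameters, not something from the linked post): start from a well-adapted population and see where mean fitness settles for different per-bit mutation rates.

```python
# Toy mutation-selection balance: higher mutation rates settle at lower
# steady-state fitness, because working "parts" break faster than selection
# can weed the broken copies out. Parameters are arbitrary.

import random

GENOME_LEN, POP_SIZE, GENERATIONS = 50, 200, 300

def steady_state_fitness(mutation_rate: float, seed: int = 0) -> float:
    rng = random.Random(seed)
    pop = [[1] * GENOME_LEN for _ in range(POP_SIZE)]  # start fully adapted

    def fitness(genome):
        return sum(genome) / GENOME_LEN  # fraction of working "parts"

    for _ in range(GENERATIONS):
        # fitness-proportional selection
        parents = rng.choices(pop, weights=[fitness(g) + 1e-9 for g in pop], k=POP_SIZE)
        # each bit flips independently with probability mutation_rate
        pop = [[bit ^ int(rng.random() < mutation_rate) for bit in g] for g in parents]

    return sum(fitness(g) for g in pop) / POP_SIZE

for rate in (0.001, 0.01, 0.1):
    print(f"per-bit mutation rate {rate}: steady-state fitness ~ {steady_state_fitness(rate):.2f}")
```

With these settings I'd expect the higher rates to end up at noticeably lower mean fitness, which is the mutation-load point I'm gesturing at; the "faster evolution" half of the claim isn't modeled here.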
Also, just empirically: memetic evolution produced civilization, social movements, Crusades, the Nazis, etc.
Thank you for pointing this out. I agree with the empirical observation that we’ve had some very virulent and impactful memes. I’m skeptical about saying that those were produced by evolution rather than something more like genetic drift, because of the mutation-rate argument. But given that observation, I don’t know if it matters if there’s evolution going on or not. What we’re concerned with is the impact, not the mechanism.
I think at this point I’m mostly just objecting to the aesthetic and some less-rigorous claims that aren’t really important, not the core of what you’re arguing. Does it just come down to something like:
“Ideas can be highly infectious and strongly affect behavior. Before you do anything, check for ideas in your head which affect your behavior in ways you don’t like. And before you try and tackle a global-scale problem with a small-scale effort, see if you can get an idea out into the world to get help.”
I think we’re seeing Friendly memetic tech evolving that can change how influence comes about.
Wait, literally evolving? How? Coincidence despite orthogonality? Did someone successfully set up an environment that selects for Friendly memes? Or is this not literally evolving, but more like “being developed”?
The key tipping point isn’t “World leaders are influenced” but is instead “The Friendly memetic tech hatches a different way of being that can spread quickly.” And the plausible candidates I’ve seen often suggest it’ll spread superexponentially.
Whoa! I would love to hear more about these plausible candidates.
There’s insufficient collective will to do enough of the right kind of alignment research.
I parse this second point as something like “alignment is hard enough that you need way more quality-adjusted research-years (QARYs?) than the current track is capable of producing. This means that to have any reasonable shot at success, you basically have to launch a much larger (but still aligned) movement via memetic tech, or just pray you’re the messiah and can singlehandedly provide all the research value of that mass movement.” That seems plausible, and concerning, but highly sensitive to the difficulty of the alignment problem—which I personally have practically zero idea how to forecast.
Ah, so on this view, the endgame doesn’t look like
“make technical progress until the alignment tax is low enough that policy folks or other AI-risk-aware people in key positions will be able to get an unaware world to pay it”
But instead looks more like
“get the world to be aware enough to not bumble into an apocalypse, specifically by promoting rationality, which will let key decision-makers clear out the misaligned memes that keep them from seeing clearly”
Is that a fair summary? If so, I’m pretty skeptical of the proposed AI alignment strategy, even conditional on this strong memetic selection and orthogonality actually happening. It seems like this strategy requires pretty deeply influencing the worldview of many world leaders. That is obviously very difficult because no movement that I’m aware of has done it (at least, quickly), and I think they all would like to if they judged it doable. Importantly, the reduce-tax strategy requires clarifying and solving a complicated philosophical/technical problem, which is also very difficult. I think it’s more promising for the following reasons:
It has a stronger precedent (historical examples I’d reference include the invention of computability theory, the invention of information theory and cybernetics, and the adventures in logic leading up to Gödel)
It’s more in line with rationalists’ general skill set, since the group is much more skewed towards analytical thinking and technical problem-solving than towards government/policy work and being influential among the people in that world
The number of people we would need to influence will go up as AGI tech becomes easier to develop, and each of them is a single point of failure.
To be fair, these strategies are not in a strict either/or, and luckily use largely separate talent pools. But if the proposal here ultimately comes down to moving fungible resources towards the become-aware strategy and away from the technical-alignment strategy, I think I (mid-tentatively) disagree.
Thanks! Edits made accordingly. Two notes on the stuff you mentioned that isn’t just my embarrassing lack of proofreading:
The definition of optimization used in Risks From Learned Optimization is actually quite different from the definition I’m using here. They say:
“a system is an optimizer if it is internally searching through a search space (consisting of possible outputs, policies, plans, strategies, or similar) looking for those elements that score high according to some objective function that is explicitly represented within the system.”
I personally don’t really like this definition, since it leans quite hard on reifying certain kinds of algorithms—when is there “really” explicit search going on? Where is the search space? When does a configuration of atoms constitute an objective function? Using this definition strictly, humans aren’t *really* optimizers; we don’t have an explicit objective function written down anywhere. Balls rolling down hills aren’t optimizers either.
But by the definition of optimization I’ve been using here, I think pretty much all evolved organisms have to be at least weak optimizers, because survival is hard. You have to manage constraints from food and water and temperature and predation etc… the window of action-sequences that leads to successful reproduction is really quite narrow compared to the whole space. Maintaining homeostasis requires ongoing optimization pressure.
Agree that not all optimization processes fundamentally have to be produced by other optimization processes, and that they can crop up anywhere you have the necessary negentropy reservoir. I think my claim is that optimization processes are by default rare (maybe this is exactly because they require negentropy?). But since optimizers beget other optimizers at a rate much higher than background, we should expect the majority of optimization to arise from other optimization. Existing hereditary trees of optimizers grow deeper much faster than new roots spawn, so we should expect roots to occupy a negligible fraction of the nodes as time goes on.