Maxwell Clarke

Karma: 85

Maxwell Clarke Aug 7, 2023, 1:30 AM
0 points
0
in reply to: Alexander Gietelink Oldenziel’s comment on: Darcy’s Shortform
In NZ we have biting bugs called sandflies which don’t do this—you can often tell the moment they get you.

Maxwell Clarke May 24, 2023, 7:02 PM
4 points
0
in reply to: Gunnar_Zarncke’s comment on: No—AI is just as energy-efficient as your brain.
Yes, that’s fair. I was ignoring scale but you’re right that it’s a better comparison if it is between a marginal new human and a marginal new AI.

Maxwell Clarke May 24, 2023, 6:58 PM
2 points
0
in reply to: Joey Marcellino’s comment on: No—AI is just as energy-efficient as your brain.
Well, yes, the point of my post is just to point out that the number that actually matters is the end-to-end energy efficiency — and it is completely comparable to humans.

The per-flop efficiency is obviously worse. But, that’s irrelevant if AI is already cheaper for a given task in real terms.

I admit the title is a little clickbaity, but I am responding to a real argument (that humans are still “superior” to AI because the brain is more thermodynamically efficient per-flop).

Maxwell Clarke May 24, 2023, 2:55 AM
7 points
0
in reply to: mako yass’s comment on: No—AI is just as energy-efficient as your brain.
I saw some numbers for algae being 1-2% efficient but it was for biomass rather than dietary energy. Even if you put the brain in the same organism, you wouldn’t expect as good efficiency as that. The difference is that creating biomass (which is mostly long chains of glucose) is the first step, and then the brain must use the glucose, which is a second lossy step.
But I mean, there are definitely far-future biopunk options, e.g. I’d guess it’s easy to create some kind of solar panel organism which grows silicon crystals instead of using chlorophyll.
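
The "second lossy step" point above can be made concrete with a rough sketch: when energy passes through sequential lossy stages, the overall efficiency is the product of the stage efficiencies, so it can never beat the first stage alone. The 1-2% range is the biomass figure quoted above; the second-stage value (`ASSUMED_SECOND_STAGE`) and the function name are hypothetical placeholders chosen only for illustration, not measured numbers.

```python
from math import prod


def chained_efficiency(stage_efficiencies):
    """Overall efficiency of sequential lossy stages: the product of each stage's efficiency."""
    return prod(stage_efficiencies)


# Sunlight -> biomass range quoted in the comment (1-2%).
biomass_low, biomass_high = 0.01, 0.02

# ASSUMPTION: placeholder efficiency for biomass -> energy the brain can use.
# This number is illustrative only and does not come from the comment.
ASSUMED_SECOND_STAGE = 0.5

low = chained_efficiency([biomass_low, ASSUMED_SECOND_STAGE])
high = chained_efficiency([biomass_high, ASSUMED_SECOND_STAGE])

print(f"end-to-end efficiency: {low:.2%} to {high:.2%}")
```

Whatever value the second stage really has, as long as it is below 100% the end-to-end figure comes out lower than the biomass figure.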

No—AI is just as energy-efficient as your brain.

Maxwell Clarke May 24, 2023, 2:30 AM

11 points

7 comments, 1 min read, LW link

Maxwell Clarke Jan 18, 2023, 12:29 AM
7 points
0
in reply to: MikkW’s comment on: Models Don’t “Get Reward”
Fully agree: if the dog were only trying to get biscuits, it wouldn’t continue to sit later on in its life when you are no longer rewarding that behavior. Training dogs is actually some mix of the dog consciously expecting a biscuit, and raw updating on the actions previously taken.

Hear sit → Get biscuit → feel good
becomes
Hear sit → Feel good → get biscuit → feel good
becomes
Hear sit → feel good
At which point the dog likes sitting and the behavior even reinforces itself, so you can stop giving biscuits and start training something else.

Maxwell Clarke Jan 8, 2023, 1:02 AM
LW: 1 AF: 1
0
AF
on: Categorizing failures as “outer” or “inner” misalignment is often confused
This is a good post; it definitely shows that these concepts are confused. In a sense, both examples are failures of both inner and outer alignment:
- Training the AI with reinforcement learning is a failure of outer alignment, because it does not provide enough information to fully specify the goal.
- The model develops within the possibilities allowed by the under-specified goal, and has behaviours misaligned with the goal we intended.
Also, the choice to train the AI on pull requests at all is in a sense an outer alignment failure.

Maxwell Clarke Dec 30, 2022, 8:11 PM
1 point
0
on: Exploring Mild Behaviour in Embedded Agents
If we could use negentropy as a cost, rather than computation time or energy use, then the system would be genuinely bounded.

Maxwell Clarke Nov 10, 2022, 3:40 AM
1 point
0
on: A Mystery About High Dimensional Concept Encoding
Gender seems unusually likely to have many connotations & thus redundant representations in the model. What if you try testing some information the model has inferred, but which is only ever used for one binary query? Something where the model starts off not representing that thing, then if it represents it perfectly it will only ever change one type of thing. Like idk, whether or not the text is British or American English? Although that probably has some other connotations. Or whether or not the form of some word (lead or lead) is a verb or a noun.

Agree that gender is a more useful example, just not one that necessarily provides clarity.

Maxwell Clarke Nov 7, 2022, 1:00 PM
3 points
0
on: A philosopher’s critique of RLHF
Yeah, I think this is the fundamental problem. But it’s a very simple way to state it. Perhaps useful for someone who doesn’t believe AI alignment is a problem?

Here’s my summary: Even at the limit of the amount of data & variety you can provide via RLHF, when the learned policy generalizes perfectly to all new situations you can throw at it, the result will still almost certainly be malign because there are still near infinite such policies, and they each behave differently on the infinite remaining types of situation you didn’t manage to train it on yet. Because the particular policy is just one of many, it is unlikely to be correct.

But more importantly, behavior upon self improvement and reflection is likely something we didn’t test. Because we can’t. The alignment problem now requires we look into the details of generalization. This is where all the interesting stuff is.
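
A toy sketch of the counting argument in my summary above: treat a policy as a lookup table over a tiny set of situations, train on some of them, and count how many policies fit the training data versus how many match the intended behaviour everywhere. Everything here (three-bit situations, the `intended` majority rule, the 5/3 train/held-out split) is a made-up assumption for illustration, and the sketch treats all consistent policies as equally available, which real training does not.

```python
from itertools import product

# All 8 possible "situations": every 3-bit input.
inputs = list(product([0, 1], repeat=3))


def intended(x):
    """Hypothetical intended behaviour: act 1 when most bits are set."""
    return int(sum(x) >= 2)


train = inputs[:5]      # situations we managed to test
held_out = inputs[5:]   # situations we never tested

# A policy is a lookup table assigning an action (0 or 1) to each situation.
policies = [
    dict(zip(inputs, outputs))
    for outputs in product([0, 1], repeat=len(inputs))
]

# Policies that behave as intended on every tested situation.
consistent = [p for p in policies if all(p[x] == intended(x) for x in train)]

# Of those, the ones that also behave as intended on the untested situations.
fully_correct = [
    p for p in consistent if all(p[x] == intended(x) for x in held_out)
]

print(len(policies), len(consistent), len(fully_correct))  # 256 8 1
```

Even in this tiny setting, passing every test leaves 8 candidate policies and only 1 of them is the intended one; each extra untested situation doubles the number of candidates.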
What links here?
- Compendium of problems with RLHF by Charbel-Raphaël (Jan 29, 2023, 11:40 AM; 120 points)
- Compendium of problems with RLHF by Raphaël S (EA Forum; Jan 30, 2023, 8:48 AM; 18 points)

Maxwell Clarke Nov 7, 2022, 11:04 AM
1 point
1
in reply to: Oliver Siegel’s comment on: How to store human values on a computer
Respect for thinking about this stuff yourself. You seem new to alignment (correct me if I’m wrong) - I think it might be helpful to view posting as primarily about getting feedback rather than contributing directly, unless you have read most of the other people’s thoughts on whichever topic you are thinking/writing about.

Maxwell Clarke Nov 6, 2022, 11:22 AM
1 point
0
in reply to: Maxwell Clarke’s comment on: How to store human values on a computer
Oh or EA forum, I see it’s crossposted

Maxwell Clarke Nov 6, 2022, 11:21 AM
1 point
1
in reply to: Oliver Siegel’s comment on: How to store human values on a computer
I think you might also be interested in this: https://www.lesswrong.com/posts/Nwgdq6kHke5LY692J/alignment-by-default. In general, John Wentworth’s alignment agenda is essentially extrapolating your thoughts here and dealing with the problems in it.

It’s unfortunate, but I agree with Ruby: your post is fine, but a top-level LessWrong post isn’t really the place for it anymore. I’m not sure where the best place to get feedback on this kind of thing is (maybe publish here on LW but as a short-form or draft?), but you’re always welcome to send stuff to me! (Although I’m busy finishing my master’s for the next couple of weeks.)

Maxwell Clarke Nov 6, 2022, 10:54 AM
1 point
0
in reply to: Lonnie Chrisman’s comment on: AI X-risk >35% mostly based on a recent peer-reviewed argument
Great comment, this clarified the distinction between these arguments for me. And IMO this (Michael’s) argument is obviously the correct way to look at it.

Maxwell Clarke Nov 6, 2022, 9:40 AM
LW: 14 AF: 4
3
AF
on: AI X-risk >35% mostly based on a recent peer-reviewed argument
Hey, wanted to chip into the comments here because they are disappointingly negative.

I think your paper and this post are extremely good work. They won’t push forward the all-things-considered viewpoint, but they surely push forward the lower bound (or adversarial) viewpoint. Also, because Open Phil and Future Fund use some fraction of lower-end risk in their estimate, this should hopefully wipe that out. Together they much more rigorously lay out classic x-risk arguments.

I think that getting the prior work peer reviewed is also a massive win at least in a social sense. While it isn’t much of a signal here on LW, it is in the wider world. I have very high confidence that I will be referring to that paper in arguments I have in the future, any time the other participant doesn’t give me the benefit of the doubt.
What links here?
- What should I ask Joe Carlsmith — Open Phil researcher, philosopher and blogger? by Robert_Wiblin (EA Forum; Nov 9, 2022, 10:04 PM; 33 points)

Maxwell Clarke Nov 4, 2022, 9:13 AM
1 point
0
on: Humans do acausal coordination all the time
I fully agree*. I think the reason most people disagree, and the thing the post is missing, is a big disclaimer about exactly when this applies. It applies if and only if another person is following the same decision procedure as you.

For the recycling case, this is actually common!

For voting, it’s common only in certain cases. E.g. here in NZ last election there was a party, TOP, which I ran this algorithm for. I had this same take re. voting, and thought a sizable fraction of the voters (maybe >30% of people who might vote for that party) were probably following the same algorithm. I made my decision based on what I thought the other voters would do, which I thought was that probably somewhat fewer would vote for TOP than in the last election (where the party didn’t get into parliament), and decided not to vote for TOP. Lo and behold, TOP got around half the votes they did the previous election! (I think this was the correct move because I don’t think the number of people following that decision procedure increased.)

*except confused by the taxes example?

Maxwell Clarke Nov 4, 2022, 9:00 AM
1 point
0
in reply to: Ruby’s comment on: Humans do acausal coordination all the time
Props for showing moderation in public

Maxwell Clarke Oct 21, 2022, 3:59 AM
2 points
0
AF
on: Distilled Representations Research Agenda
Hey, I recommend looking at this paper: https://arxiv.org/abs/1807.07306

It shows a more elegant way than KL regularization for bounding the bit-rate of an auto-encoder bottleneck. This can be used to find the representations which are most important at a given level of information.

Maxwell Clarke Oct 19, 2022, 11:51 AM
1 point
0
in reply to: alexlyzhov’s comment on: My tentative interpretability research agenda—topology matching.
Thanks for these links. The top one especially is pretty interesting work.

Maxwell Clarke Oct 14, 2022, 4:14 AM
1 point
0
in reply to: Gerald Monroe’s comment on: Objects in Mirror are Closer Than They Appear
Great—yeah just because it’s an attractor state doesn’t mean it’s simple to achieve—still needs the right setup to realize the compounding returns to intelligence. The core hard thing is that improvements to the system need to cause further improvements to the system, but in the initial stages that’s not true—all improvements are done by the human.
