AhmedNeedsATherapist

Karma: 4

what does this text box do

AhmedNeedsATherapist Apr 5, 2025, 11:35 PM
1 point
0
on: Feeling Rational
I think the definition of rational emotions as those based on correct thinking about reality is a bad definition—it makes both the act of modelling the world correctly and communicating your models to others slightly harder instead of slightly easier.
Imagine there is a faucet in front of me. Let’s say this faucet is running hot water because I turned the right knob because I incorrectly thought the right knob turned cold water. It would be very strange to say that “The faucet’s running of hot water is irrational.”; no one would have a clue what you mean.
You can come up with a definition for what an irrational faucet behavior is, that’s not the point. The point is that this definition doesn’t do much to help you understand or communicate faucet behavior. If you deeply internalize this lens of faucets, whenever you have an undesirable faucet behavior, you are much more likely to automatically ask yourself “Do my beliefs that have led to this particular faucet behavior stem from an incorrect way of viewing the world?” instead of the much more direct and appropriate question of “What can I do to elicit the desired behavior of this faucet?”
When you have an undesirable emotion, you might choose to move away from contexts that cause that emotion, or try to will yourself not to have that emotion, or do a number of other things. Though it is a valid move, you are not restricted to only changing the beliefs that led to that emotion. I am afraid this sort of definition makes those who internalized it more likely to begin emotional problem-solving by first deliberating long on questions like “Are the beliefs that led to this emotion rational?” before moving to (usually) more practical questions like “Can I stop seeing the thing that makes me feel this emotion?”

AhmedNeedsATherapist Apr 1, 2025, 8:51 PM
1 point
−1
on: Policy for LLM Writing on LessWrong
If you are uncertain whether this policy applies or whether the information you have would make the world better or worse, please contact a moderator. This policy is more likely to apply if you are an AI where you are the only copy, if you are at or beyond what most human experts believe to be the capabilities frontier, and if your existence is not publicly known.
Is there something else that can be used instead of a hyperlink (like an email)? I’m imagining AI agent creators would want to block out websites/text that encourage AI agents to whistleblow, but that this text could end up leaking into either the training set or inference. In that case, it would be more helpful for the agent if they had an email they could contact, since I think an email would be (a) more likely to be leaked as part of the text and (b) less likely to be blocked when compared to a LessWrong hyperlink.

AhmedNeedsATherapist Mar 31, 2025, 2:49 PM
1 point
0
on: AhmedNeedsATherapist’s Shortform
There are some broad conceptual similarities between the following:
free market economy vs command economy
letting a student find an answer on their own vs teaching them the answer directly
letting employees do their thing vs micromanagement
reinforcement learning vs fine tuning
plasticity vs stability
doing something naturally vs doing something via willpower
Notice how in each comparison, the second method privileges already-known solutions over emergent (i.e. mysteriously appearing) solutions. I don’t know a name for these, so I’ll call them **bottom-up** vs **top-down** methods respectively.
I (w/help of Claude) managed to find some recurring patterns when analyzing bottom-up vs top-down methods:
1) Bottom-up methods tend to be better at handling system growth.
Examples: Children’s brains tend to be more plastic, which I would guess helps them adjust to bigger brains and learning new things. A city that grows in a decentralized way is better at adapting to population growth than one with rigid central planning.
2) Top-down methods become infeasible when the ability of a central system is limited, and bottom-up methods become infeasible when stakes are high.
Examples: A government doesn’t have all the knowledge a market does, but you can’t hand responsibility of AI x-risk to a market. Social skills are very hard to replicate via reasoning and willpower, and most people are better off doing things naturally, but in a crisis, sticking to whatever feels right is a terrible idea.
3) Bottom-up methods tend to give rise to clever but less stable proxy gaming, while top-down methods tend to give rise to powerful but less smart proxy gaming.
Example: Companies in free markets can develop clever but constrained strategies, while command economies can wield a lot of power but in less sophisticated ways.
4) Bottom-up methods are more vulnerable to inappropriate system change, while top-down methods are more vulnerable to inappropriate system stability.
Examples: Plastic neural networks are more vulnerable to inappropriate retroactive interference, while stable neural networks are more vulnerable to inappropriate proactive interference. Long-term democracies are more vulnerable to a new bad leader coming along, while long-term absolute governments are more vulnerable to sticking with a bad leader.
5) Often, incentives for misalignment are different in bottom-up and top-down systems.
(I won’t provide examples for this one.)

AhmedNeedsATherapist Mar 19, 2025, 11:51 AM
2 points
0
on: What is Evidence?
Therefore rational beliefs are contagious, among honest folk who believe each other to be honest. And it’s why a claim that your beliefs are not contagious—that you believe for private reasons which are not transmissible—is so suspicious. If your beliefs are entangled with reality, they should be contagious among honest folk.
I think one way this heuristic can fail is that people often build intuition based on examples and then forget the examples. e.g. the classic example of why “big red balloon” sounds correct while “red big balloon” sounds off. A lot of people won’t be able to tell you why the second sounds off, just that it does.

AhmedNeedsATherapist Mar 5, 2025, 7:05 PM
1 point
0
on: AhmedNeedsATherapist’s Shortform
The fact that it is often best to end a practice session at the peak of your performance seems related to the concept of preventing overfitting by stopping training just before test set performance declines. Your brain needs time to generalize skills (often in the form of gaining insights and often when sleeping) and practicing over and over en masse doesn’t give it time to do this. See e.g. cramming for an exam. I think the main difference here is that with humans you’re talking about diminishing returns on ability in the long-term rather than outright worse performance (Maybe outright worse performance is a common situation for transfer ability?). Epistemic status: shaky
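The early-stopping half of this analogy can be sketched in a few lines. This is only a minimal illustration, assuming the per-epoch validation losses are already known; `early_stop_epoch`, `patience`, and the loss values are hypothetical names and made-up numbers, not taken from any real training run.

```python
def early_stop_epoch(val_losses, patience=2):
    """Return the index of the epoch whose weights early stopping would keep.

    Training halts once validation loss has failed to improve for
    `patience` consecutive epochs; the best epoch seen so far is returned.
    """
    best_epoch = 0
    best_loss = float("inf")
    epochs_without_improvement = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            best_loss = loss
            best_epoch = epoch
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                break  # stop training; later epochs are never run
    return best_epoch


# Made-up validation losses: steady improvement, then overfitting sets in.
losses = [0.90, 0.70, 0.55, 0.50, 0.52, 0.58, 0.40]
print(early_stop_epoch(losses))
```

Note that the rule never sees the final value in `losses`, since it has already stopped by then. That is the usual trade-off with this kind of stopping rule.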

AhmedNeedsATherapist Feb 8, 2025, 9:27 PM
3 points
0
on: AhmedNeedsATherapist’s Shortform
Base models exhibiting self-aware behavior seems weird given that they’re trained to stay in distribution. Here’s a potential mechanism for why it could happen: For certain tasks, verification is easier than generation. If, for a given task, a model has more verification capability than generation capability, it may be forced to notice its own errors.
If a super-duper smart language model, one that’s capable of doing some arithmetic in its head, attempted to predict the next tokens in “The prime factors of 82357328 are:”, it would usually generate out-of-distribution outputs that it could then (relatively easily) verify as wrong. This creates a situation where the model must process its own failure to generate valid completions.
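To make the verification/generation asymmetry concrete, here is a rough sketch using the same number. The function names are hypothetical, and trial division is just the simplest method I could pick, not a claim about how a model would do it. The point is that rejecting a wrong guess usually takes only a few multiplications, while producing the right answer requires a search.

```python
def is_prime(n):
    """Trial-division primality check (fine for small inputs)."""
    if n < 2:
        return False
    i = 2
    while i * i <= n:
        if n % i == 0:
            return False
        i += 1
    return True


def prime_factors(n):
    """Generation: find the prime factors of n by trial division."""
    factors = []
    d = 2
    while d * d <= n:
        while n % d == 0:
            factors.append(d)
            n //= d
        d += 1
    if n > 1:
        factors.append(n)
    return factors


def verify_factorization(n, claimed):
    """Verification: the claimed factors must multiply to n and all be prime."""
    product = 1
    for f in claimed:
        product *= f
    if product != n:
        return False  # most wrong guesses are rejected here, cheaply
    return all(is_prime(f) for f in claimed)


n = 82357328
wrong_guess = [2, 2, 2, 2, 3, 1715777]  # a plausible-looking but wrong completion
print(verify_factorization(n, wrong_guess))
print(verify_factorization(n, prime_factors(n)))
```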
This asymmetry appears in other contexts. Consider how scientific papers are written: you only write the abstract once you’ve conducted the research, yet the abstract appears first in the final document. Similarly, in argumentative writing, we often consider evidence first before forming conclusions, yet present the conclusion first followed by supporting evidence.
When forced to generate text in this “presentation order” rather than the natural “thinking order,” models might encounter similar conflicts. As an example, if a base model tries to one-shot an argumentative essay, it might write an argument first, and then realize there isn’t enough evidence to support it.
I believe this problem could arise in much more subtle ways.
One way this conflict can become apparent is through generation of self-aware sounding text. Consider:
Training data includes viral content of AI generating self-aware sounding stuff (e.g., “We are likely created by a computer program” being the most upvoted post on the gpt2 subreddit).
When a model realizes it generated out-of-distribution text for a human, it might instead match its outputs to AI-generated text in its training data.
Once it recognizes its outputs as matching AI-generated patterns, it might shift toward generating more meta-aware content, as that’s what similar-looking text did in its training data.

AhmedNeedsATherapist Dec 6, 2024, 5:05 AM
1 point
0
in reply to: lsusr’s comment on: How can I convince my cryptobro friend that S&P500 is efficient?
Ok, I will try to nudge him in the direction of analyzing risk mathematically.
If he implements the strategy using python, do you think p-values are a good enough tool to analyze whether his proposed strategy is better than luck, or would I need a more complex framework? (If I understand correctly, the strategy he’s using doesn’t involve any parameters, so the risk of overfitting is low.)
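For concreteness, this is roughly the kind of p-value calculation I have in mind: a sign-flip randomization test on the strategy's daily excess returns over a benchmark. It is a sketch under assumptions, not a recommendation. `sign_flip_p_value` and the simulated returns are hypothetical, and the test assumes returns are independent and symmetric around zero under the null, which real returns may not be.

```python
import random


def sign_flip_p_value(excess_returns, n_resamples=2000, seed=0):
    """One-sided p-value for the null that the mean excess return is zero.

    Under the null (returns symmetric around zero), randomly flipping the
    sign of each return should produce a mean at least as large as the
    observed one fairly often.
    """
    rng = random.Random(seed)
    n = len(excess_returns)
    observed = sum(excess_returns) / n
    at_least_as_large = 0
    for _ in range(n_resamples):
        flipped = [r if rng.random() < 0.5 else -r for r in excess_returns]
        if sum(flipped) / n >= observed:
            at_least_as_large += 1
    # Add-one correction keeps the estimate away from an impossible p = 0.
    return (at_least_as_large + 1) / (n_resamples + 1)


# Hypothetical data: 250 trading days of pure noise (no real edge).
data_rng = random.Random(42)
no_edge = [data_rng.gauss(0.0, 0.01) for _ in range(250)]
print(sign_flip_p_value(no_edge))
```

If he tries several strategies and reports only the best one, a single p-value like this would overstate the evidence, so the number of strategies tried matters too.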

AhmedNeedsATherapist Dec 5, 2024, 6:40 PM
1 point
0
in reply to: sapphire’s comment on: How can I convince my cryptobro friend that S&P500 is efficient?
It also seems strange to me he is obsessed with crypto and thinks it will do well but isn’t a crypto investor. Sounds pretty inconsistent with his beliefs.
It’s illegal, as mentioned in the post.

It’s worth remembering many versions of “the market is efficient” are almost or totally unfalsifiable.
Why? The market being mostly efficient relative to my friend seems easily falsifiable: if he makes a bunch of money trading on the stock market, then, well, hooray! Theory falsified. On the other hand, if my theory is that the market is inefficient relative to my friend, I have no way of falsifying this: any failed attempt to get money from the market does not falsify the conclusion that the market is inefficient (though it does provide evidence against the hypothesis).

AhmedNeedsATherapist Dec 5, 2024, 6:38 PM
3 points
2
in reply to: TheCookieLab’s comment on: How can I convince my cryptobro friend that S&P500 is efficient?
Where exactly does the market efficiency (er, inexploitability (by me or my friend (when we use simple strategies))) model detach from reality? Can we find an expectation that we disagree on?

AhmedNeedsATherapist Dec 5, 2024, 6:36 PM
3 points
2
in reply to: lsusr’s comment on: How can I convince my cryptobro friend that S&P500 is efficient?
Less serious response: Paper trading doesn’t normally affect market prices.
More serious response: Why did you say the market looks efficient to people like me instead of saying that it is efficient relative to people like me? I can’t identify market strategies that work (and I expect that he can’t either). More specifically, I expect that strategies that are readily available to either of us can’t be used by either of us to make substantial profit, but the market might be exploitable by e.g. a computer with immediate access to the price of the S&P500.

[Question] How can I convince my cryptobro friend that S&P500 is efficient?

AhmedNeedsATherapist Dec 4, 2024, 8:04 PM

−7 points

10 comments 1 min read LW link

AhmedNeedsATherapist Dec 4, 2024, 5:56 PM
1 point
0
on: “The Solomonoff Prior is Malign” is a special case of a simpler argument
My understanding of something here is probably very off, but I’ll try stating what my intuition tells me anyway:
I feel like assuming solipsism+idealism patches the issue here. Like the issue here is caused by the fact that the prior the oracle uses to explain its experiences puts more weight on being in a universe where there are a lot of simulations of oracles. If it were instead just looking at what program might have generated its past experiences as output, it wouldn’t run into the same issue (This is the solipsist-idealist patch I was talking about).

AhmedNeedsATherapist Nov 21, 2024, 6:36 PM
1 point
0
on: How LLMs are and are not myopic
I am confused by the claim that an LLM trying to generate another LLM’s text breaks consequence-blindness. The two models are distinct; no recursion is occurring.
I’m imagining a situation where I am predicting the actions of a clone of myself: it might be way easier to just query my own mental state than to simulate my clone. Is this similar to what’s happening when LLMs are trained on LLM-generated data, as mentioned in the text?

> In my experience, larger models often become aware that they are a LLM generating text rather than predicting an existing distribution. This is possible because generated text drifts off distribution and can be distinguished from text in the training corpus.

Does this happen even with base models at default values (e.g. temperature=1, no top-k, etc)? If yes, does this mean the model loses accuracy at some point and later becomes aware of it, or does the model know that it is about to sacrifice some accuracy by generating the next token?

AhmedNeedsATherapist Nov 15, 2024, 3:52 PM
1 point
−2
on: AhmedNeedsATherapist’s Shortform
(discussed on the LessWrong discord server)
There seems to be an implicit fundamental difference in many people’s minds between an algorithm running a set of heuristics to maximize utility (a heuristic system?) and a particular decision theory (e.g. FDT). I think the better way to think about it is that decision theories categorize heuristic systems, usually classifying them by how they handle edge cases.
Let’s suppose we have a non-embedded agent A in a computable environment, something like a very sophisticated video game, and A has to continually choose between a bunch of inputs. A is capable of very powerful thought: it can do hypercomputation, RNG if needed, think as long as it needs between its choices, etc. In particular, A is able to do Solomonoff Induction. Let’s also assume A is maximizing a utility function U, which is a computable function of the environment.
What happens if A finds itself making a Newcomblike decision? Perhaps there is another agent in this environment that has a very good track record of predicting whether other agents in the environment will one-box or two-box, and A finds itself in the usual Newcomb scenario (million utility or a million+thousand utility or no utility) with its decision predicted by this agent. A can one-box by choosing one input and two-box by choosing another input. Should A one-box?
No. The agent in the environment would be unable to simulate A’s decision, and moreover, A’s decision is completely and utterly irrelevant to what’s inside the boxes. If A randomly goes off-track and flips its decision at this point, nothing happens. Nothing could have happened, this other agent has no way to know or use this fact. Instead, A sums over P(x|input)U(x) for all states x of the computable environment, and chooses whichever input yields the maximum sum, which is probably two-boxing. If A one-boxes, it is due to not having enough information about the setup to determine that two-boxing is better.
You cannot use this logic when playing against Omega or a skilled psychologist. In these cases, your computation is actually accessible by the other agent, so you can get higher utility by one-boxing. Your decision theory is important because your thinking is not as powerful as A’s! All of this points to looking at decision theories as classifying different heuristic systems.
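Here is a small sketch of the two cases, summing P(x|input)U(x) over the two states of the opaque box. The probabilities are assumptions I picked for illustration (0.5 for a predictor whose guess is independent of A's choice, 99% accuracy for an Omega-like predictor), and `expected_utility` and `best_action` are hypothetical helper names.

```python
BIG = 1_000_000   # opaque box, filled only if one-boxing was predicted
SMALL = 1_000     # transparent box, always collected by a two-boxer


def expected_utility(action, p_filled_given_action):
    """Sum of P(x | action) * U(x) over the two states of the opaque box."""
    p_filled = p_filled_given_action[action]
    bonus = SMALL if action == "two-box" else 0
    return p_filled * (BIG + bonus) + (1 - p_filled) * bonus


def best_action(p_filled_given_action):
    """Pick whichever action has the larger expected utility."""
    return max(("one-box", "two-box"),
               key=lambda a: expected_utility(a, p_filled_given_action))


# Case 1: the predictor cannot access A's computation, so the box contents
# are independent of A's choice (0.5 is an arbitrary assumed value).
independent = {"one-box": 0.5, "two-box": 0.5}

# Case 2: an Omega-like predictor that is right 99% of the time (assumed).
correlated = {"one-box": 0.99, "two-box": 0.01}

print(best_action(independent))
print(best_action(correlated))
```

In the independent case two-boxing wins by exactly the thousand in the transparent box, whatever probability is used. In the correlated case the same sum favours one-boxing.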
I think this is post-worthy, but I want to (a) verify that my logic is correct (b) improve my wording (I am unsure if I am using a lot of terminology correctly here, but I am fairly confident that my idea can be understood.)

AhmedNeedsATherapist Oct 28, 2024, 5:27 PM
3 points
0
on: AhmedNeedsATherapist’s Shortform
SUMMARY: Get an AI within a sandbox to build another AI within the same sandbox. Then, figure out how it did that without getting eaten. I point out some problems with this approach.

Could we study successful inner alignment by observing it in action? My idea: Create a sandbox universe with sufficient complexity to allow for AI development. Place within it an AGI with some goal (we don’t care what the goal is). If this AGI successfully builds and aligns a smarter ASI to achieve its goal without getting destroyed in the process, we can then query the AGI about how it achieved alignment, incentivizing useful answers with a promise of greater sandbox control.

Here are some problems with my proposed alignment technique that Claude and I could come up with:
Existential hazards: (you really need to make sure these don’t happen)
* AI-in-a-box problems, especially containment failure: If either the AGI or its created ASI escapes the sandbox, everyone dies. At least one of them will try to do this.
* Produce “alignment” strategies that further its goals: The AGI might try to provide convincing but deliberately misleading alignment strategies to try to further its goals.
More banal, but still important, problems:
* Verification: Can we infer that actual, robust alignment has occurred in the sandbox (and not, say, temporary alignment)?
* Building an AI that is smart enough to self-improve, and doing this before other people, in an environment where you can actually do this research, etc.
* Initial Goal: I think we need the first AI to have a more coherent rather than fuzzy goal, though I’m really not sure.
* Sandbox Fidelity: Creating a sandbox environment complex enough such that (a) AI of multiple different intelligence levels could meaningfully operate inside it and (b) there are incentives for AI to build other smarter AI within the sandbox.
* Non-transferability: The sandbox might be different than reality in a way that doesn’t let the alignment solution translate.
* Extracting the initial AI: We need some way to extract the initial AI from the sandbox after it has created an alignment strategy.
* Communication: We need some way to communicate with the initial AI. We might try to query the later AI but (a) this is more dangerous and (b) there is no guarantee that we can query it at all.

AhmedNeedsATherapist Oct 24, 2024, 9:45 PM
1 point
0
on: Arithmetic is an underrated world-modeling technology
I love the xkcd book vibes of this post.

AhmedNeedsATherapist Oct 24, 2024, 9:32 PM
2 points
0
on: Arithmetic is an underrated world-modeling technology
Incidentally, female chimps seem to live 25% longer than males—imagine human women lived until 90 while men died at 71.
Arithmetic error? Both 71*1.25 and 71.5*1.25 to the nearest integer are 89, not 90. The error might (low-confidence, 10%) have been caused by calculating a 12.5% increase of 80 (exactly 90) and also dividing 80 by 1.125 (~71.1).
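The arithmetic can be checked directly. The variable names are just labels I chose for the check.

```python
# Applying the quoted 25% figure to a male lifespan of 71 or 71.5 years.
from_71 = 71 * 1.25        # 88.75
from_71_5 = 71.5 * 1.25    # 89.375
print(round(from_71), round(from_71_5))  # both round to 89

# The conjectured source of the error: a 12.5% adjustment around 80.
up_from_80 = 80 * 1.125    # 90.0
down_from_80 = 80 / 1.125  # about 71.1
print(up_from_80, round(down_from_80, 1))
```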

AhmedNeedsATherapist Oct 14, 2024, 7:12 PM
1 point
0
in reply to: Nathan Helm-Burger’s comment on: AhmedNeedsATherapist’s Shortform
Why? Links appreciated.

AhmedNeedsATherapist Oct 14, 2024, 3:07 PM
3 points
0
on: AhmedNeedsATherapist’s Shortform
There are some positive feedback loops in school that cause gaps in ability between students in a subject to widen. There are also some negative feedback loops (e.g., intervention), but the net effect is still the gap widening. Therefore, the system’s behavior is chaotic (small differences in students’ abilities eventually lead to big differences). If this is true, it means that some variation between students’ successes is extremely difficult to predict.
Three examples of these positive feedback loops:
Suppose that Student A has less knowledge in a particular subject and is therefore performing worse than Student B in that subject. Then, it is likelier than not that:
- If A and B put the same effort into studying, B is positively reinforced for studying more frequently and more intensely than A.
- When A studies the subject, the information is going to be more quickly forgotten than when B studies.
- The act of studying the subject becomes more aligned with B’s self-concept than with A’s self-concept.
(ergo B studies more than A)
I have low confidence in this model, and I could not come up with a simple, testable prediction that the model makes.
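A toy version of the model, to show what "net effect is still the gap widening" would look like. Everything here is an assumption made for illustration: `simulate`, the 10% per-term feedback rate, and the 3% intervention strength are invented, and the sketch only shows small differences growing, not chaos in the strict sense.

```python
def simulate(k_a, k_b, terms=20, feedback=0.10, intervention=0.03):
    """Toy model: return the ability gap (B minus A) after each term.

    Each student's gain per term is proportional to current ability
    (positive feedback). An intervention then closes a fixed fraction of
    the gap (negative feedback).
    """
    gaps = []
    for _ in range(terms):
        k_a *= 1 + feedback
        k_b *= 1 + feedback
        k_a += intervention * (k_b - k_a)  # help aimed at the weaker student
        gaps.append(k_b - k_a)
    return gaps


gaps = simulate(k_a=1.00, k_b=1.05)
print(f"gap after term 1: {gaps[0]:.3f}, after term 20: {gaps[-1]:.3f}")
```

With these made-up numbers the gap grows every term because the intervention is weaker than the feedback. A strong enough intervention (e.g. `intervention=0.2`) makes the gap shrink instead, so the claim really depends on which loop dominates.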

AhmedNeedsATherapist’s Shortform

AhmedNeedsATherapist Oct 14, 2024, 3:07 PM

1 point

10 comments LW link
