Does DeepSeek actually mean that Nvidia is overvalued?
I wrote this a few days before the 2025-01-27 market crash but could not post it due to rate limits. One change I made was adding "actually" to the first line.
To be clear, I have no intention whatsoever of shorting NVDA
Epistemic status: very speculative, but not quite a DMT hallucination
Let’s imagine a very different world…
Superhuman AI will run on computers not much more expensive than personal computers, though perhaps with highly specialized chips, maybe even chips specialized for running a single AI instance
Investment in AI proper will be small relative to AI-directed production
There will be a period of increasing marginal returns from AI, but this will eventually give way to diminishing marginal returns
Even during the period of increasing marginal returns, more $$$ will go to AI-directed production than to AI proper
Companies that most successfully transition to AI will blow the competition away; some of these companies will have a moat & continue to make high profits. But how can such high profits be justified? Maybe the government needs to take 50% of the shares & create trust funds for its citizens.
Companies that buy up the right kinds of land & natural resources will also do well
Companies that are least affected by AI will benefit b/c of the Baumol effect
So what are the get-rich-quick schemes? Specialized chips, incorporating AI into production systems for non-AI goods, strategically buying up the right land & land rights, AI-resistant industries!?
Interesting times, maybe too interesting
In conclusion, I’m agnostic as to whether Nvidia is or is not overvalued, but other companies may benefit even more as AI advances. I think it’s more about leadership & seizing opportunities than about a few companies having an overwhelmingly dominant position.
Hzn
For simplicity I’m assuming the activation functions are the step function h(x)=[x>0]…
For ‘backpropagation’, pretend the derivative of this step function is a positive constant A, with A=1 being the most obvious choice.
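A minimal numpy sketch of what I mean (this is essentially a surrogate-gradient trick; the network shape, loss, and variable names here are my own illustrative choices, not anything canonical): the hidden layer uses the step function forward, and the backward pass substitutes the constant A for the true derivative, which is 0 almost everywhere.

```python
import numpy as np

def step(x):
    # Heaviside step activation: h(x) = [x > 0]
    return (x > 0).astype(float)

def forward(X, W1, b1, W2, b2):
    Z = X @ W1 + b1          # hidden pre-activations
    H = step(Z)              # binary hidden activations
    y_hat = H @ W2 + b2      # linear readout
    return Z, H, y_hat

def grads(X, y, W1, b1, W2, b2, A=1.0):
    """Backprop for 0.5*MSE, pretending h'(x) = A everywhere."""
    n = X.shape[0]
    Z, H, y_hat = forward(X, W1, b1, W2, b2)
    d_out = (y_hat - y) / n
    dW2 = H.T @ d_out
    db2 = d_out.sum(axis=0)
    dH = d_out @ W2.T
    dZ = dH * A              # true derivative is 0 a.e.; use the constant A instead
    dW1 = X.T @ dZ
    db1 = dZ.sum(axis=0)
    return dW1, db1, dW2, db2
```

Note that with one hidden layer the first-layer gradients simply scale linearly with A, so here the choice of A only rescales the effective learning rate; A=0 recovers the (useless) true derivative, with no gradient reaching the first layer at all.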
I would also try reverse Hebbian learning, i.e. give the model random input & apply the rule in reverse
“expanding an architecture that works well with one hidden layer and a given learning rule to an architecture with many hidden layers but the same rule universally decreased performance”—personally I don’t find this surprising
NB for h only the relative scale of the weights matters, since scaling the whole argument by a positive constant never changes its sign, e.g. h(5-x+y) = h(0.5-(x-y)/10). So weights drifting to extreme values effectively decreases the temperature, & L1 & L2 penalties may have odd effects
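A quick numerical check of that identity (the specific constants are just the ones from the example above): dividing the step function's argument by 10 leaves every output unchanged, because the sign is preserved.

```python
import numpy as np

def step(x):
    # Heaviside step: h(x) = [x > 0]
    return (x > 0).astype(float)

rng = np.random.default_rng(1)
x = rng.normal(scale=10.0, size=1000)
y = rng.normal(scale=10.0, size=1000)

# 0.5 - (x - y)/10 is exactly (5 - x + y)/10, i.e. the same
# argument scaled by the positive constant 1/10, so the step
# outputs agree elementwise:
lhs = step(5 - x + y)
rhs = step(0.5 - (x - y) / 10)
assert np.array_equal(lhs, rhs)
```

This is why only weight ratios matter for the hard step function: any positive rescaling of all weights & biases feeding a unit is invisible to h, whereas an L1 or L2 penalty sees it directly.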